Posts tagged: API

Forbes : Why APIs Now Mean Business Transformation, Not Just Technology Infrastructure

By , March 23, 2016 12:12 pm

This article was originally written by Dan Woods and published on Forbes on March 23, 2016

Thirty-six years ago, my first consulting project was fixing an IBM 360 assembler language program that had broken because the behavior of one of the machine instructions had changed subtly. At that time, you could consider the definition of the IBM 360 assembler language an API to the hardware.

In the years that passed, the idea of the API has changed. We’ve moved from LPA libs, to JCL, to Microsoft DLLs, Java packages, and Python libraries. The strict definition of the term “application programming interface” suggests that the API is a tool of abstraction and simplification. The application developer doesn’t want to have to worry about the details of how to send a text message from a mobile app, how to run an analytics routine in R, or how to display a graphic using the D3 libraries.

Where is your API journey taking you?

Because of this legacy, APIs are most often seen as something deeply technical, and that impression lingers to this day. But as Google and Yahoo rose out of the wreckage of the dot-com bust, and as Facebook and Twitter were born, the technological essence of APIs began giving way to a more business-focused emphasis on enabling unfettered innovation.

The APIs of Google, Yahoo, Facebook, and Twitter in the mid and late 2000s were unusual in that they were public. These APIs did have a technological purpose: they abstracted the basic capabilities of publishers so that they became useful for developers. But the point, and the value, was unrestricted innovation. SAP had offered APIs, called BAPIs, decades earlier to allow its massive ERP system to be controlled. The trick the Internet darlings performed was figuring out that by making APIs public and self-service, a seething tribe of creative developers could put those APIs to use, creating new ways to put the core services to work. John Musser, the founder of Programmable Web, was the prophet of this era.

For example, Google Maps was one of the first public APIs to be created, inspired in part because developers figured out how to use Google’s capabilities on their own in an unauthorized manner. Google quickly realized that this was a great idea, published APIs, and a flood of innovation and use of Google Maps followed. Yahoo, Twitter, Facebook, and Amazon followed the same pattern for many of their services.

So in this era, John Musser created a catalog of public APIs, and companies such as Chet Kapoor’s Apigee and Mashery (now part of Tibco) created products to let everyone get into the game of creating public APIs. Working with the CTO of Apigee, Greg Brail, and Daniel Jacobson, VP of Edge Engineering at Netflix, I co-authored APIs: A Strategy Guide (O’Reilly), a book about API strategy. (Disclosure: I have done work as an analyst and content marketer for Apigee, Intel, Tibco, Microsoft, and other companies who sell API technology. For a full list of my clients visit EvolvedMedia.com/clients.)

The point of that book was to explain the notion of APIs from top to bottom, but also to explain how APIs had become far more than just technology infrastructure.

How APIs Have Become the Building Blocks of Business Transformation

Daniel Jacobson’s evolution of APIs at Netflix shows how APIs can become building blocks for business transformation. Jacobson is in charge of creating the API infrastructure to support product development. Netflix, like many other companies, had a set of “resource-based” APIs that exposed the core capabilities of the key operational systems. Using these APIs, you could get to the catalog information on Netflix and invoke services to play movies.

Jacobson and his team determined that the code they were writing to create Netflix clients for various families of devices such as iPhones, iPads, Android tablets, DVD players, and so on was far more complex than it needed to be. The problem was that resource-based APIs were built to expose the capabilities of key systems in a general way. That meant that the client code had to do all sorts of stuff: transform and combine data, connect information from a multitude of sources, and create data structures and navigation that matched the needs of the device.

Jacobson’s team realized that a new layer of APIs was needed, which they called experience-based APIs (see “How A Netflix Tech Innovation Can Unleash Creativity in Your Business”). This layer was created not by the teams in charge of the key operational systems, which had built the general-purpose resource-based APIs. Rather, the developers in charge of creating clients for all the devices used to access Netflix defined just the APIs needed for each family of devices. The experience-based layer absorbed all the code for transforming and formatting data and for adapting the resource-based APIs to the needs of each device. Jacobson’s team found that this process could be accelerated by allowing the experience-based APIs to be created using a scripting language, and they packaged this idea as the Nicobar open source project.

APIs as a Unit of Digital Business Design

FinTech and other industries have taken the general pattern of experience-based APIs and adapted it to different circumstances. Consider these examples:

  • Tradier has created a Brokerage-as-a-Service offering that is a combination of resource- and experience-based APIs. Tradier customers are able to embed the ability to research and trade stocks within their existing applications.
  • Orchard Platforms has created a platform powered by APIs that normalizes access to lending marketplaces. Using this platform, large financial institutions can make loans on platforms like Prosper and Lending Club using automated underwriting.
  • Xignite has created a set of APIs that normalizes access to a vast amount of financial data. Some of these APIs are resource-based, providing general access, and some are experience-based, focused on meeting a specific need.
  • Bechtel has introduced huge amounts of efficiencies by creating a platform as well as resource- and experience-based APIs to enable the creation of mobile apps. On job sites around the world, instant access to information has dramatically reduced delays because workers don’t have to go to sheds to access computers.

The point to remember is that all of these APIs are both technical artifacts and units of digital business design. As technology infrastructure, both resource-based and experience-based APIs need to be supported by generic API gateway capabilities such as:

  • Proxy support
  • Authentication/Authorization
  • SSL/TLS termination
  • Encryption
  • Logging
  • Load balancing
  • Routing
  • Throttling
  • Lightweight orchestration
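
To make a few of these capabilities concrete, here is a minimal sketch, in Python, of a gateway handler that chains authentication, throttling, and proxying in front of a backend service. The header name, key store, and limits here are hypothetical, not any particular vendor's API:

    import time
    import urllib.request

    RATE_LIMIT = 100      # illustrative ceiling: requests per key per minute
    KNOWN_KEYS = {"key-for-app-1", "key-for-app-2"}   # hypothetical key store
    request_counts = {}   # api_key -> (window_start, count)

    def authenticate(headers):
        # Hypothetical check; a real gateway would validate OAuth tokens,
        # API keys, or client certificates against an identity store.
        return headers.get("X-Api-Key") in KNOWN_KEYS

    def throttle(api_key):
        # Fixed-window throttling: reset the counter every 60 seconds.
        window, count = request_counts.get(api_key, (time.time(), 0))
        if time.time() - window > 60:
            window, count = time.time(), 0
        request_counts[api_key] = (window, count + 1)
        return count + 1 <= RATE_LIMIT

    def handle(request_headers, backend_url):
        if not authenticate(request_headers):
            return 401, b"unauthorized"
        if not throttle(request_headers["X-Api-Key"]):
            return 429, b"throttled"
        # Proxying/routing: forward the request to the backend service.
        with urllib.request.urlopen(backend_url) as resp:
            return resp.status, resp.read()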

But there is another, business-focused process going on here as well: developing the unit of business design. In other words, resource-based and especially experience-based APIs are not designed simply as technology artifacts, but as ways to enable a business purpose. The people who design the desired digital business experience must be part of the process.

Enterprise-grade API management – the kind that can power digital transformation and drive a business – goes beyond an API gateway and supports an ecosystem of digital collaboration. Many types of software follow this path. The software starts out supporting specific, targeted functionality but ends up adding features because people want to collaborate and need supporting capabilities as the original function becomes more important.

When you are using APIs as a unit of digital business design, you need capabilities such as:

  • Analytics about developers, operational performance, app performance, and business metrics
  • A customized developer portal
  • API monetization
  • Multi-tenancy and support for high scale
  • Global policy enforcement
  • SDKs for all popular development environments to simplify the process of developing an API
  • Support for adding application code and data to the API infrastructure to enable distribution of code and data
  • A powerful transformation engine to speed the task of adapting resource-based APIs to experience-based APIs

It is perfectly possible to use APIs as a unit of business design without an enterprise-grade API management platform. The challenge then becomes creating enterprise capabilities when you need them and supporting them over time.

The API marketplace is evolving rapidly. Amazon recently released an API gateway product. Microsoft has one as well, and there are various types of open source toolkits and such. The functionality of many of these vendors’ products supports the notion that APIs are simply technology infrastructure.

But vendors like Apigee would argue that to really use APIs as a unit of digital business design, and to accelerate the business value that comes from apps that are faster to develop and cheaper to maintain thanks to practices like experience-based design, you need much more than an API gateway.

“APIs have gone well beyond just bits of technology – they are essentially the foundation for digital transformation,” said Apigee’s Kapoor. “We believe that businesses will need to use APIs and API management to support digital business initiatives or risk becoming increasingly irrelevant.”

The challenge in my view is the legacy that APIs have as a unit of technology infrastructure. The people who are buying solutions are often thinking small about APIs rather than thinking big. The challenge for anyone using APIs is to stop thinking of them only as technology and to start thinking about the results you want to achieve and how APIs can play a role in getting there faster.

Dan Woods is on a mission to help people find the technology they need to succeed. Users of technology should visit CITO Research, a publication where early adopters find technology that matters. Vendors should visit Evolved Media for advice about how to find the right buyers. See list of Dan’s clients on this page.

Forbes : How A Netflix Tech Innovation Can Unleash Creativity in Your Business

By , February 11, 2015 5:59 pm

This article was originally published by Dan Woods on Forbes on February 11, 2015

In the tech community, Netflix is a company that everyone learns from. Less widely celebrated, though, are the broader business lessons of what Netflix has accomplished. I was bonked over the head by this point when I sat through a presentation about Experience-based APIs by Daniel Jacobson, VP of Edge Engineering for Netflix.

Jacobson and his team, which builds and maintains the Netflix API platform, have developed a way of working that solves a problem common to any company building its own technology: How do you best leverage the skills of everyone on a team so you can move as fast as possible? Jacobson and his team have figured this out for a general use case: building many different applications based on a complex set of centralized resources. The structure of this solution, which relies on a separation of concerns between different groups of developers, can help optimize the structure of many different types of teams.

For those who work on building software, Jacobson has embedded in the Nicobar open source project the technical capabilities needed to support this separation of concerns and rapid innovation. I will briefly describe the Nicobar project, which was released yesterday, at the end of this article.

What are Experience-based APIs?

To understand the beauty of Experience-based APIs, we first must understand the challenge facing Netflix with respect to supporting many devices. Netflix client developers write software applications that allow Netflix to deliver its content over thousands of different device types. Netflix has many teams of client developers, each of which specializes in creating applications for different types of devices. The goal of Jacobson’s team and the client development teams is to deliver a great user experience and make sure that Netflix works perfectly on any device you might use, from an iPad, to an Android phone, to a DVD player or smart TV.

In the Resource-based API platform the work gets done this way:

  • The central API team creates APIs that make it easy to get to all of the data and resources of a company. These APIs are all about creating a general way for programs to get access to customer data, data about films, recommendations, and so on. The resources that a client program may need include data and services for:
    • User info
    • Movie metadata
    • Movie Ratings
    • Similar movies
    • My list
  • The central team implements the APIs to deliver scalability and excellent performance.
  • The client team takes these APIs and creates applications for a variety of devices. The problem is that the code for handling errors and adapting the data to the needs of the device all resides in the client application code (the sketch after this list illustrates the kind of orchestration code each client team must carry).
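
That sketch, in Python, looks something like the following. The endpoint paths, field names, and host are invented for illustration; they are not Netflix's actual APIs:

    import json
    import urllib.request

    BASE = "https://api.example.com"   # stand-in for a resource-based API host

    def fetch(path):
        # Every client team repeats this: call the API, guard against failure.
        try:
            with urllib.request.urlopen(BASE + path) as resp:
                return json.load(resp)
        except Exception:
            return None   # each device decides for itself how to degrade

    def build_home_screen(user_id):
        # Several round trips to general-purpose resource APIs over HTTP...
        user    = fetch(f"/users/{user_id}")
        my_list = fetch(f"/users/{user_id}/list")
        ratings = fetch(f"/users/{user_id}/ratings")
        # ...then device-specific transformation and combination of the data.
        rows = []
        if my_list:
            rows.append({"title": "My List", "items": my_list["items"][:10]})
        if ratings:
            rows.append({"title": "Top Picks", "items": ratings["items"][:10]})
        return {"user": user["name"] if user else "Guest", "rows": rows}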

The Resource-based API paradigm works just fine and is used all over the world. But Jacobson and his team wanted to find a way to move faster, create better software, and enable more optimized experiences for Netflix subscribers. They realized that different devices potentially needed a different set of data to support the user experience. The best way to make the client software simpler would be to have a new API that allowed as much as possible of the data translation, error handling, and other utility functions to happen on the server, not on the client.

But if you asked the central API team to create versions of APIs for all the different families of devices, the team would be overwhelmed. In addition, it would be hard for each of the client teams to actually ask for the right design for the APIs they would need. It would require trial and error and that would be hard to accomplish with many client teams competing for the attention of the central team.

Jacobson and his team realized they needed to create a new layer of API they called an Experience-based API. The idea is that the client team should design the perfect API to serve the needs of the device on which the client application will run.

For example, when writing the application code to create a page on the client using the resource-based approach, the developer may have to call a dozen APIs, handle errors from all of them, and then format the data for use on the page. All of this communication takes place over HTTP, which is less efficient than the networking methods used on the server.

But if the client developers could create their own API to support the needs of that page, and move all of the code for data access from the client to the server, lots of good things would happen such as:

  • Client code would get simpler.
  • More code could be shared across families of devices.
  • New device families could be supported faster in a way that best fit the needs of the device.
  • Performance could improve because network communication is optimized, reducing round trips from the device to the server.

So Jacobson and his team set out to create a new layer to enable client developers to create their own APIs. Here’s how it works:

  • Client developers analyze the application that they want to build for a particular client and then decide on the APIs that they will need to support it.
  • They define the endpoints of the APIs and the protocols to handle the requests and responses that are optimized for their client.
  • The client developers implement the APIs in the Groovy scripting language, which calls Java APIs that gather the data needed to handle each request. Groovy is easier for client developers to work with than Java but still scales well enough. (A rough sketch of the shape of such an endpoint follows.)
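
Netflix wrote these adapters in Groovy against internal Java APIs. The sketch below uses Python purely to illustrate the shape of an experience-based endpoint, with the resource layer mocked as plain functions; none of the names are Netflix's:

    # Stand-ins for calls into the resource layer (Java APIs at Netflix).
    def get_user(user_id):
        return {"name": "Jane"}

    def get_my_list(user_id):
        return [{"id": 1, "title": "Example Movie", "synopsis": "..."}]

    def phone_home_screen(user_id):
        # One device-specific endpoint: a single round trip from the client,
        # with all aggregation and trimming done on the server.
        user = get_user(user_id)
        items = get_my_list(user_id)
        return {
            "greeting": f"Hi, {user['name']}",
            # The phone screen only shows titles, so ship titles and nothing
            # else, rather than full movie objects over HTTP.
            "rows": [{"title": m["title"]} for m in items],
        }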

This architecture, which at Netflix is powered by Nicobar, allows the client teams to move faster and reduces technical debt by making the client programs simpler. The code enabled by Nicobar is more shareable and easier to maintain.

“There are some significant engineering benefits to this structure, but the real win is increasing the velocity of innovation,” said Jacobson. “Because Experience-based APIs properly separate concerns across the two teams, the work gets done faster. The client teams control their own velocity and the server teams can focus on building the platform that supports them.”

Lessons for the Rest of Us

Jacobson, Greg Brail, Chief Architect of Apigee, and I were co-authors of “APIs: A Strategy Guide”. While we were writing the book in 2010, the world was focused on the impact that public APIs were having. Google, Facebook, Twitter, and Amazon all harnessed massive amounts of attention and energy as people put their public APIs to work.

At that time Jacobson argued that internal use of APIs would have the far larger impact. He saw that few companies, even Netflix, would be able to get a large amount of value from public innovation. Instead, the real victory would come as internal APIs increased efficiency, productivity, and the velocity of innovation. Experience-based APIs are one major proof point for Jacobson’s way of thinking.

I believe this two-tier structure, which allows those closest to the problem to translate the Resource-based APIs into a more suitable form and create APIs adapted to their needs, can be used in all sorts of companies that have bottlenecks in their IT departments.

To make this work you need a front office IT staff, one that works with the business. These are the equivalent of the client teams at Netflix. The back office IT provides Resource-based APIs to the front office teams, who then use simple tools, scripting languages, model-driven development, content management systems, and so on to adapt the resources to the needs of the business. In large Wall Street firms it is common to have highly skilled front office IT teams. I think that many companies could benefit from this model if they choose the right tools and create the right separation of concerns.

Nicobar 

For companies that create multiple applications and are committed to the Java ecosystem, Nicobar is definitely worth a look. This library provides the core functionality to implement the Experience-based API program. Right now, Groovy is the language in which the adapter code that assembles the data for an Experience-based API is written, but in the future other JVM languages may be supported.

The key capabilities of Nicobar are:

  • Rapid application delivery via dynamic updates of components
  • Ability to add support for multiple JVM languages

To learn more go to Nicobar: Dynamic Scripting Library for Java.

And, for the truly nerdy, here’s what the project wiki says about why the project was called Nicobar:

“Nicobar refers to a remote, archipelagic island chain in the Eastern Indian Ocean. It is a UNESCO World Biosphere Reserve, and like many other remote volcanic islands, it has rich biodiversity, and various endemic species are found here. Taken together with the surrounding Andaman group of islands, six different human tribes occupy these islands, each with its own spoken language. We felt that these isolated islands, evolving independently are a good metaphor for the polyglot, modular runtime framework we were setting out to build.”

Dan Woods is on a mission to help people find the technology they need to succeed. Users of technology should visit CITO Research, a publication where early adopters find technology that matters. Vendors should visit Evolved Media for advice about how to find the right buyers. See list of Dan’s clients on this page.

Presentation : API Revolutions and the API Strategy Conference

By , February 25, 2013 7:31 am

Congratulations again to Kin Lane and everyone at 3Scale for a very successful and fulfilling API Strategy Conference. There were a lot of great presentations and panels as well as many very interesting hallway conversations.

And I was excited to be able to speak at the event! Embedded below are the slides from my presentation, complete with copious notes on each slide to provide the context of what I said during the talk.

The focus of the presentation was on API revolutions. We have seen a number of them in recent years, but there have been significant and substantial changes for Netflix and for some others that warrant discussion. The question that remains: are these changes specific to a small handful of companies, or do they represent things to come for the API world as a whole?

-Daniel

7 Ways to Make Your API More Successful

By , December 10, 2012 8:35 pm

The purpose of a content API is to make the content available to its audience in the most useful and efficient way possible. To be useful, an API needs to make developers’ jobs easier. That could mean a wide range of things: making it easier to dig into the API, allowing greater flexibility in the responses, or improving performance and efficiency for both the API and its consumers. Below are seven development techniques (all of which are part of the NPR API) that can help content providers improve the usefulness and efficiency of their APIs on both sides of the track. These techniques played a critical role in the success of the API, which now delivers over 700 million stories per month to its users (more stats on the NPR API coming soon on our Inside NPR.org blog).

Be Flexible: Support Multiple Output Formats
Making the API as available and accessible as possible is very important in drawing developers to use it. Providing the content in a range of formats increases the likelihood that developers can rely on existing libraries and make as few changes to their code as possible.

The NPR API offers eight different output formats in an effort to improve efficiency for developers. A breakdown of requests from July of 2009 showed that the majority were for our proprietary XML markup (NPRML), which also means that almost 50% of requests, or about 20 million per month, used the other seven formats. By offering these non-proprietary formats, the API is able to support developers who have existing applications that pull in content in one of these standardized formats, such as MediaRSS or Atom.

To make it even easier for people to use the API, NPR also launched with JavaScript and HTML “widgets”. The other six formats require more sophistication in order to put the content in an application or website. The widgets, however, are pre-designed feeds of NPR content (based on the developer’s selections) that can be easily dropped into a page.
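
As a quick illustration of what multi-format support looks like from the consumer's side, the sketch below requests the same stories in two representations. The format parameter name and values here are assumptions for illustration; the exact spelling is in the NPR API documentation:

    import urllib.request

    API_KEY = "your_api_key"
    BASE = "http://api.npr.org/query?id=1001&apiKey=" + API_KEY

    # Same query, two representations (parameter name assumed; see the docs).
    for fmt in ("NPRML", "MediaRSS"):
        with urllib.request.urlopen(BASE + "&format=" + fmt) as resp:
            print(fmt, resp.headers.get("Content-Type"))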

Be Efficient: Handle Partial Response
This concept is now starting to get more traction, now that Google has announced partial response handling for some of its APIs. NPR’s API also makes extensive use of this feature because it really is tremendously valuable to both the provider and the consumer of the API. For example, NPR stories contain a wide variety of fields and assets in the API. If consumers are forced to handle the complete document, even when they only want a few fields, they have to endure the latency of the larger response from the API as well as the additional processing power needed to handle the undesired fields.

As a result, NPR incorporated a “fields” parameter (the same parameter name used by Google) that can be used in the query string to limit the resulting document to only the fields of interest. This approach creates documents that are smaller and much more efficient. Overwhelmingly, more requests to the NPR API contain the fields parameter than those that do not (in fact, it isn’t even close).

Here are a few examples of how the same query to the NPR API, returning the same stories, delivers different documents based on the fields parameter (you will need to register for your own NPR API key to execute these queries):

http://api.npr.org/query?id=1001&apiKey=your_api_key

http://api.npr.org/query?id=1001&fields=title&apiKey=your_api_key

http://api.npr.org/query?id=1001&fields=title,teaser,text,image,audio&apiKey=your_api_key

An extension of partial response is to allow developers to specify the number of items they would like returned. Some APIs return a fixed number of results, which can bloat the document just as extra fields can. To counter this, the NPR API allows the developer to pass in the number of results desired (with a fixed ceiling for any given request). To dig deeper into the results, we incorporated a “pagination” feature in the API. Here are some examples of how to control the number of stories:

http://api.npr.org/query?id=1001&numResults=5&apiKey=your_api_key

http://api.npr.org/query?id=1001&numResults=5&startNum=6&apiKey=your_api_key
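
Putting the fields and pagination parameters together, a consumer can walk a result set in small, lean pages. Here is a minimal sketch using only the parameters shown above (the XML handling is illustrative):

    import urllib.request
    import xml.etree.ElementTree as ET

    API_KEY = "your_api_key"
    PAGE_SIZE = 5

    def fetch_titles(start_num):
        url = ("http://api.npr.org/query?id=1001"
               f"&fields=title&numResults={PAGE_SIZE}&startNum={start_num}"
               f"&apiKey={API_KEY}")
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        return [el.text for el in tree.iter("title")]

    # Walk the first three pages, five lean story documents at a time.
    for page in range(3):
        for title in fetch_titles(start_num=1 + page * PAGE_SIZE):
            print(title)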

Give Them Control: Allow for Customizable Output Markup (“Remapping Fields”)
As mentioned in the section on output formats, if the API can easily serve existing applications that expect specific markup, it potentially increases adoption and improves developer efficiency. To extend that functionality, the NPR API offers a function that we call “Remap” which essentially lets the developer modify the name of one or more XML elements or attributes in the output at request time. This is done in the query string, and the API transforms the markup accordingly in real time. Here are a few examples:

In this example, the remap parameter changes the story title to <specialTitle>:

http://api.npr.org/query?id=1001&remap=list.story.title:specialTitle&apiKey=your_api_key

In this example, the remap parameter changes the story title to <specialTitle> and it changes the image caption to <imageCaption>:

http://api.npr.org/query?id=1001&remap=list.story.title:specialTitle,list.story.image.caption:imageCaption&apiKey=your_api_key

In this example, the remap parameter changes the audio element’s id attribute to be named audioId:

http://api.npr.org/query?id=1001&remap=list.story.audio~id:audioId&apiKey=your_api_key

Another benefit of remap (which we have fortunately not had to use) is that it can be used to handle backward compatibility as the API grows and changes. NPR’s philosophy is to make sure that upgrades do not adversely affect existing functionality. That said, if an element or attribute does need to change, we could execute Apache rewrites for all old API calls and apply the remap function so that the output matches the old markup. Alternatively, the developer could simply modify their API call instead of having to change their codebase to match the markup changes. (Although we do not intend to change existing markup, if we do, we would advise developers to upgrade their code accordingly. That said, rather than having applications fail during the transition, remap could be used to handle requests temporarily until the full codebase can be upgraded.)

Be Fast: Set Up a Comprehensive Caching Architecture
Performance is another critical aspect of APIs when it comes to enticing developers to use them. After all, if the API is sluggish, developers may not want to make their applications depend on it.

Smart caching of queries and results can really improve the speed of the system. NPR has implemented several layers of caching for the API, as follows:

  • Base XML – Caching the full document for each item is important to prevent the system from executing disk I/O before doing any transform. We cache the Base XML first in memory and secondarily as XML files to eliminate the need to access our content database.
  • Full Query Results – When compiling the list of items to be returned for any given story, it is important to cache the full list because popular applications that have many concurrent users (such as NPR Addict) are very likely to execute the same queries and expect the same results. The cached result is a single document containing the full list of all items and the full base XML for each.
  • Transformed Query Results – The calling application, such as NPR Addict, expects the document to be transformed to fit the application’s needs. So, the results that get cached in Full Query Results may get transformed to MediaRSS while simultaneously removing extraneous fields. Caching the final results that get returned to the calling application enables the fastest performance without compromising the system’s ability to use the other caching layers to produce different versions of the document.
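
A condensed sketch of the lookup order these three layers imply. The helper functions and in-memory dictionaries below are stand-ins for NPR's actual database, file, and cache infrastructure:

    # In-memory stand-ins for the cache layers (memory, XML files, etc.).
    base_xml_cache = {}      # story_id -> base XML for that item
    query_cache = {}         # query -> list of base XML docs for the result set
    transformed_cache = {}   # (query, fmt, fields) -> final output document

    def run_query(query):               # stub for the content database lookup
        return ["story1", "story2"]

    def load_story_xml(story_id):       # stub: would read the item's XML
        return f"<story id='{story_id}'/>"

    def transform(docs, fmt, fields):   # stub for the real transform engine
        return f"<{fmt} fields='{fields}'>" + "".join(docs) + f"</{fmt}>"

    def get_base_xml(story_id):
        # Layer 1: per-item base XML, so we never hit the database twice.
        if story_id not in base_xml_cache:
            base_xml_cache[story_id] = load_story_xml(story_id)
        return base_xml_cache[story_id]

    def get_response(query, fmt, fields):
        # Layer 3: a previously transformed result is served immediately.
        key = (query, fmt, fields)
        if key in transformed_cache:
            return transformed_cache[key]
        # Layer 2: reuse the full result list across identical queries.
        if query not in query_cache:
            query_cache[query] = [get_base_xml(s) for s in run_query(query)]
        transformed_cache[key] = transform(query_cache[query], fmt, fields)
        return transformed_cache[key]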

[Architecture diagram: the layers of caching in the NPR API, from base XML through full and transformed query results.]

Give Them Tools: Provide a Query UI with the Documentation
There are two truths about developers and documentation: the former always expects the latter, but seldom uses it. Of course, you cannot have an API without providing comprehensive documentation. That said, offering a simple user interface that helps developers get what they need from the API will increase adoption and make life easier for them.

NPR’s API launched with a tool that we call the Query Generator. This tool exposes more than 6500 queryable IDs, methods for controlling the output format, fields to be returned, date and search restrictions, pagination, and more. Using the interface, the developer can select their options and have the tool create the query string for their API request. The developer can also see the results of that query inline before committing it to their application. Almost exclusively, developers (including the NPR staff) use this tool to create queries, rather than reading the documentation.

Be Open: Eliminate Rate Limiting
Throttling or limiting access to APIs is an inherent disincentive for developers. Moreover, it is actually a detriment to the API provider. After all, the purpose of the API is to grant access to the content. If a given developer can only call the API 5000 times a day, and that developer creates a hugely popular application, the rate-limiting will inherently stifle the developer and the viral nature of the API.

Granted, most APIs use rate-limiting or tiered access levels to allow business people to control the graduation of API users. This seems counterproductive to me, though. The better approach is to open access completely, identify the incredibly successful usages, and then work with those developers on a mutually beneficial relationship. This way, applications are given full ability to grow and mature without arbitrary constraints.

Other APIs implement rate-limiting to protect the servers from unexpectedly high load. This is a legitimate risk which, if encountered, can adversely affect the performance of all users. That said, building complicated features into the system, such as rate-limiting, can be much more costly than configuring a scalable server architecture. Moreover, each request to the API will see slight latency increases as a result of the rate-limiting analysis. I know that latency is marginal, but why introduce any additional latency, especially when creating disincentives for developers?

Be Agile: Practice Iterative Development
Building your API over time has several benefits. First, it signals to the developer community that this API is meaningful to the provider and will continue to grow and get supported over time. This sounds trivial, but it is a very important part of the relationship with the community. If developers are not sure about your commitment to the API, are they likely to spend their own time building an application around it?

Another benefit of iterative development is that you do not have to get the API perfect the first time. I will qualify that by saying that, as a matter of principle, any release for an API should be done with the expectation that it will be supported for a long time. This is important because changes to existing API features will break the applications of those that use them. When I say the API doesn’t have to be perfect, I mean it does not have to be complete. New features can (and should) be added over time, extending its capability and making it more attractive for potential developers.

To put it another way, you will not have every detail of the API solved at the initial launch. It is much better to go live with the features that you know well while deferring those that you do not. Trying to cram in tenuous requirements will create headaches for you and for the community down the road. Spend the time necessary on figuring out the features, the supporting markup, the access and error methods, etc. before you commit to an API feature.

Content Portability: Building an API is Not Enough

By , December 10, 2012 8:20 pm

This post first appeared on ProgrammableWeb.com

My previous posts focused on COPE (Create Once, Publish Everywhere) and content modularity, the fundamentals for ensuring that content can be managed and distributed to virtually any platform. But ensuring that your content can be delivered to those other platforms does not mean that it can display appropriately on them.

Content often contains very important semantic markup, used to emphasize the content, relate it to other content, describe it, etc. By markup, I mean HTML, character encodings and microformats, among others. Although this markup is important to the content, it also makes it “dirty”, potentially compromising its ability to live and flourish in the myriad places to which it will get distributed. No matter how modular the content is in the database, if it is sullied by this markup, it is not truly portable. As a result, just building an API is not enough. The API needs to be able to distribute the content to any platform in a way that each platform can handle.

To demonstrate this problem around portability, I often use the pre-iPhone iPod as an example. This device did not parse HTML; rather, tags would simply be printed as strings. When podcasting took off, some NPR titles had HTML tags in them, including <em> and <strong>. Because iPods were not able to render the HTML, titles would look something like, “This is a <em>great</em> title!” Similarly, another failure scenario relevant to NPR is an HD Radio display. These devices are also unable to render markup, printing the tags to the screen instead.

There are two primary ways of handling this problem. The more common way is to store the dirty content in the database and to maintain a series of scripts that handle it on the way out. Although this is potentially effective for specific goals, there are some significant problems with it. For starters, stripping out the markup as it gets distributed means that the markup still lives with the content in the database. As a result, as new platforms arise and as markup standards evolve, the markup in the content will remain static. So, each distribution script that handles the markup will need to be carefully maintained and updated accordingly. Moreover, since each distribution platform could have its own compliance with the various forms of markup, each of these outputs may require its own script to handle the content (that is, the more distribution channels there are, the more scripts there are to maintain). Finally, the majority of systems that allow markup in this way do very little to limit the type of markup that is used. Because of the tremendous variance in how the markup is used in the content, these scripts will need to be increasingly complex, making their accuracy tougher to guarantee.
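
A hedged sketch of what this "clean on the way out" approach tends to look like in practice: one strip-or-rewrite function per distribution channel, each of which must be maintained as the markup evolves. The channels and rules here are invented for illustration:

    import re

    def for_ipod(html_text):
        # This device prints tags as literal strings, so strip everything.
        return re.sub(r"</?\w+[^>]*>", "", html_text)

    def for_partner_feed(html_text):
        # This partner allows emphasis but not links, so remove only <a> tags.
        return re.sub(r"</?a(\s[^>]*)?>", "", html_text)

    # Every new platform means another function like these, and every change
    # to the markup conventions means revisiting all of them.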

Rather than handling the cleansing process on the way out, NPR has created a system that cleans the content on the way in. The goal here is to save the content in the database in a modular AND portable way. That means that each discrete object type is stored separately while ensuring that text content in each object is devoid of markup. I call this system “Markup Addressing” and here is how it works:

  • A range of fields in the system are markup-enabled, allowing Editors and scripts to include HTML and other markup values in the content directly.
  • For each field that allows markup, very specific values are allowed. Some fields allow more, some less, but all fields are limited to nothing more than the 25 tags and character encodings that the system as a whole allows.
  • We apply client-side handling to ensure that no markup beyond what is allowed for a field is used in that field. We also enforce proper nesting and syntax for the markup.
  • Before saving the clean and acceptable markup to the database, we identify all markup for each field and begin our “addressing”, which is essentially recording the character positions of the markup in the text. For each tag or character identified, we find the character position where it starts. If applicable, we also find the character position of the close tag. We then strip the markup out of the text and store, in a relational table, the address in the text where the markup was found.
  • This relational table does not include the markup itself. Rather, the markup is stored in a separate table that is the authority for which tags are allowable.

[Diagram: how NPR strips markup out of content fields prior to saving to the database. The markup is then “addressed” and stored in a series of relational tables, enabling any presentation layer to present the content with or without markup, and allowing the markup to be transformed as needed before pushing to different platforms.]
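
In code terms, the write path strips each allowed tag out of the text and records where it was, and the read path reinserts tags from those stored addresses. Below is a minimal, self-contained sketch of both directions; NPR's real implementation stores the addresses and the tag authority in relational tables, for which the structures below merely stand in:

    import re

    ALLOWED_TAGS = {"em", "strong", "a"}   # stand-in for the tag authority table

    def address_markup(html_text):
        # Write path: strip allowed tags and record (tag, start, end), where
        # start/end are character positions in the *clean* text. Assumes
        # validated, properly nested input, as the client-side handling above
        # guarantees. Attributes are dropped in this sketch; the real system
        # would need to store those as well.
        addresses, clean, stack, pos = [], [], [], 0
        for token in re.split(r"(</?\w+[^>]*>)", html_text):
            m = re.match(r"<(/?)(\w+)", token)
            if m and m.group(2) in ALLOWED_TAGS:
                if m.group(1):                       # closing tag
                    tag, start = stack.pop()
                    addresses.append((tag, start, pos))
                else:                                # opening tag
                    stack.append((m.group(2), pos))
            else:
                clean.append(token)
                pos += len(token)
        return "".join(clean), addresses

    def reassemble(clean_text, addresses):
        # Read path: reinsert tags at their stored addresses. Insert from the
        # end backward so earlier positions stay valid; the kind/order fields
        # make tags that share a position nest correctly.
        inserts = []
        for seq, (tag, start, end) in enumerate(addresses):
            inserts.append((end, 0, seq, f"</{tag}>"))    # close tag
            inserts.append((start, 1, -seq, f"<{tag}>"))  # open tag
        out = clean_text
        for position, _, _, tag_text in sorted(inserts, reverse=True):
            out = out[:position] + tag_text + out[position:]
        return out

    clean, addrs = address_markup("This is a <em>great</em> title!")
    print(clean)                     # This is a great title!
    print(reassemble(clean, addrs))  # This is a <em>great</em> title!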

There are several very tangible benefits to this approach, all of which improve overall portability of the content. These benefits include:

  • Distributing the content without any markup is as simple as pushing out the content from the database directly, without any further processing. This is helpful for platforms that are unable to render markup, including those mentioned in my examples above.
  • Distributing the content with the original markup is just as easy by reassembling the markup based on the addresses.
  • It is easy to distribute only some of the markup, based on what the markup is. An example of this is if the destination product wants to emphasize content but does not want to allow links to other content.
  • As markup, such as an HTML tag, gets deprecated, this approach only requires a change to one field in the entire database, instead of having to cycle through the database to find all instances of the old tag and replace it with the new one. For example, <b> has been replaced with <strong>, so we simply need to modify the one record in the authority table for tags to make this change apply across the entire set of content.
  • As new platforms arise, if they require specialized markup, it is easy to transform the existing markup to anything else required for these new platforms.
  • Adding new allowable tags is easy: simply extend the client-side handling and the authority table. These tags can include microformats and other business-critical tags that help describe the content. For example, NPR could very easily create a tag for our internal purposes such as <station>, so that for every station that gets tagged, rather than rendering the tag, the system will look up the station in our database and replace that <station> tag with a hyperlink to the station’s home page.

NPR’s system applies these methods to specific fields throughout our CMS. When distributing the content through the API, however, we currently apply the power of Markup Addressing only to the story’s full text. The API has a <text> field, which removes all markup for syndication, as well as <textWithHtml>, which reassembles the content with all markup. Extending this to all other markup-enabled fields would be quite easy under this system, although there has not yet been a need to do so.

Regardless of which approach is taken, there is one other significant issue that prevents true portability of content… the content itself!

I draw a distinction between “content” and “calls-to-action” to help clarify this problem. Content is the information that the users actually want to consume. It could also include metadata, which helps to accurately describe the content that the user is consuming. Within this content, markup that emphasizes it or relates it to other content should be applied in such a way that the meaning of the content is unaltered when the markup is abstracted away. Here is an example of an appropriate way to apply markup to content:

[Image: part of an NPR story demonstrating appropriate use of HTML within the body of the text. The artists’ names link to artist pages, but the meaning of the story is completely unaltered by the removal of the markup.]

In this scenario, removing the links to the artists’ names in the text, for example, does not alter the meaning of the content. Of course, it does diminish some of its power as the user cannot easily learn more about these artists within the context of this story. That said, distribution of this content without those links will not adversely affect the meaning of the story. The artist names are valid and appropriate within the body of the text.

Applying markup within the content that is calling the users to perform an action, on the other hand, poses a different problem. Here is an example of a call-to-action within the content:

[Image: part of the same NPR story demonstrating calls-to-action, which leave the content unable to convey its meaning without the context of the markup. These calls-to-action make the content less portable, specifically to platforms that are not markup-enabled.]

Notice that within this content there is a link to related content where the link text is “Listen to The Entire Album”. Abstracting away the link itself actually alters the meaning of the text as the text provides no information about the audio asset. There is no indication as to what album or who the artist is. So, as this content gets distributed to platforms (both known and unknown), pulling out the markup actually adversely affects the content.

This is a problem for every content producer, including NPR. Although we have taken great measures to put the content in the best position to live and thrive on all platforms, there is still work to be done to ensure the success of our distribution strategies. Some of these efforts are technical in nature. Others could impact editorial processes and style guides. But in all cases, our goals are the same… to be a media organization that produces great content for our users, wherever they wish to consume it.
