Posts tagged: Netflix

Redesigning the Netflix API

comments Comments Off on Redesigning the Netflix API
By , September 10, 2023 7:55 pm

This post originally appeared on the Netflix Tech Blog on February 8, 2011.

This is Daniel Jacobson, Director of Engineering for the API here at Netflix. The Netflix API launched in 2008 with a focus on the public developer community. The expectation was that this community would build amazing and inspiring applications that would take Netflix to a new level in serving our members. While some fantastic things were built by this community, including sites like Instant Watcher, the transformational moment for the API was when we started to use it to deliver the streaming functionality to Netflix ready devices. In the last year, we have escalated this approach to deliver Netflix metadata through the API to hundreds of devices. All of this has culminated in tremendous growth of the API, as represented by the following chart:

The tremendous growth of the API over the last year is due to a combination of the increased number of users, more activity by our users, Netflix’s steady adoption of new devices over time, as well as chattier interfaces to the API.

Growing the API by about 37× in 13 months indicates a few things to us. First, it demonstrates the tremendous success of the API and the fact that it has become a critical system within Netflix. Moreover, it suggests that, because it is so critical, we have to get it right. When reviewing the founding assumptions of the API from 2008, it is now clear to us that the API needs to be redesigned to carry us into the future.

Establishing New Goals

In the two-and-a-half years that the API has been live, some major, fundamental changes have taken place, both with the API and with Netflix. I already mentioned the change in focus from an exclusively public API to one that also drives our device experiences. Additionally, at the time of the launch, Netflix was primarily focused on delivering DVDs. Today, while DVDs are still part of our identity, the growth of our business is streaming. Moreover, we are no longer US-only. In October, we launched in Canada with a pure streaming plan and we are exploring other international markets as well. Because of these fundamental changes, as well as others that have cropped up along the way, the goals of the API have changed. And because the goals have changed, the way the API needs to operate has as well.

Decreasing Total Requests

An example of where the current design is inefficient is in the way the API resources are modeled. Today, there are about 20 resources in the API. Some of these resources are very similar to each other, although they each have their own interfaces and responses. Because of the number of resources and the fact that we are adhering very closely to the REST conventions, our devices need to make a series of calls to the APIs to get all the content needed to render the user interface. The result is that there is a high degree of chattiness between the devices and the APIs. In fact, one of our device implementations accounts for about 50% of the total API calls. That same device, however, is responsible for significantly less streaming traffic. Why is this device so chatty? Can we design our API to reduce the number of calls needed to create the same experience? In essence, assuming everything remains static, could the 20+ billion requests that we handled in January 2011 have been 15 billion? Or 10 billion?

Decreasing Payload

If we reduce the number of requests to the API to achieve the same user experience, it implies that the payload of each request will need to be larger. While it is possible that this extra payload won’t noticeably impair performance, we still would like to reduce the total number of bits delivered. To do so, we will also be looking at ways to handle partial response through the API. Our goal in this approach will be to conceptualize the API as a database. A database can handle incredible variability in requests through SQL. We want the API to be able to answer questions with the same degree of variability that SQL can for a database. Other implementations, like YQL and OData, offer similar flexibility and we will research them as well. Chattiness and payload size (as well as their impact on the request/response model) are just two examples of the things we are researching in our upcoming API redesign. In the coming weeks, as we get deeper into this work, we will continue to post our thinking to this blog.

If these challenges seem exciting to you, we are hiring! Check out the jobs on the API team at our jobs site.

Embracing the Differences : Inside the Netflix API Redesign

comments Comments Off on Embracing the Differences : Inside the Netflix API Redesign
By , September 10, 2023 7:50 pm

This post originally appeared on the Netflix Tech Blog on July 9, 2012.

As I discussed in my recent blog post on ProgrammableWeb.com, Netflix has found substantial limitations in the traditional one-size-fits-all (OSFA) REST API approach. As a result, we have moved to a new, fully customizable API. The basis for our decision is that Netflix’s streaming service is available on more than 800 different device types, almost all of which receive their content from our private APIs. In our experience, we have realized that supporting these myriad device types with an OSFA API, while successful, is not optimal for the API team, the UI teams or Netflix streaming customers. And given that the key audiences for the API are a small group of known developers to which the API team is very close (i.e., mostly internal Netflix UI development teams), we have evolved our API into a platform for API development. Supporting this platform are a few key philosophies, each of which is instrumental in the design of our new system. These philosophies are as follows:

  • Embrace the Differences of the Devices
  • Separate Content Gathering from Content Formatting/Delivery
  • Redefine the Border Between “Client” and “Server”
  • Distribute Innovation

I will go into more detail below about each of these, including our implementation and what the benefits (and potential detriments) are of this approach. However, each philosophy reflects our top-level goal: to provide whatever is best for the Netflix customer. If we can improve the interaction between the API and our UIs, we have a better chance of making more of our customers happier.

Now, the philosophies…

Embrace the Differences of the Devices

The key driver for this redesigned API is the fact that there are a range of differences across the 800+ device types that we support. Most APIs (including the REST API that Netflix has been using since 2008) treat these devices the same, in a generic way, to make the server-side implementations more efficient. And there is good reason for this approach. Providing an OSFA API allows the API team to maintain a solid contract with a wide range of API consumers because the API team is setting the rules for everyone to follow.

While effective, the problem with the OSFA approach is that its emphasis is to make it convenient for the API provider, not the API consumer. Accordingly, OSFA is ignoring the differences of these devices; the differences that allow us to more optimally take advantage of the rich features offered on each. To give you an idea of these differences, devices may differ on:

  • Memory capacity or processing power, potentially modifying how much content it can manage at a given time
  • Requirements for distinct markup formats and broader device proliferation increases the likelihood of this
  • Document models, some devices may perform better with flatter models, others with more hierarchical
  • Screen real estate which may impact the content elements that are needed
  • Document delivery, some performing better with bits streamed across HTTP rather than delivered as a complete document
  • User interactions, which could influence the metadata fields, delivery method, interaction model, etc.

Our new model is designed to cut against the OSFA paradigm and embrace the differences across devices while supporting those differences equally. To achieve this, our API development platform allows each UI team to create customized endpoints. So the request/response model can be optimized for each team’s UIs to account for unique or divergent device requirements. To support the variability in our request/response model, we need a different kind of architecture, which takes us to the next philosophy…

Separate Content Gathering from Content Formatting/Delivery

In many OSFA implementations, the API is the engine that retrieves the content from the source(s), prepares that payload, and then ultimately delivers it. Historically, this implementation is also how the Netflix REST API has operated, which is loosely represented by the following image:

The above diagram shows a rainbow of colors roughly representing some of the different requests needed for the PS3, as an example, to start the Netflix experience. Other UIs will have a similar set of interactions against the OSFA REST API given that they are all required by the API to adhere to roughly the same set of rules. Inside the REST API is the engine that performs the gathering, preparation and delivery of the content (indifferent to which UI made the request).

Our new API has departed from the OSFA API model towards one that enables fine-grained customizations without compromising overall system manageability. To achieve this model, our new architecture clearly separates the operations of content gathering from content formatting and delivery. The following diagram represents this modified architecture:

In this new model, the UIs make a single request to a custom endpoint that is designed to specifically handle that request. Behind the endpoint is a handler that parses the request and calls the Java API, which gathers the content by calling back to a range of dependent services. We will discuss in later posts how we do this, particularly in how we parse the requests, trigger calls to dependencies, handle concurrency, support fallbacks, as well as other techniques we use to ensure optimized and accurate gathering of the content. For now, though, I will just say that the content gathering from the Java API is generic and independent of destination, just like the OSFA approach.

After the content has been gathered, however, it is handed off to the formatting and delivery engines which sit on top of the Java API on the server. The diagram represents this layer by showing an array of different devices resting on top of the Java API, each of which corresponds to the custom endpoints for a given UI and/or set of devices. The custom endpoints, as mentioned earlier, support optimized request/response handling for that device, which takes us to the next philosophy…

Redefine the Border Between “Client” and “Server”

The traditional definition of “client code” is all code that lives on a given device or UI. “Server code” is typically defined as the code that resides on the server. The divide between the two is the network border. This is often the case for REST APIs and that border is where the contract between the API provider and API consumer is engaged, as was the case for Netflix’s REST API, as shown below:

In our new approach, we are pushing this border back to the server, and with it goes a substantial portion of the UI-specific content processing. All of the code on the device is still considered client code, but some client code now resides on the server. In essence, the client code on the device makes a network call back to a dedicated client adapter that resides on the server behind the custom endpoint. Once back on the server, the adapter (currently written in Groovy) explodes that request out to a series of server-side calls that get the corresponding content (in some cases, roughly the same rainbow of requests that would be handled across HTTP in our old REST API). At that point, the Java APIs perform their content gathering functions and deliver the requested content back to the adapter. Once the adapter has some or all of its content, the adapter processes it for delivery, which includes pruning out unwanted fields, error handling and retries, formatting the response, and delivering the document header and body. All of this processing is custom to the specific UI. This new definition of client/server is represented in the following diagram:

There are two major aspects to this change. First, it allows for more efficient interactions between the device and the server since most calls that otherwise would be going across the network can be handled on the server. Of course, network calls are the most expensive part of the transaction, so reducing the number of network requests improves performance, in some cases by several seconds. The second key component leads us to the final (and perhaps most important) philosophy to this approach, which is the distribution of the work for building out the optimized adapters.

Distribute Innovation

One expected critique with this approach is that as we add more devices and build more UIs for A/B and multivariate tests, there will undoubtedly be myriad adapters needed to support all of these distinct request profiles. How can we innovate rapidly and support such a diverse (and growing) set of interactions? It is critical for us to support the custom adapters, but it is equally important for us to maintain a high rate of innovation across these UIs and devices.

As described above, pushing some of the client code back to the servers and providing custom endpoints gives us the opportunity to distribute the API development to the UI teams. We are able to do this because the consumers of this private API are the Netflix UI and device teams. Given that the UI teams can create and modify their own adapter code (potentially without any intervention or involvement from the API team), they can be much more nimble in their development. In other words, as long as the content is available in the Java API, the UI teams can change the code that lives on the device to support the user experience and at the same time change the adapter code to deliver the payload needed for that experience. They are no longer bound by server teams dictating the rules and/or being a bottleneck for their development. API innovation is now in the hands of the UI teams! Moreover, because these adapters are isolated from each other, this approach also diminishes the risk of harming other device implementations with tactical changes in their device-specific APIs.

Of course, one drawback to this is that UI teams are often more skilled in technologies like HTML5, CSS3, JavaScript, etc. In this system, they now need to learn server-side technologies and techniques. So far, however, this has been a relatively small issue, especially since our engineering culture is to hire very strong, senior-level engineers who are adaptable, curious and passionate about learning and implementing these kinds of solutions. Another concern is that because the UI teams are implementing server-side adapters, they have the potential to bring down the servers through infinite loops or other processes that are resource intensive. To offset this, we are working on scrubbing engines that will hopefully minimize the likelihood of such mistakes. That said, in the OSFA world, code on the device can just as easily DDOS the server, it is just potentially a bigger problem if it runs on the server.

Example of how this new system works:

  1. A device, such as the PS3, makes a single request across the network to load the home screen (This code is written and supported by the PS3 UI team.
  2. A Groovy adapter receives and parses the PS3 request (PS3 UI team)
  3. The adapter explodes that one request into many requests that call the Java API to (PS3 UI team)
  4. Each Java API calls back to a dependent service, concurrently when appropriate, to gather the content needed for that sub-request (API team)
  5. In the Java API, if a dependent service unavailable or returns a 4xx or 5xx, the Java API returns a fallback and/or an error code to the adapter (API team)
  6. Successful Java API transactions then return the content back to the adapter when each thread has completed (API team)
  7. The adapter can handle the responses from each thread progressively or all together, depending on how the UI team wants to handle it (PS3 UI team)
  8. The adapter then manipulates the content, retrieves the wanted (and prunes out the unwanted) elements, handle errors, etc. (PS3 UI team)
  9. The adapter formats the response in preparation for delivery back across the network to the PS3, which includes everything needed for the PS3 home screen in the single payload (PS3 UI team)
  10. The adapter finally handles the delivery of the payload across the network (PS3 UI team)
  11. The device will then parse this optimized response and populate the UI (PS3 UI team)

We are still in the early stages of this new system. Some of our devices have fully migrated over to it, others are split between it and the REST API, and others are just getting their feet wet. In upcoming posts, we will share more about the deeper technical aspects of the system, including the way we handle concurrency, how we manage the adapters, the interaction between the adapters and the Java API, our Groovy implementation, error handling, etc. We will also continue to share the evolution of this system as we learn more about it.

Why You Probably Don’t Need an API Strategy

comments Comments Off on Why You Probably Don’t Need an API Strategy
By , November 9, 2013 1:04 pm

“Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.” – Sun Tzu, The Art of War

Over the course of 2013, the API industry has matured a great deal. Not only have we seen many of the major vendors (ie. ApigeeMashery3ScaleLayer7, etc) get acquired and/or receive large rounds of funding, we are also seeing an uptick in new players, new tools and services, new publications, and even a series of API-focused conferences.

Meanwhile, according to a recent survey by Layer7, more than 85% of companies expect to have an “API program” within the next five years. All of this is evidence that the appetite for tools and information about APIs is robust. Accordingly, there is no shortage of people and companies attempting to satisfy that hunger. The question I have amidst this growth, however, is whether the concepts around API strategies being served by some is the right meal for those who are eager to feast on APIs.

The problem: “API Strategy”

The majority of the non-technical conversations in the API industry seem to be focusing on terms like “API strategy” and “API economy.” In fact, I even co-wrote a book called APIs: A Strategy Guide a couple of years ago, further facilitating the use of those words in the API vernacular. There is absolutely a strong case to be made for needing an API strategy for certain situations. But how many companies should really be thinking about their API in that way?

Before continuing, it is worth being clear on what I mean by the terms “strategy” and “tactic”. Bobby Ghoshal puts it nicely in his post, Greeks Gave us Strategy vs. Tactics: Now Understand the Difference where he says, “A strategy is a grand plan, a tactic is a specific measure implemented to push the grand plan forward.” Applied to APIs, if there is an API strategy then it means that API is the product in-and-of-itself. In other words, the API is the target of a distinct business and opportunity (with its own metrics), which will then have a range of tactics to support it.

There are certainly cases where APIs are businesses and where a strategy is appropriate. The most common example of an API strategy is around companies who aspire to build a developer community as a new revenue source or as the foundation of their business. Twilio is an interesting example of such a company. Twilio’s strategy is to offer APIs that tap into their backend services to allow developers to build apps supporting their communication initiatives.

In this case, the API is a strategy, one that is fundamental to the business as a whole. Accordingly, Twilio invests heavily in the API, supporting documentation for it, fostering the developer community, and all of the other things one would expect such a company to do for their public API (and some would suggest that they do this as well as or better than anyone else). Twilio should invest heavily in this — a significant portion of the opportunity is predicated on the success of the API program.

The reality: “API as a tactic”

But most companies should not be trying to set up distinct businesses with their APIs as the focal point. They should not be trying to generate new revenue streams or reach new audiences through such programs. Instead, most companies should be focusing on their core business and then designing APIs that support larger strategy.

In pursuing that route, most companies should not be discussing their “API Strategy,” they should be talking about their API as a tactic in support of their broader business strategy and objectives.

An example that I am very close to is Netflix. In the early days of the Netflix API when the program was targeted exclusively to public developers, the API had its own metrics and its own objectives, all of which were designed to support the primary goals of the company.

In this sense, the Netflix API was a product designed to offer incentives to developers to motivate them to build applications around the Netflix experience. These applications would hopefully reach new audiences to generate new subscribers and/or create new user experiences for existing subscribers that would increase their satisfaction with our service. Although the API was treated as a new product within Netflix, it was still operating under the company’s larger business objectives.

While the original vision was incrementally valuable to the company, the results were not as transformative as originally expected. As a result, we pursued a new approach with the API, using it to drive the larger strategy of device proliferation for our growing streaming business. In this sense, the API was transitioning from a product to a tactic.

Today, Netflix can be watched on more than 1,000 different device types, the vast majority of which are developed by Netflix-employed UI Engineers. The API served as an excellent engineering tactic that allowed us to quickly get on more devices, which in turn allows us to create a better overall experience for our customers.

More changes have since been made with the API. Most recently, the Netflix API team, which used to provide traditional REST APIs to the Netflix UI teams, is now providing content distribution platforms that enable data to be pushed from our AWS backend systems to the devices in people’s homes and pockets. We are no longer truly an API team, we are a team that embraces the differences of the different devices and empowers the UI teams to customize and optimize the request/response models needed for their specific device. In other words, we are now a platform for API development.

All of these pivots within Netflix further demonstrate that our API is nothing more than a tactic to achieve our broader goals. There are no allegiances to a tactical solution. Tactics can (and should) be modified, discarded and replaced as appropriate. Strategies, on the other hand, should have longer shelf-lives, evolving over time but less frequently overhauled.

The majority of companies that are considering API implementations, based on my conversations and experience in the industry, are more like Netflix than Twilio. There are countless examples of companies who have made similar pivots to refocus their API attention towards supporting the company’s primary business objectives. These examples range from media companies (NPR, The New York Times, The Guardian) to financial institutions (PayPal, E-Trade) to social media sites (Twitter, LinkedIn).

Even service companies like Amazon and Salesforce, whose systems are differentiated in part by their APIs, use them as a tactic to provide increased value for the primary business, which is providing robust services supporting cloud computing and CRM respectively.

The bottom line

The key to a successful API program is to know your audience. Your audience is defined by your business opportunity. So, be very thoughtful about the opportunity and then define your API accordingly. In some cases, pursuing the public developer opportunity is absolutely the right thing to do and it may have a tremendous upside (although realizing that upside is quite rare).

However, if your opportunity is truly to support a broader business objective, then launching a public developer program is not likely to yield large dividends. It is more likely to come with increased costs and risks that will weigh down your returns, dilute your resources for the larger opportunity, and distract you from the real prize. Instead, focus all of your energy on building a great system that helps you optimize for the larger goal. And don’t be married to that system as it is nothing more than a tactic.

Ultimately, if you know your audience, you can define a strategy and then design the right tactics. Otherwise, brace yourself for a very slow route to victory or, more likely, a noisy defeat.

This post originally appeared on The Next Web on September 15, 2013.

 

Netflix Does Not Have an Unlimited Vacation Policy

comments Comments Off on Netflix Does Not Have an Unlimited Vacation Policy
By , October 28, 2013 1:02 pm

I am finding it increasingly common, when talking about working at Netflix, for people to tell me that their companies are embracing an unlimited vacation policy similar to the one Netflix has been employing for years. When I hear these kinds of statements, my first thought is that these programs are not likely to have the desired outcome. I think that because Netflix does not have an unlimited vacation policy. Rather, Netflix has a culture of “Freedom and Responsibility” (F&R), which is discussed extensively in our publicly available culture slides.

The culture of F&R extends much further than its application to vacations, sick days, and holidays. F&R is a mindset on how we treat each other and our jobs on a day-to-day basis. It is a simple mantra that basically states that we are all adults, so let’s all behave and treat each other like adults. And if a person is not consistently behaving like an adult, they should not work at Netflix.

So, what does it mean to “behave like an adult”? At the core, Netflix seeks out senior-level people who are seasoned, both in their primary competency as well as in how to work effectively with others. We expect people to live up to the values that we discuss in the culture slides, to be people on which we would want to trust the future of our business. In fact, we are trusting the future of the business on each and every person who works with us. And that is the central point of it all… trust. If we cannot trust you, you should not be working at Netflix. Period.

This kind of trust takes many shapes. One example is that there are no strict silos in our server access rights. Virtually every engineer at the company can easily get root-level access to virtually any system in the production environment. Another manifestation is the fact that virtually everyone in the company knows key strategic initiatives that we are focused on well before these initiatives go public. As a result, everyone in the company is subject to trading windows for our stock options. That level of openness, at a company the size of Netflix, is exceedingly rare. And we can do all of this because we hire (and fire), essentially, for that deep level of trust.

With respect to vacation, that is just another manifestation of our trust as it pertains to F&R. We trust our employees and peers to take care of what they are responsible for. As long as we are being responsible (ie. getting our part of the project done, communicating our situation to others, etc.), then we don’t have much interest or concern around how others budget the rest of their time. We are free to do as we please because we can be trusted to do what we are supposed to do. As a result, tracking people’s time for vacation, sick, holidays, etc. just does not make sense. Track the results of the output, not the amount of time that it takes to execute it (or even which hours of the day were used to complete it).

On a related note, for the companies that do track hours work and audit leave days, many of them do not also track late night hours working on production deployments, site outages, etc. If you are going to track one, you must track the other. But it really makes no sense to track either. Companies trust these people to deploy and maintain their entire digital presence at midnight on a Saturday, but they won’t trust these same people to be responsible with their own time. That makes no sense!

At the core, Netflix does not have an “unlimited vacation” policy, we have a trust policy. If we trust you, then we will trust you and will give you great liberty to accomplish amazing things. If not, we will part ways.

API Revolutions and the API Strategy Conference

comments Comments Off on API Revolutions and the API Strategy Conference
By , February 25, 2013 7:31 am

Congratulations again to Kin Lane and everyone at 3Scale for a very successful and fulfilling API Strategy Conference. There were a lot of great presentations and panels as well a many very interesting hallway conversations.

And I was exited to be able to speak at the event! Embedded below are the slides from my presentation, complete with copious notes on each slide to provide the context of what I said during the talk.

The focus of the presentation was on API revolutions. We have seen a number of them in the recent years, but there have been significant and substantial changes for Netflix and for some others that warrant discussion. The question that remains is: Are these changes specific to a small handful of companies or are these companies representing things to come for the API world as a whole?

-Daniel

My Presentation at Intelligent Content Conference

comments Comments Off on My Presentation at Intelligent Content Conference
By , February 11, 2013 3:22 pm

Last Friday (February 8th), I spoke at the Intelligent Content Conference 2013.  When Scott Abel (aka The Content Wrangler) first contacted me to speak at the event, he asked me to speak about my content management and distribution experiences from both NPR and Netflix.  The two experiences seemed to him to be an interesting blend for the conference. And when I got to the conference, I was absolutely floored by the number of people who had already heard about NPR’s COPE model!

I have to admit, it had been a while since I last thought that much about the NPR days, but doing so brought back a lot of interesting memories.  When more deeply considering those experiences alongside my Netflix experience, I was able to see commonalities in practice, philosophy, execution and results (although at different scales).

At any rate, embedded below are the slides from my presentation.  I spent a good chunk of time commenting each slide as my presentations tend to be very image-heavy, which often results in lost context.  The comments have added that context back in.

Thanks again, Scott, for having me at the conference.  And thanks to all of the attendees with whom I spoke before and after my talk.  The event was a lot of fun!

-Daniel

Panorama Theme by Themocracy