Making Netflix’s Data Infrastructure Cost-Effective

Wowow! This is really cool. Netflix has gone very deep on making sure that they have control over their data-related costs, which is not at all an easy thing given the heterogeneity of their data ecosystem. This topic has come up a bunch of times recently, and the feedback I've generally heard from practitioners is "we're not doing a very good job of that." So it's interesting to see what "good" looks like.

What's probably most interesting about this post is its treatment of time-to-live. Netflix's infrastructure can automatically recommend how long to keep historical data in a given collection, based on how often each partition is actually queried (see the diagram above). As we all start generating and analyzing more and more data, this type of system is going to become an absolute requirement.
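
To make the idea concrete, here is a rough Python sketch of how a TTL recommendation could fall out of partition access counts. To be clear, this is my own illustration and not Netflix's implementation: the function name `recommend_ttl_days`, the `coverage` threshold, and the sample read counts are all assumptions made up for the example. It assumes date-partitioned data and a log of how many reads each partition received over some observation window.

```python
from datetime import date


def recommend_ttl_days(reads_by_partition, today, coverage=0.99):
    """Return the smallest retention window (in days) that would have served
    at least `coverage` of the observed partition reads, or None if there
    were no reads at all. Hypothetical heuristic, for illustration only."""
    # Convert each date partition to its age in days, youngest first,
    # skipping any future-dated partitions.
    reads_by_age = sorted(
        ((today - day).days, reads)
        for day, reads in reads_by_partition.items()
        if day <= today
    )
    total = sum(reads for _, reads in reads_by_age)
    if total == 0:
        return None

    # Walk from the newest partition to the oldest, accumulating reads
    # until the requested share of all reads is covered.
    running = 0
    for age, reads in reads_by_age:
        running += reads
        if running / total >= coverage:
            return age
    # Only reached if coverage > 1.0: fall back to the oldest partition read.
    return reads_by_age[-1][0]


# Made-up access counts: most reads hit recent partitions, with a long tail.
today = date(2024, 1, 31)
access_log = {
    date(2024, 1, 31): 500,  # today's partition
    date(2024, 1, 30): 300,  # 1 day old
    date(2024, 1, 24): 150,  # 7 days old
    date(2024, 1, 1): 45,    # 30 days old
    date(2023, 1, 31): 5,    # 365 days old
}

print(recommend_ttl_days(access_log, today, coverage=0.99))
```

The interesting knob is `coverage`: insisting on serving every historical read means keeping a full year of data for the sake of five queries, while accepting 99% coverage shrinks the window to a month. A real system would presumably weigh storage cost against the cost of those missed reads rather than use a fixed threshold.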


