Procella: Unifying Serving and Analytical Data at YouTube

Ummmm: holy shit. The folks @ Google are really living in your future and mine. Stick with me for two quotes:

Large organizations… are dealing with exploding data volume and increasing demand for data driven applications. Broadly, these can be categorized as: reporting and dashboarding, embedded statistics in pages, time-series monitoring, and ad-hoc analysis. Typically, organizations build specialized infrastructure for each of these use cases. This, however, creates silos of data and processing, and results in a complex, expensive, and harder to maintain infrastructure.
The big hairy audacious goal of Procella was to “implement a superset of capabilities required to address all of the four use cases… with high scale and performance, in a single product”.

The post itself is a summary of a recent paper out of Google describing their system, Procella. It's a SQL-based system where you can have your cake and eat it too: it's blazing fast for each analytical use case listed above. This is an incredibly difficult achievement: rather than accepting the tradeoffs inherent in serving such different workloads, the team found ways of getting top-tier performance for each. The post goes pretty deep on the innovations required, including a new file format and an adaptive optimizer (which is insanely cool).

The past decade has seen a ton of innovation in SQL serving systems, but we're not done yet. The stuff we're going to see delivered over the next decade is going to be awesome.

