Procella: Unifying Serving and Analytical Data at YouTube

Ummmm: holy shit. The folks @ Google are really living in your and my future. Stick with me for two quotes:

“Large organizations… are dealing with exploding data volume and increasing demand for data driven applications. Broadly, these can be categorized as: reporting and dashboarding, embedded statistics in pages, time-series monitoring, and ad-hoc analysis. Typically, organizations build specialized infrastructure for each of these use cases. This, however, creates silos of data and processing, and results in a complex, expensive, and harder to maintain infrastructure.”
The big hairy audacious goal of Procella was to “implement a superset of capabilities required to address all of the four use cases… with high scale and performance, in a single product”.

The post itself is a summary of a recent paper out of Google describing their system, Procella. It's a SQL-based system where you can have your cake and eat it too: it's blazing fast for each analytical use case listed above. This is an incredibly difficult achievement. Rather than accepting the tradeoffs that usually come with specializing for one use case, the team found ways to get strong performance for all four. The post goes pretty deep on the innovations required, including a new file format and an adaptive optimizer (which is insanely cool).

The past decade has seen a ton of innovation in SQL serving systems, but we're not done yet. The stuff we're going to see delivered over the next decade is going to be awesome.


