Airbnb: Supercharging Apache Superset

How Airbnb customized Apache Superset for business intelligence at scale.

I haven't read a post quite like this before! There are so many posts about "how we scaled our streaming data pipeline / Presto instance / etc." but I had never previously read about a company going to massive scale in a single BI tool. I think this is because in practice most companies do not succeed at doing this. While there are certainly very large companies that standardize on a single centralized BI tool, one of three things tends to happen:

  • Company uses a top-down deployment and the BI tool ends up being a way to push high-level metrics out to the org, not to enable IC analysis work. IC analysis work happens in shadow IT.
  • Company buys Tableau / PowerBI licenses for everyone but there is no centralized experience; the organizing unit is the team, department, or individual.
  • Company attempts to deploy BI tool "the right way" but hits limits of scalability and splits the single environment into multiple, causing awkward tectonic rifts.

The picture Airbnb paints here is unique because there are 2,000 knowledge workers all collaborating inside a single Superset instance, sharing discoverability, governance, etc. The post really shines a light on the core aspect of Superset that enables this: its Apache 2.0 license. The Airbnb team has meaningfully extended and customized how the product works for them in ways that simply aren't possible in any proprietary product. I found it very interesting just how critical open source was for them at this layer of the stack.

