Introducing OpenLineage

Metadata is a hard problem for product creators: if your product doesn't create the dataset, how do you get the metadata you need to power your product? This is one reason metadata products like data catalogs have generally lagged behind the rest of the modern data stack. Most of them rely on fairly thin data streams, such as parsed database query logs. That approach can go pretty far, but it has inherent limits.

The new generation of products, mostly commercial versions of tools first built in-house at Big Tech companies, seems unwilling to accept this limitation (rightly so!). The OpenLineage Initiative is an attempt to define a standard metadata schema that every tool can both publish and consume, enabling a much richer metadata experience across the entire tooling space.

It's a fantastic vision, and dbt is very much a participant here. That said, standards like this are really hard to make sticky. I'm cautiously optimistic: it could be a real unlock for the next generation of tooling and data maturity.
