Amundsen — Lyft’s Data Discovery & Metadata Engine

Another topic I think is incredibly important: data discovery and curation! If your organization is large enough, how do users:

  • learn about new datasets that they have never worked with before?
  • understand the provenance of the data that their reports are built on?
  • know who to go to for more information?

...and more. Operating data infrastructure at a company of 80 people is a completely different challenge from operating it at a company of 5,000 people. The core difference is that you can't just shoulder-tap someone to get an answer. Lyft realized that 25% of its data teams' time was spent just trying to find the relevant data. That number was even higher when I worked at GE.

This post walks through Amundsen, Lyft's tool that indexes the company's internal datasets and exposes that information to users. Amazing read, and very informative if you're a dbt user: this is exactly the experience we're on the path to facilitating for all dbt users with dbt Docs.


