Amundsen — Lyft’s Data Discovery & Metadata Engine

eng.lyft.com

Another topic I think is incredibly important: data discovery and curation! If your organization is of sufficient size, how do users:

  • learn about new datasets that they have never worked with before?
  • understand the provenance of the data that their reports are built on?
  • know who to go to for more information?

...and more. Operating data infrastructure at an 80-person company is a completely different challenge from operating it at a 5,000-person company, and the core difference is that you can't just shoulder-tap someone to get an answer. Lyft found that 25% of their data teams' time was spent just trying to find the relevant data. That number was even higher when I worked at GE.

This post walks through Amundsen, Lyft's tool for indexing their internal datasets and exposing that metadata to users. Amazing read, and very informative if you're a dbt user: this is exactly the experience we're working toward for all dbt users with dbt Docs.
