Nemo: Data discovery at Facebook

Facebook finally joins Airbnb, Lyft, Netflix, and Uber (and many others) in creating its own in-house data catalog. If you're following the space (as I very much am), this isn't a revolutionary release; it's hitting on the same themes as other similar in-house products. And because it's built on top of Facebook's proprietary social graph search utility, Unicorn, it's unlikely to be open sourced at any point.

There are a lot of nice touches though. Here's my favorite paragraph:

Nemo indexing is generally aware of our data ecosystem. For example, if a data pipeline duplicates a column into a downstream table, the original column’s description and the upstream table’s name are also stored for the downstream artifact. Presto queries of data artifacts are noted, so if an engineer performs a Presto query, that will increase the Nemo score both generally, for that table, and for the specific engineer who performed the search.
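
To make the two ideas in that paragraph concrete, here is a rough Python sketch of how lineage-aware metadata and query-driven scoring might fit together. None of this is Nemo's actual code or API: `ToyCatalog`, `ColumnEntry`, `record_copy`, `record_query`, the flat `weight` of 1.0, and the additive scoring are all assumptions of mine, chosen only to illustrate the behavior described.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ColumnEntry:
    """Hypothetical index record for one column of one table."""
    table: str
    name: str
    description: str = ""
    # Filled in when a pipeline copies this column from an upstream table.
    upstream_table: Optional[str] = None
    upstream_description: Optional[str] = None


class ToyCatalog:
    """Illustrative stand-in for a data catalog index (not Nemo itself)."""

    def __init__(self) -> None:
        self.columns: Dict[Tuple[str, str], ColumnEntry] = {}
        self.table_score: Dict[str, float] = {}             # general popularity
        self.user_score: Dict[Tuple[str, str], float] = {}  # per-engineer boost

    def add_column(self, table: str, name: str, description: str = "") -> None:
        self.columns[(table, name)] = ColumnEntry(table, name, description)

    def record_copy(self, src_table: str, dst_table: str, name: str) -> None:
        """A pipeline duplicated a column downstream: carry its context along."""
        src = self.columns[(src_table, name)]
        self.columns[(dst_table, name)] = ColumnEntry(
            table=dst_table,
            name=name,
            upstream_table=src_table,
            upstream_description=src.description,
        )

    def record_query(self, user: str, table: str, weight: float = 1.0) -> None:
        """A query touched this table: bump it for everyone and for this user."""
        self.table_score[table] = self.table_score.get(table, 0.0) + weight
        key = (user, table)
        self.user_score[key] = self.user_score.get(key, 0.0) + weight

    def score(self, table: str, user: Optional[str] = None) -> float:
        """General score, plus a personal boost when the searcher is known."""
        total = self.table_score.get(table, 0.0)
        if user is not None:
            total += self.user_score.get((user, table), 0.0)
        return total


catalog = ToyCatalog()
catalog.add_column("raw_events", "user_id", "ID of the acting user")
catalog.record_copy("raw_events", "daily_events", "user_id")
catalog.record_query("alice", "daily_events")

copied = catalog.columns[("daily_events", "user_id")]
print(copied.upstream_table, "|", copied.upstream_description)
print(catalog.score("daily_events"), catalog.score("daily_events", user="alice"))
```

The point of the sketch is that the downstream column never needs its own documentation to be findable, and that a table someone has actually queried ranks higher for them than for a colleague who hasn't touched it.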


