Netflix: Open-Sourcing Metaflow, a Human-Centric Framework for Data Science

medium.com

Is this a big new open-source release from Netflix (🤩) or just Yet Another Data Pipeline Framework (😞)? Metaflow is certainly widely used within Netflix, but only time will tell whether it receives meaningful community attention. Here's the most useful section of the post:

There are many existing frameworks, such as Apache Airflow or Luigi, which allow execution of DAGs consisting of arbitrary Python code. The devil is in the many carefully designed details of Metaflow: for instance, note how in the above example data and models are stored as normal Python instance variables. They work even if the code is executed on a distributed compute platform, which Metaflow supports by default, thanks to Metaflow’s built-in content-addressed artifact store. In many other frameworks, loading and storing of artifacts is left as an exercise for the user, which forces them to decide what should and should not be persisted. Metaflow removes this cognitive overhead.
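To make the idea in that excerpt concrete, here is a minimal sketch of how a framework could persist plain instance variables through a content-addressed store. This is not Metaflow's implementation or API: `ContentAddressedStore`, `ToyFlow`, `run_step`, and `load` are hypothetical names invented for illustration, and the store is just an in-memory dictionary keyed by the SHA-256 hash of each pickled value.

```python
import hashlib
import pickle


class ContentAddressedStore:
    """Toy in-memory store (hypothetical): blobs are keyed by the SHA-256 of their bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, obj):
        data = pickle.dumps(obj)
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data  # identical content always maps to the same key
        return key

    def get(self, key):
        return pickle.loads(self.blobs[key])


class ToyFlow:
    """Hypothetical base class that snapshots public instance variables after each step."""

    def __init__(self, store):
        self._store = store
        self._snapshots = {}  # step name -> {attribute name -> content key}

    def run_step(self, name):
        getattr(self, name)()  # run the user's step method
        # Persist every public attribute; the user never calls save/load explicitly.
        self._snapshots[name] = {
            attr: self._store.put(value)
            for attr, value in vars(self).items()
            if not attr.startswith("_")
        }

    def load(self, step, attr):
        """Fetch the value an attribute had at the end of a given step."""
        return self._store.get(self._snapshots[step][attr])


class TrainFlow(ToyFlow):
    def start(self):
        self.data = [1.0, 2.0, 3.0]  # plain instance variable

    def train(self):
        self.model = {"mean": sum(self.data) / len(self.data)}


store = ContentAddressedStore()
flow = TrainFlow(store)
flow.run_step("start")
flow.run_step("train")
```

In this sketch, `self.data` is unchanged between the two steps, so both snapshots point at the same stored blob. That deduplication is the main appeal of addressing artifacts by content rather than by name.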

There is a lot of movement in this space at the moment; the article didn't mention newcomers Dagster and Prefect, both of which are doing interesting things.
