Dagster: The Data Orchestrator


Traditional workflow engines such as Airflow work with a purely operational dependency graph: ensuring proper execution order, managing retries, consolidating logging, and so forth. These systems were a huge step forward over loosely coupled cron jobs and other solutions that did not formally define dependencies. A narrow focus on execution allowed those systems to be maximally general while demanding minimal change to the code that they orchestrated.
Dagster makes different tradeoffs, enabling a more structured programming model that exposes a richer, semantically aware graph.

I first introduced Dagster to this list about a year ago, when it launched. Since then, the project has come a long way. This post describes the thinking behind the project and how it's different from something like Airflow. But to be honest, I didn't really get it until I saw a demo of dbt + Dagster being used together by a mutual customer—I was really impressed.

The two paragraphs above capture the core of the value for me: Dagster actually understands the computations that it is responsible for executing because it has a type system. In building dbt, we have found that understanding what type of computation is being expressed is a critical design element of a usable system, and this difference is at the heart of Dagster's value.
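To make the idea concrete, here is a minimal sketch — plain Python, not actual Dagster code — of what a "semantically aware" graph buys you: when each node declares the types it consumes and produces, the orchestrator can catch a mismatched connection before anything runs, rather than failing mid-pipeline. All names here (`Node`, `validate`) are hypothetical illustrations.

```python
from typing import Callable, Dict, List

class Node:
    """A hypothetical graph node that declares its input and output types."""
    def __init__(self, fn: Callable, out_type: type, in_types: List[type]):
        self.fn = fn
        self.out_type = out_type
        self.in_types = in_types

def validate(graph: Dict[str, Node], edges: Dict[str, List[str]]) -> List[str]:
    """Check every edge: does each upstream output type match the
    downstream input type it feeds? Runs before any node executes."""
    errors = []
    for node_name, upstream_names in edges.items():
        node = graph[node_name]
        for expected, up_name in zip(node.in_types, upstream_names):
            actual = graph[up_name].out_type
            if actual is not expected:
                errors.append(
                    f"{up_name} -> {node_name}: expected "
                    f"{expected.__name__}, got {actual.__name__}"
                )
    return errors

# Example: extract produces a str, but transform declares it needs an int.
graph = {
    "extract": Node(lambda: "raw", str, []),
    "transform": Node(lambda x: x * 2, int, [int]),
}
edges = {"transform": ["extract"]}
print(validate(graph, edges))  # flags the str/int mismatch before execution
```

A purely operational orchestrator sees only "run `transform` after `extract`"; a graph annotated this way can reject the wiring up front, which is the distinction the excerpt draws.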

