Data Orchestration — A Primer.

medium.com

Until recently, data teams used cron to schedule data jobs. However, as teams wrote more cron jobs, their growing number and complexity became hard to manage. First, managing dependencies between jobs was difficult. Second, failure handling and alerting had to be managed by the job itself, so either the job or an on-call engineer had to handle retries and upstream failures, a pain. Finally, for retrospection, teams had to manually sift through logs to check how a job performed on a given day, a time sink. Because of these challenges, data orchestration solutions emerged.

Last issue, I dug deep into orchestration and talked about why I'm excited about Dagster, but if you haven't been living in orchestration-land for a long time, that post may not have hit home. This article is a fantastic overview of the space if you're new to it.

The author, Astasia Myers, is a VC @ Redpoint, and she spends a lot of time in data. Not a bad feed to subscribe to; I read everything she writes.
