Down with Pipeline Debt / Introducing Great Expectations

medium.com

TL;DR: pipeline debt is a species of technical debt that infests backend data systems. It drags down productivity and puts analytic integrity at risk. The best way to beat pipeline debt is a new twist on automated testing: pipeline tests, which are applied to data (instead of code) and at batch time (instead of compile or deploy time).
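
To make the idea concrete, here is a minimal sketch of what a pipeline test might look like, written in plain Python. It is not Great Expectations' API. The function name `check_batch`, the `user_id` and `amount` columns, and the 0 to 10,000 range are all hypothetical and chosen only for illustration. The point is that the assertions run against each incoming batch of data rather than against the code that produced it.

```python
# Hypothetical pipeline test: validates one batch of rows at batch time.
# Column names and thresholds are assumptions made for this example.

def check_batch(rows):
    """Return a list of human-readable failures for one batch of data."""
    failures = []

    # An empty batch usually signals an upstream problem.
    if not rows:
        failures.append("batch is empty")
        return failures

    # The key column should be present and unique.
    ids = [r.get("user_id") for r in rows]
    if any(i is None for i in ids):
        failures.append("user_id contains nulls")
    if len(set(ids)) != len(ids):
        failures.append("user_id contains duplicates")

    # Numeric values should be numbers within a plausible range.
    bad = [
        r for r in rows
        if not isinstance(r.get("amount"), (int, float))
        or not (0 <= r["amount"] <= 10_000)
    ]
    if bad:
        failures.append(f"{len(bad)} row(s) have amount outside [0, 10000]")

    return failures


good_batch = [{"user_id": 1, "amount": 10.0}, {"user_id": 2, "amount": 250}]
bad_batch = [{"user_id": 1, "amount": -5}, {"user_id": 1, "amount": 20}]

print(check_batch(good_batch))
print(check_batch(bad_batch))
```

In a real pipeline a non-empty failure list would typically halt the run or raise an alert before the batch reaches downstream consumers.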

I really love this. Analytics and data science rely on quality data pipelines, and it's shocking just how infrequently these pipelines are appropriately tested. The article does a wonderful job discussing just how challenging it can be to get this stuff right and the importance of doing so. We do a ton of data testing at Fishtown Analytics and are huge believers.

My only quibble with the tool itself is the technical approach it takes. Brian and Josh from the dbt Slack put it better than I could (below). Even so, I give this team a ton of credit for pushing forward a very important topic.
