Big Data Quality at Criteo

medium.com

Why data quality on Hadoop is key to getting the most out of your data, and how Criteo addressed it.

Criteo has 450 PB of data, 300k jobs, and 7,000 data quality checks.

IMO, data quality is still the most underinvested part of the modern data stack. dbt's current testing functionality goes a long way here, but the Criteo team has done some pretty cool stuff above and beyond what dbt offers at the moment. I'm excited to experiment more in this area of the product in 2020.
