Give Meaning to 100 Billion Events Per Day

Data scientists all seem to agree that a large majority of the work involved in doing data science well is gathering, cleaning, and making raw data available. Yet posts that describe how real companies are solving these problems in production are still quite rare. This one is a gem.

In this article, we describe how we orchestrate Kafka, Dataflow and BigQuery together to ingest and transform a large stream of events.

The post details the considerable effort the team at Teads put into building a large-scale pipeline that gets event data into BigQuery efficiently. There were quite a few hiccups along the way, but in the end the Google Cloud stack served them well.
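To make the ingest-and-transform step concrete, here is a minimal Python sketch of the kind of per-event transformation such a pipeline applies before streaming rows into BigQuery: parse a raw JSON event, normalize its timestamp, and flatten nested fields into a flat row. The field names and schema are purely illustrative assumptions, not Teads' actual event format.

```python
import json
from datetime import datetime, timezone

def to_bigquery_row(raw_event: str) -> dict:
    """Flatten a raw JSON tracking event into a flat, typed row
    suitable for streaming inserts into a BigQuery table.

    NOTE: the field names below are hypothetical, chosen only to
    illustrate the shape of the transform described in the article.
    """
    event = json.loads(raw_event)
    return {
        "event_id": event["id"],
        "event_type": event.get("type", "unknown"),
        # Normalize an epoch-milliseconds timestamp to ISO-8601 UTC.
        "event_time": datetime.fromtimestamp(
            event["ts"] / 1000, tz=timezone.utc
        ).isoformat(),
        # Flatten the nested user object; absent fields become None.
        "user_id": event.get("user", {}).get("id"),
        "country": event.get("user", {}).get("country"),
    }

raw = (
    '{"id": "e-1", "type": "impression", "ts": 1500000000000,'
    ' "user": {"id": "u-42", "country": "FR"}}'
)
row = to_bigquery_row(raw)
```

In a real deployment this transform would run inside a Dataflow job consuming from Kafka, with the resulting rows written via BigQuery's streaming-insert API.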
