100 Billion Records Later, Refining our ETL Service

blog.stitchdata.com

This post, written by the VP of Engineering at Stitch, goes deep into the challenges of building a data pipeline that delivered 100 billion records in its first 10 months. My personal takeaway: data engineers may sometimes be too quick to incorporate open-source tools. The Stitch team has now removed Spark from its stack and reduced its usage of Kafka. These are very interesting lessons that may save you thousands of engineering hours.
