Data Infrastructure @ Slack

If you stop and think about the number of Slack conversations you have personally participated in and then multiply that by, oh, 100 million or so users, you start to get a sense of the scale problem that the data team at Slack faces. Their solutions? S3, Kafka, Presto, Hive, and Spark, all reading and writing Parquet. To me, this reads as an engineering-heavy and open-source-focused stack; the post goes into some of the (non-trivial) challenges they had in making this work.

Exercise for the reader: compare and contrast with the Blue Apron experience below.

