Open Sourcing Brooklin: Near Real-Time Data Streaming at Scale

This project comes out of LinkedIn, the birthplace of Kafka, and is a very close cousin of it. My read of this release post is that LinkedIn is primarily using Brooklin as a layer on top of Kafka that allows it to replicate Kafka streams to multiple (often quite disparate) environments. This functionality was originally provided by Kafka MirrorMaker, and the post does a good job of explaining the utility:

Kafka supports internal replication to support data availability within a cluster. However, enterprises require that the data availability and durability guarantees span entire cluster and site failures.

If you need your Kafka streams to be redundant across multiple datacenters, you're a serious data engineering organization. It's very cool that this seemingly quite mature project has been released into the wild.

