Tapjoy Engineering: Real-time Deduping At Scale


At Tapjoy, analytics is core to our platform. On an average day, we process over 2 million messages per minute through our analytics pipeline. These messages are generated by various user events in our platform and are eventually aggregated into a close-to-realtime view of the system. (...) In this post we look at how we handled the at-least-once semantics of our Kafka pipeline through real-time deduping, in order to ensure the integrity and accuracy of the data.
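The excerpt does not say how Tapjoy implemented its deduping, so the following is only a minimal sketch of the general idea, not their method. Under at-least-once delivery the same message can arrive more than once, so a consumer can remember recently seen message IDs for a bounded window and drop repeats. The class name `WindowedDeduper`, the `ttl_seconds` parameter, and the assumption that every message carries a unique ID are all hypothetical, and a single-process in-memory store like this one would likely not be enough at the volume described above.

```python
import time
from collections import OrderedDict


class WindowedDeduper:
    """Hypothetical in-memory deduper: remembers message IDs for ttl_seconds."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        # message_id -> time first seen; insertion order is oldest first
        self._seen = OrderedDict()

    def _evict_expired(self, now):
        # Drop IDs whose window has elapsed so memory stays bounded.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break
            self._seen.popitem(last=False)

    def is_first_delivery(self, message_id):
        """Return True the first time an ID is seen within the window."""
        now = self.clock()
        self._evict_expired(now)
        if message_id in self._seen:
            return False  # duplicate delivery inside the window: skip it
        self._seen[message_id] = now
        return True
```

A consumer loop would call `is_first_delivery(msg_id)` for each message and only aggregate those that return `True`. The trade-off in this sketch is the window length: a duplicate that arrives after its ID has expired is counted again, while a longer window costs more memory.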

You may or may not need to solve a problem with this level of technical complexity today, but it's important to understand just how hard it is to solve this type of problem well.

