Introduction to JOINs in Apache Druid

imply.io

In Apache Druid 0.18/Imply 3.3, we added support for SQL Joins in Druid. This capability, which has long been demanded from Druid by the community, opens the door to a large number of possibilities in the future. In this blog I want to highlight some of the motivations behind us undertaking the effort and give you, the reader, an understanding of how it can be useful and where we’re going with it.

This is a very interesting development. Druid is a powerful tool for in-memory analytics and has incredible response times in many contexts; to date, the lack of joins has been one of the big challenges associated with using it for many workloads.

Druid still doesn't support arbitrary joins from any table to any table, although that appears to be on the way; for details on the functionality, please do dig into the article.

This is an interesting thread to follow for anyone in the industry, because the core data processing technologies define so much of what is possible both up- and downstream of them. They define the "laws of physics" for the data ecosystem at any one point in time. And Druid, ClickHouse, and Pinot are mounting an interesting challenge to the now-status-quo of BigQuery and Snowflake. What fundamental changes in the ecosystem could we see if Druid becomes a primary destination for SQL workloads in the coming years?
