Beginners Guide to Columnar File Formats

File formats can be confusing, so let's delve into columnar file formats (like Parquet) and explain why they're different from row-oriented formats (like CSV, JSON, or Avro).

I love this post because it almost exactly mirrors a talk that I give during the Fishtown Analytics onboarding process, but with better examples! Understanding columnar file formats is critical to understanding the internals of modern analytic SQL engines, and therefore to using them well.

If you're not familiar with this topic, this is a must-read. If you already know this stuff cold, save it as a resource for people who join your team later.
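
For anyone who wants the core idea before clicking through, here is a minimal sketch in plain Python of the row-versus-column distinction. It is not taken from the linked post and does not read or write real Parquet files. The sample records and the `to_columns` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical sample records; field names and values are illustrative only.
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 25.5},
    {"id": 3, "country": "US", "amount": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every whole record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same query touches only the "amount" values.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same. What changes is how much data has to be touched to compute them: in the column layout, an aggregate over one field never looks at `id` or `country`. That is roughly why analytic engines, which tend to read a few columns across many rows, favor columnar files.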


