A Beginner’s Guide to Data Engineering


The more experienced I become as a data scientist, the more convinced I am that data engineering is one of the most critical and foundational skills in any data scientist’s toolkit.

The author, now at Airbnb and previously at Twitter, shares the best introduction to data engineering I've seen. My favorite section covers the foundational choice between writing ETL jobs on the JVM (Java or Scala) and writing them in SQL. His stated preference for SQL is one I share, and it's why we invest heavily in our open-source product, dbt, a tool for developing and running DAGs of SQL data transformations.
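
To make "a DAG of SQL data transformations" concrete, here is a minimal sketch in plain Python. It is not how dbt is implemented and it does not use dbt's API. It only illustrates the idea, assuming each transformation is a SELECT statement that names its upstream dependencies. The table and model names (`raw_orders`, `stg_orders`, `customer_totals`) are made up for the example, and SQLite stands in for a real warehouse.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> (upstream model names, SELECT statement).
# raw_orders is a source table, so it is not listed as a model dependency.
MODELS = {
    "stg_orders": (
        set(),
        "SELECT id, customer, amount FROM raw_orders WHERE amount > 0",
    ),
    "customer_totals": (
        {"stg_orders"},
        "SELECT customer, SUM(amount) AS total FROM stg_orders GROUP BY customer",
    ),
}


def run_models(conn, models):
    """Materialize each model as a table, upstream models first."""
    graph = {name: deps for name, (deps, _sql) in models.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {models[name][1]}")
    return order


def demo():
    """Load a few illustrative rows, run the DAG, and read the final model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "ann", 10.0), (2, "bob", 5.0), (3, "ann", 2.5), (4, "bob", -1.0)],
    )
    order = run_models(conn, MODELS)
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    return order, rows


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the division of labor: the transformation logic lives entirely in SQL, while the surrounding tool only works out a valid run order from the declared dependencies and executes each statement in turn.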

Standards are emerging for how data engineering should be done, and this post is a great intro to them.


