Databricks: Open Sourcing Delta Lake

databricks.com

Databricks' announcement of a new file format, Delta, promises things like ACID transactions and data versioning. If you're a Snowflake user, you'll recognize both as major selling points of that platform, so it's neat to see these features make their way into an open source context.

I've been spending more time in the Spark / Databricks ecosystem recently on a client project, and the file format challenges I've run into there have given me a new appreciation for how critical file formats are. Delta is already solving real problems for us on that project.

This may not be the most riveting topic in the world, but I actually think this release could be quite important for the industry. We need to move away from data engineering jobs that have to rewrite an entire 100-MB Parquet file just to update a single row.
