Data Governance and the Death of Schema on Read

Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.

Data governance is a topic that has traditionally been dominated by massive companies and high-priced vendor solutions that were not particularly innovative. But that's changing: I'm seeing data practitioners at companies large and small start to really care about and innovate around data governance. This is for a couple of reasons, IMO: 

  1. Practitioners truly want their business users to self-serve, and they realize that can't realistically happen without good documentation.
  2. Regulations like GDPR will grind entire data organizations to a halt without real governance in place.

This post is a solid introduction to the topic and how things have evolved over the past decade or so. Comcast is clearly doing some cool things (which they present at the end), but even if you're just checking your analytic code into git and accompanying it with markdown files, you're already ahead of 80% of organizations.


