Data Quality for Everyday Analysis

A while ago, a friend of mine presented a compelling analysis that convinced the managers of a mid-size company to make a series of decisions based on the recommendations of the newly established data science team. Not long after, however, a million-dollar loss revealed that the insights were wrong. Further investigation showed that while the analysis was sound, the data it relied on was corrupt.

You're likely already on board with the idea that pipelines need to be tested, so this post won't be totally new ground. But it's a very practical post: it lists six dimensions of data quality and suggests how to test each of them. Probably the best overview I've read.

If you haven't spent much time operationalizing data quality, this is a good intro. And if that describes you, get a full sprint on your team's calendar between now and the end of the year to fix it.
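
To make "testing data quality" a bit more concrete, here is a minimal sketch of two checks a pipeline test might run. It is not taken from the linked post and does not reproduce its six dimensions. The two properties shown (completeness and uniqueness), the function names, and the `orders` sample are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def completeness(rows: list[Row], column: str) -> float:
    """Return the share of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def duplicate_keys(rows: list[Row], key: str) -> list[Any]:
    """Return the values of `key` that appear in more than one row."""
    counts = Counter(row[key] for row in rows if row.get(key) is not None)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical sample data: one missing amount, one repeated order_id.
orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.0},
    {"order_id": 3, "amount": 7.5},
]

print(completeness(orders, "amount"))      # share of non-null amounts
print(duplicate_keys(orders, "order_id"))  # order_ids seen more than once
```

In a real pipeline, checks like these would probably run against each fresh load and fail the job when a threshold is crossed, for example completeness below an agreed minimum or any duplicate key at all.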

