Modern systems must be aware of the quality of the incoming data and capable of identifying, reporting and handling erroneous cases accordingly.
Modern data organizations have dramatically increased their ability to ingest and process new data streams, but they frequently have no mechanism to validate that data either at the outset of a project or, just as importantly, on an ongoing basis. This article walks through the why and how of creating high-quality data.
We think a ton about this problem at Fishtown Analytics, and it's exactly what we built dbt data tests to help with. We're starting to see more companies take this issue seriously, but it's still rare.