Data Versioning

Great summary by the author:

Data science is hard to productionize, and one of the reasons it is hard is because it has so many moving parts. The notion of a "version" of a smart/AI/machine learning application has (at least) four possible axes on which it can drift. This poses a challenge in continuous delivery practices. These challenges can be addressed, but there are benefits and drawbacks to the various ways I've seen people try to address this in practice.

The post goes into the various approaches in a decent amount of detail. It's a thorough yet accessible treatment of this important topic.


Want to receive more content like this in your inbox?