Why You Need to Improve Your Training Data, and How to Do It


As part of my job, I work closely with a lot of researchers and product teams, and my belief in the power of data improvements comes from the massive gains I’ve seen them achieve when they concentrate on that side of their model building. The biggest barrier to using deep learning in most applications is getting high enough accuracy in the real world, and improving the training set is the fastest route I’ve seen to accuracy improvements.

The recommendation itself is not new or surprising: "data over algorithms" has become conventional wisdom in the era of deep learning. But the stories the author tells to make the point are excellent, and so are the recommendations. The author knows his subject cold; this is the best post I've seen on this particular topic.

