Embedding Stores


This is very interesting. The problem: how do you build a system that makes predictions from both (a) image data and (b) associated dimensional data held in a data warehouse, while preserving the privacy of the raw images? The answer: pass the images through a neural network to create embeddings, store those embeddings in the warehouse so they can be joined with the rest of the data, and then train a second neural net on the joined data for the prediction task.
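To make the shape of that pipeline concrete, here is a minimal sketch of my own, not the author's code. Everything in it is an assumption for illustration: the "encoder" is a fixed random projection standing in for a pretrained network, the "warehouse" is an in-memory SQLite database, the table and column names are hypothetical, and the downstream model is a plain logistic regression rather than a second neural net.

```python
import json
import math
import random
import sqlite3

random.seed(0)
EMBED_DIM = 4   # hypothetical embedding size
PIXELS = 16     # hypothetical flattened image size

# Stand-in for a pretrained image encoder: a fixed random projection.
PROJECTION = [[random.gauss(0, 1) for _ in range(PIXELS)] for _ in range(EMBED_DIM)]

def embed(image):
    """Map raw pixels to a small vector; only this vector is written to the warehouse."""
    return [math.tanh(sum(w * p for w, p in zip(row, image)) / PIXELS) for row in PROJECTION]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, rows, labels):
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * v for w, v in zip(weights, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train(rows, labels, steps=200, lr=0.5):
    """Full-batch gradient descent for logistic regression."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad[j] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Toy "warehouse": one table of dimensional data, one table of embeddings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, price REAL, label INTEGER)")
conn.execute("CREATE TABLE image_embeddings (item_id INTEGER PRIMARY KEY, embedding TEXT)")

for item_id in range(200):
    image = [random.random() for _ in range(PIXELS)]  # raw pixels stay on this side
    price = random.random()
    label = int(price + sum(image) / PIXELS > 1.0)    # toy target using both sources
    conn.execute("INSERT INTO items VALUES (?, ?, ?)", (item_id, price, label))
    conn.execute("INSERT INTO image_embeddings VALUES (?, ?)",
                 (item_id, json.dumps(embed(image))))

# Join embeddings with the rest of the data inside the warehouse.
joined = conn.execute(
    "SELECT e.embedding, i.price, i.label FROM items i "
    "JOIN image_embeddings e ON e.item_id = i.item_id ORDER BY i.item_id"
).fetchall()

# Feature row = embedding + tabular column + bias term.
features = [json.loads(emb) + [price, 1.0] for emb, price, _ in joined]
labels = [label for _, _, label in joined]

initial_loss = log_loss([0.0] * len(features[0]), features, labels)
weights = train(features, labels)
final_loss = log_loss(weights, features, labels)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The point of the sketch is the boundary: the encoder and the downstream model only meet through a table of vectors keyed by `item_id`, so each piece can be built, swapped, or retrained on its own.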

The solution is interesting in and of itself, but I was particularly taken by the meta-point the author makes in the conclusion:

Solving for this specific ML problem, given the constraints we were working with, involved splitting up what is generally viewed as an ML-only problem (fine-tune a model) into a system design problem. That’s the biggest lesson we’re taking forward, and we regularly ask this question today: how can we break up a big ML problem into smaller, more manageable components?

I'm really interested in this thread. Even five years into curating this newsletter, most content written in the space is about "how to solve an ML problem" and not "how to build an ML system." I think the latter is the bigger bottleneck to getting more ML deployed in production, and I would love to see it get more attention.

