How to Serve Models

There are many ways to serve machine learning (ML) models, but these are the three most common patterns I have observed over the years:
1) Materialize (precompute) predictions offline and serve them from a database;
2) Use the model within the main application, so the model is served and deployed together with the main application;
3) Run the model separately as a microservice, which you send inputs to and get predictions back from.
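To make the three patterns concrete, here is a minimal, standard-library-only sketch. Everything in it is a hypothetical illustration rather than the author's setup: `toy_model` stands in for a trained model, the SQLite table plays the role of the offline prediction store, `App` stands in for the main application, and the `/predict` endpoint with its `{"input": ...}` / `{"output": ...}` JSON schema is an assumed contract for the microservice.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List, Optional


def toy_model(x: float) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear function."""
    return 2.0 * x + 1.0


# Pattern 1: precompute predictions offline, then serve them by lookup.
def materialize_predictions(conn: sqlite3.Connection, inputs: List[float]) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (input REAL PRIMARY KEY, output REAL)")
    conn.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?)",
        [(x, toy_model(x)) for x in inputs],
    )
    conn.commit()


def lookup_prediction(conn: sqlite3.Connection, x: float) -> Optional[float]:
    # Only inputs that were precomputed offline can be answered.
    row = conn.execute("SELECT output FROM predictions WHERE input = ?", (x,)).fetchone()
    return row[0] if row else None


# Pattern 2: the main application holds the model and calls it in-process,
# so shipping a new model means redeploying the application.
class App:
    def __init__(self, model: Callable[[float], float]) -> None:
        self.model = model

    def handle(self, x: float) -> float:
        return self.model(x)


# Pattern 3: the model runs as its own HTTP service (hypothetical /predict API).
class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"output": toy_model(payload["input"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def start_model_service() -> HTTPServer:
    # Port 0 asks the OS for any free port; serve in a background thread.
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_predict(port: int, x: float) -> float:
    # The client sends input over the network and gets the prediction back.
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"input": x}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["output"]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    materialize_predictions(conn, [1.0, 2.0, 3.0])
    print(lookup_prediction(conn, 2.0))  # 5.0
    print(lookup_prediction(conn, 4.0))  # None: never precomputed
    print(App(toy_model).handle(4.0))  # 9.0
    server = start_model_service()
    print(remote_predict(server.server_address[1], 4.0))  # 9.0
    server.shutdown()
    server.server_close()
```

The sketch shows the trade-offs in miniature: pattern 1 can only answer inputs that were computed ahead of time, pattern 2 ties model releases to application releases, and pattern 3 decouples the two at the cost of a network hop plus a separate service to run and monitor.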

Yep yep yep. This is the clearest post I've read on this topic; it's extremely helpful if you're thinking about how to design a production ML system right now. The author runs the search engineering team at, and his recommendations are those of an experienced practitioner: he doesn't push the reader straight to the most architecturally "pure" approach (#3), recognizing the considerable overhead of running the microservices architecture it requires.

