Taboola: How to Engineer Your Way Out of Slow Models

If you're not familiar with Taboola, they serve a tremendous number of impressions around the web in the "You might also like..." sections at the bottom of your favorite articles. In this post, a recommendations engineer explains how the team got a sophisticated model to return predictions in less than 200ms:

Sometimes using state of the art models can be problematic due to their computational demands. By caching intermediate results (embeddings) we were able to overcome this challenge, and still enjoy state of the art results.
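To make the caching idea concrete, here is a minimal Python sketch. It is not Taboola's implementation: the cache keyed by item id, the `slow_encode` stand-in for the expensive model, and the dot-product `score` function are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoizes the expensive item-to-embedding step, keyed by item id."""

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self._encode = encode
        self._store: Dict[str, Vector] = {}
        self.misses = 0  # counts how often the slow encoder actually ran

    def get(self, item_id: str) -> Vector:
        if item_id not in self._store:
            self.misses += 1
            self._store[item_id] = self._encode(item_id)
        return self._store[item_id]


def slow_encode(item_id: str) -> Vector:
    # Hypothetical stand-in for an expensive model forward pass.
    return [float(len(item_id)), float(sum(map(ord, item_id)) % 7)]


def score(user_vec: Vector, item_vec: Vector) -> float:
    # Cheap per-request step: a dot product against the cached embedding.
    return sum(u * v for u, v in zip(user_vec, item_vec))


cache = EmbeddingCache(slow_encode)
user = [0.5, 1.0]
candidates = ["article-1", "article-2", "article-1"]
scores = [score(user, cache.get(c)) for c in candidates]
```

The repeated candidate is served from the cache, so the slow encoder runs only once per distinct item and the per-request work is limited to the cheap scoring step. A production system would also need eviction and invalidation when items or the model change, which this sketch leaves out.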

Model prediction performance is a great topic, and one rarely written about. If you've seen interesting stuff written on this topic, shoot me a link.


