Clustering Similar Stories Using LDA @ Flipboard

Flipboard, a popular news-reading application, just released a "related stories" feature. From the article:

Although there are many sophisticated automatic clustering algorithms, story clustering is a non-trivial problem. Because each text document can contain any word from our vocabulary, most text document representations are extremely high-dimensional. In high-dimensional spaces, even basic clustering or similarity measures fail or are very slow.

This post on their engineering blog goes deep into the details of their implementation. Extremely useful.
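
To make the dimensionality point in the quote concrete, here is a minimal sketch in plain Python. It is not Flipboard's implementation: the three headlines, the whitespace tokenizer, and the helper names (`tokenize`, `bag_of_words`, `cosine`) are all made up for illustration. It only shows that a bag-of-words vector has one dimension per vocabulary word and is mostly zeros, which is the problem a topic model such as LDA sidesteps by describing each story with a handful of topic proportions instead.

```python
import math
from collections import Counter

# Toy headlines, invented for illustration only.
docs = [
    "apple unveils new iphone at fall event",
    "new iphone announced by apple today",
    "senate passes budget bill after long debate",
]

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real pipelines do much more).
    return text.lower().split()

# The vocabulary is every distinct word seen across all documents.
vocab = sorted({word for doc in docs for word in tokenize(doc)})

def bag_of_words(text):
    # One count per vocabulary word, so the vector length equals len(vocab).
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

def cosine(a, b):
    # Cosine similarity between two count vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vectors = [bag_of_words(doc) for doc in docs]
dimensions = len(vocab)

# Fraction of zero entries in each vector: most of every vector is empty.
sparsity = [vector.count(0) / dimensions for vector in vectors]

print("dimensions:", dimensions)
print("sparsity per document:", [round(s, 2) for s in sparsity])
print("similarity(0, 1):", round(cosine(vectors[0], vectors[1]), 3))
print("similarity(0, 2):", round(cosine(vectors[0], vectors[2]), 3))
```

Even with three short headlines, every vector has as many dimensions as there are distinct words, and the two iPhone stories are linked only through the few words they happen to share. With a real news vocabulary the vectors grow to tens of thousands of dimensions, which is why reducing each story to a small topic vector first makes similarity comparisons more tractable.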

