Clustering Similar Stories Using LDA @ Flipboard

Flipboard, a popular news-reading application, just released a "related stories" feature. From the article:

Although there are many sophisticated automatic clustering algorithms, story clustering is a non-trivial problem. Because each text document can contain any word from our vocabulary, most text document representations are extremely high-dimensional. In high-dimensional spaces, even basic clustering or similarity measures fail or are very slow.

The post, on Flipboard's engineering blog, goes deep into the details of how they implemented story clustering with LDA. It is well worth reading.
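
To make the dimensionality point in the quote concrete, here is a minimal sketch in plain Python. It is not Flipboard's implementation and it does not run LDA. It only builds bag-of-words vectors for three made-up headlines (the headlines and the helper names `bag_of_words` and `cosine` are assumptions for illustration) and shows that each document gets one dimension per vocabulary word.

```python
from collections import Counter
import math

# Three made-up headlines, used only for illustration.
docs = [
    "apple unveils new iphone at september event",
    "new iphone announced by apple at launch event",
    "senate passes budget bill after long debate",
]

# Vocabulary: one dimension per distinct word across all documents.
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    """Return the word-count vector of doc, ordered by vocab."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [bag_of_words(doc) for doc in docs]
dimensions = len(vocab)  # grows with every new distinct word

sim_related = cosine(vectors[0], vectors[1])    # two stories about the same launch
sim_unrelated = cosine(vectors[0], vectors[2])  # no words in common

print(f"dimensions: {dimensions}")
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Even with three short headlines the vectors already have as many dimensions as there are distinct words, and a real news vocabulary is far larger. A topic model such as LDA instead describes each document as a mixture over a small, fixed number of topics, so similarity can be computed on much shorter vectors.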


