Clustering Similar Stories Using LDA @ Flipboard

engineering.flipboard.com

Flipboard, a popular news-reading application, just released a "related stories" feature. From the article:

Although there are many sophisticated automatic clustering algorithms, story clustering is a non-trivial problem. Because each text document can contain any word from our vocabulary, most text document representations are extremely high-dimensional. In high-dimensional spaces, even basic clustering or similarity measures fail or are very slow.

This post on their engineering blog goes deep into the details of their implementation. Extremely useful.
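The quoted point about dimensionality is the reason to reach for LDA. LDA replaces each story's vocabulary-sized word-count vector with a short distribution over K topics, and similarity on those short vectors is cheap to compute. What follows is a minimal sketch, not Flipboard's implementation. It fits LDA with collapsed Gibbs sampling in pure Python. Several pieces are hypothetical choices for illustration: the function names `lda_gibbs` and `hellinger`, the hyperparameter values `alpha` and `beta`, the toy `stories` corpus, and the use of Hellinger distance as the similarity measure.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns one topic distribution
    (a list of num_topics probabilities) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distributions (each row sums to 1).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against float error


# Hypothetical toy corpus: two tech stories and two politics stories.
stories = [
    "apple iphone launch event apple phone".split(),
    "iphone apple phone camera launch".split(),
    "election vote senate campaign vote".split(),
    "senate election campaign poll vote".split(),
]
theta = lda_gibbs(stories, num_topics=2)

# Each story is now a 2-dimensional vector instead of a vocabulary-sized one.
for i in range(len(stories)):
    for j in range(i + 1, len(stories)):
        print(i, j, round(hellinger(theta[i], theta[j]), 3))
```

Pairs with a small distance are candidates for the same "related stories" group. In this sketch, the cost of each comparison depends on K rather than on vocabulary size.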
