Our client is a large media company in Hong Kong that runs a daily news website, along with other content businesses.
Their objective was to develop a recommendation engine that enhances the user experience by generating personalized article recommendations for each known user. Users who are shown articles relevant to them are expected to read more articles and therefore spend more time on the site.
- Text of the articles was in Chinese – articles for the last 5 years available
- More than 1 million users access the website every day – session log data available
- Articles can be editorial (once a day) or from a real-time feed.
We used a content-based clustering approach to build the recommendation engine.
- Determined a language model to represent documents in a structured manner
- Evaluated multiple similarity metrics, clustering algorithms, n-gram tokens
- Clustered similar news articles within each category for a given period, and determined the optimal number of clusters
- Evaluated various scoring approaches to classify new articles into the aforementioned clusters
- Analyzed a user’s history to find the clusters they have shown interest in in the past
- Used the above to recommend appropriate articles to specific users
- Delivered the entire solution on a 5-node AWS cluster running Cloudera Hadoop
- The recommendation engine produced relevant article and video recommendations, which enhanced the user experience and increased time spent on the website. It also reduced editorial workloads.
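
The steps above can be illustrated with a minimal sketch. It is not the delivered implementation: the character-bigram TF-IDF representation, cosine similarity, centroid scoring, and every function name and sample headline below are assumptions chosen for illustration. The clustering step itself is taken as already done, so the cluster labels are supplied by hand.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Overlapping character n-grams; avoids needing a Chinese word segmenter."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

def tfidf_vectors(docs, n=2):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    idf = {g: math.log((1 + len(docs)) / (1 + k)) + 1.0 for g, k in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts], idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for g, w in v.items():
            total[g] += w
    return {g: w / len(vectors) for g, w in total.items()}

def assign_cluster(text, centroids, idf, n=2):
    """Score a new article against each centroid and return the best cluster id."""
    tf = Counter(char_ngrams(text, n))
    vec = {g: c * idf[g] for g, c in tf.items() if g in idf}
    return max(centroids, key=lambda cid: cosine(vec, centroids[cid]))

def recommend(read_clusters, candidates, top_k=3):
    """Rank (article_id, cluster_id) candidates by how often the user read that cluster."""
    interest = Counter(read_clusters)
    ranked = sorted(candidates, key=lambda c: interest[c[1]], reverse=True)
    return [aid for aid, cid in ranked if interest[cid] > 0][:top_k]

# Hypothetical sample data: four headlines with hand-assigned cluster labels.
docs = [
    "港股今日上升 恒生指數收市報升",
    "恒生指數下跌 港股成交減少",
    "曼聯擊敗利物浦 英超聯賽爭冠",
    "英超聯賽今晚開賽 利物浦主場",
]
labels = ["finance", "finance", "sports", "sports"]

vectors, idf = tfidf_vectors(docs)
centroids = {
    lab: centroid([v for v, l in zip(vectors, labels) if l == lab])
    for lab in set(labels)
}

new_cluster = assign_cluster("恒生指數今日收市上升", centroids, idf)
picks = recommend(
    ["finance", "finance", "sports"],  # clusters this user has read before
    [("a1", "sports"), ("a2", "finance"), ("a3", "lifestyle")],
    top_k=2,
)
```

Character n-grams are used here only because they sidestep word segmentation for Chinese text; the project evaluated several n-gram tokens, similarity metrics, and scoring approaches, and this sketch does not claim to reflect which ones were chosen.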