[Chart: neutral, positive, and negative sentiment scores, each with an exponential moving average (EMA) trendline]

About Hackermoods

Hackermoods uses sentiment analysis to figure out if a set of comments on a Hacker News story is relatively positive, negative, or neutral.

In general, sentiment analysis is hard. It's not great at picking up things like sarcasm and idiom, or at reading subtext. But looking at trends over time within a single group or community helps control for those errors, because the same biases apply to every day's comments. In other words, claiming "these comments are positive" is dangerous, but claiming "these comments are more positive than yesterday's" is more reliable. We've designed Hackermoods around the latter kind of claim.

The spread between positive and negative sentiment is usually more interesting than the values themselves. Large spreads often indicate emotional topics and views. A spread that diverges can hint at a topic becoming increasingly contentious, whereas a spread that converges suggests consensus.
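As a rough illustration of the trendlines and the spread, here is a minimal Python sketch. The function names, the smoothing factor `alpha`, and the choice to measure the spread as the absolute gap between the two series are all assumptions made for this example, not a description of how Hackermoods computes them.

```python
from typing import List, Sequence


def ema(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average of a series.

    alpha is an assumed smoothing factor in (0, 1]; higher values
    weight recent points more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for value in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(value)
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def spread(positive: Sequence[float], negative: Sequence[float]) -> List[float]:
    """Absolute gap between positive and negative scores, point by point."""
    if len(positive) != len(negative):
        raise ValueError("series must have the same length")
    return [abs(p - n) for p, n in zip(positive, negative)]


def is_diverging(gaps: Sequence[float]) -> bool:
    """True if the gap at the end of the window is wider than at the start."""
    return len(gaps) >= 2 and gaps[-1] > gaps[0]
```

Smoothing the spread with `ema` before calling `is_diverging` would make the comparison less sensitive to a single noisy day.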

Our Methodology

We only consider stories with eight or more comments, because stories with few comments mostly add noise. Full-text search targets the title and the URL of the story. We could target comments as well, but we're more interested in the sentiment of a story than in a collection of unrelated comments that happen to contain a keyword.

Sentiment scores are computed using VADER (Valence Aware Dictionary and sEntiment Reasoner). We tried several models and more sophisticated approaches, but VADER worked well for our purposes, which isn't surprising given that it was designed for social media text.

The scores that VADER gives fall in [0, 1] and are broken into three categories: positive, negative, and neutral. They are proportions that sum to 1. It's possible for something to be all positive or all negative, but most things are a mix, and most content will be predominantly neutral.
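The filtering and scoring steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Hackermoods implementation: the `Story` class, the function names, the plain substring match standing in for full-text search, and the choice to average scores across a story's comments are all hypothetical. The scorer is passed in as a function; with the `vaderSentiment` package, `SentimentIntensityAnalyzer().polarity_scores` returns a dictionary with the `pos`, `neg`, and `neu` keys used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MIN_COMMENTS = 8  # threshold stated in the text

# A scorer maps a comment to scores that include "pos", "neg", and "neu".
Scorer = Callable[[str], Dict[str, float]]


@dataclass
class Story:
    """Hypothetical container for a story and its comment texts."""
    title: str
    url: str
    comments: List[str] = field(default_factory=list)


def matches(story: Story, query: str) -> bool:
    """Case-insensitive substring match on title and URL only.

    A stand-in for real full-text search; comments are deliberately ignored.
    """
    q = query.lower()
    return q in story.title.lower() or q in story.url.lower()


def story_sentiment(story: Story, scorer: Scorer) -> Dict[str, float]:
    """Average the per-comment scores (averaging is an assumption)."""
    if not story.comments:
        raise ValueError("story has no comments to score")
    totals = {"pos": 0.0, "neg": 0.0, "neu": 0.0}
    for comment in story.comments:
        scores = scorer(comment)
        for key in totals:
            totals[key] += scores[key]
    return {key: total / len(story.comments) for key, total in totals.items()}


def score_matching_stories(
    stories: List[Story], query: str, scorer: Scorer
) -> List[Dict[str, float]]:
    """Score every story that matches the query and has enough comments."""
    return [
        story_sentiment(story, scorer)
        for story in stories
        if len(story.comments) >= MIN_COMMENTS and matches(story, query)
    ]
```

Taking the scorer as a parameter keeps the sketch independent of any one sentiment library, so a different model could be swapped in without touching the filtering logic.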

Note that at present, updates are applied daily. We have plans to support real-time processing in the future, as well as additional data sources. There are plenty of other opportunities for improvement, and we'll continue to iterate.