Well isn’t this just delightful: A team of super-clever computer scientists at The Hebrew University’s Institute of Computer Science in Jerusalem will be presenting their findings on an algorithm that detects online sarcasm at next week’s International Conference on Weblogs and Social Media (ICWSM) in Washington, D.C.
They call their algorithm SASI, short for “semi-supervised sarcasm identification algorithm.” Essentially, it works by taking a handful of sentences that have been tagged by humans as sarcastic (hence the “semi-supervised” part) and employing machine learning to guess which other sentences are sarcastic.
The algorithm has been tested out on Tweets and Amazon product reviews, and it’s done pretty darn good (no sarcasm intended!): SASI achieved a precision of 77% and recall of 83.1% “on an evaluation set containing newly discovered sarcastic sentences, where each sentence was annotated by three human readers.”
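For anyone rusty on the jargon: precision is the share of sentences flagged as sarcastic that really were sarcastic, and recall is the share of the truly sarcastic sentences that got flagged. Here is a minimal Python sketch of those two definitions. The function names and the counts in the usage line are made up for illustration and are not taken from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts.

    true_positives:  sarcastic sentences correctly flagged
    false_positives: sincere sentences wrongly flagged as sarcastic
    false_negatives: sarcastic sentences the classifier missed
    """
    flagged = true_positives + false_positives
    actually_sarcastic = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actually_sarcastic if actually_sarcastic else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts, purely for illustration (not the study's data).
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=2)
print(p, r)

# Harmonic mean of the two figures quoted above (our arithmetic, not a
# number quoted in this post).
print(round(f_score(0.77, 0.831), 3))
```

The harmonic mean is a common way to fold the two numbers into one, since it punishes a classifier that scores well on one measure by tanking the other.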
The researchers found no hard-and-fast methods for ferreting out sarcasm, but they did turn up a few moderately useful rules of thumb, including excessive capital letters and exclamation marks. These proved less reliable indicators than large disparities between a review’s sentiment and the score it gave the item:
A number of sentences that were classified as sarcastic present excessive use of capital letters, i.e.: “Well you know what happened. ALMOST NOTHING HAPPENED!!!” (on a book), and “THIS ISN’T BAD CUSTOMER SERVICE IT’S ZERO CUSTOMER SERVICE”. These examples fit with the theoretical framework of sarcasm and irony (see the Related work section) as sarcasm, at its best, emerges from a subtle context, hence cues are needed to make it easier to the hearer to comprehend, especially with written text not accompanied by audio (‘…’ for pause or a wink, ‘!’ and caps for exaggeration, pretence and echoing). Surprisingly, though, the weight of these cues is limited and they fail to achieve neither high precision nor high recall.
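To make those rules of thumb concrete, here is a rough Python sketch of the surface cues mentioned above (caps, exclamation marks, ellipses) plus a crude sentiment-versus-stars gap. To be clear, this is not SASI: the function names, the thresholds, and the idea of feeding in a sentiment score between -1 and 1 are all our own assumptions, and the researchers themselves say cues like these carry limited weight.

```python
def caps_ratio(sentence):
    """Fraction of alphabetic characters that are uppercase."""
    letters = [ch for ch in sentence if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


def surface_cues(sentence, caps_threshold=0.4, min_exclamations=2):
    """Flag crude typographic hints of sarcasm.

    Both thresholds are arbitrary illustrative values, not from the study.
    """
    return {
        "shouting": caps_ratio(sentence) >= caps_threshold,
        "exclamations": sentence.count("!") >= min_exclamations,
        "ellipsis": "..." in sentence or "…" in sentence,
    }


def star_sentiment_gap(sentiment, stars):
    """Distance between a sentiment score in [-1, 1] and a 1-5 star rating.

    The sentiment score is assumed to come from some other tool; stars are
    rescaled so that 1 star maps to -1 and 5 stars maps to +1.
    """
    stars_scaled = (stars - 3) / 2
    return abs(sentiment - stars_scaled)


# The two sentences quoted from the paper above.
print(surface_cues("Well you know what happened. ALMOST NOTHING HAPPENED!!!"))
print(surface_cues("THIS ISN’T BAD CUSTOMER SERVICE IT’S ZERO CUSTOMER SERVICE"))

# A hypothetical glowing-sounding review attached to a one-star rating.
print(star_sentiment_gap(sentiment=0.8, stars=1))
```

A big gap in that last function is the kind of mismatch the researchers found more telling than typography: gushing words sitting on top of a one-star score.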
According to the study, the three most sarcastically reviewed items on Amazon are Shure and Sony noise-canceling earphones, Dan Brown’s The Da Vinci Code, and Amazon’s own Kindle e-reader. We’re guessing they didn’t come across the Amazon reviews for a gallon of Tuscan Whole Milk or a certain popular lupine t-shirt.
The researchers hypothesize that “one of the main reasons for using sarcasm in online communities and social networks is ‘enlightening’ the mass that are ‘treading the wrong path.’”