The surprising effectiveness and longevity of the now-legendary “that’s what she said” joke, recently popularized again with the help of The Office, has done more than provide millions with a knee-jerk response to casual conversation. It has now reached a new level of social significance by inspiring serious linguistic research. That research comes in the form of a paper called “That’s What She Said: Double Entendre Identification,” authored by two computer science students, Chloé Kiddon and Yuriy Brun.
In their paper, the pair outline their creation of the Double Entendre via Noun Transfer (DEviaNT) approach, which automatically identifies “that’s what she said” (TWSS) jokes. They call their approach “metaphorical analysis,” a phrase that carries a double meaning all its own, and it is based on weighting certain words as “sexier” than others. The team weighted several “sexy” nouns and verbs, and then ran their algorithm.
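To make the idea of weighting words as “sexier” concrete, here is a minimal sketch in Python. It assumes a simple frequency-ratio score: how often a word appears in erotic text compared with neutral text. This is an illustration, not necessarily the formula the paper uses. The function names, the `floor` parameter, and the toy sentences are all hypothetical.

```python
from collections import Counter


def relative_frequencies(sentences):
    """Map each word to its share of all word tokens in the sentences."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def sexiness_scores(erotic_sentences, neutral_sentences, floor=1e-6):
    """Score each word seen in erotic text by a frequency ratio.

    Assumption: score = frequency in erotic text / frequency in neutral text.
    `floor` is a hypothetical guard against dividing by zero for words
    that never occur in the neutral text.
    """
    erotic = relative_frequencies(erotic_sentences)
    neutral = relative_frequencies(neutral_sentences)
    return {
        word: freq / max(neutral.get(word, 0.0), floor)
        for word, freq in erotic.items()
    }


# Toy data, invented for illustration only.
erotic_toy = ["the rod was hard", "she held the rod"]
neutral_toy = ["the exam was hard", "she held the door", "the door was open"]

scores = sexiness_scores(erotic_toy, neutral_toy)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

In this toy example, “rod” scores highest because it never appears in the neutral sentences, while a common word such as “the” scores about 1, since it is equally frequent in both.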
In their research, the pair also uncovered some interesting rules for TWSS jokes. One concerns the risk of invoking a TWSS joke incorrectly. From their study:
For example, in a social setting, the cost of saying “that’s what she said” inappropriately is high, whereas the cost of not saying it when it might have been appropriate is negligible.
To address this, and to produce better results, the team employed a learning algorithm. Among other things, it treated a false positive as 100 times more costly than a false negative.
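The effect of such lopsided costs can be sketched with a generic cost-sensitive decision rule. This is a textbook expected-cost comparison, not necessarily how the team's learning algorithm applied the costs. The function name `should_say_twss` and the input probability are hypothetical.

```python
def should_say_twss(p_twss, cost_fp=100.0, cost_fn=1.0):
    """Say the joke only if speaking has the lower expected cost.

    p_twss  -- assumed probability that the sentence really is a TWSS
    cost_fp -- cost of saying it when it does not fit (false positive)
    cost_fn -- cost of staying silent when it would have fit (false negative)
    """
    expected_cost_if_said = (1.0 - p_twss) * cost_fp
    expected_cost_if_silent = p_twss * cost_fn
    return expected_cost_if_said < expected_cost_if_silent


# With a 100:1 cost ratio, the break-even probability is 100 / 101.
threshold = 100.0 / (100.0 + 1.0)
print(round(threshold, 3))

print(should_say_twss(0.95))   # fairly sure, but not sure enough
print(should_say_twss(0.995))  # sure enough to risk it
```

Under these assumed costs, a classifier should stay silent unless it is about 99% certain, which matches the social intuition in the quote above.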
The team then ran DEviaNT on a mix of pre-identified TWSS joke material and random quotations. In their test, they used 1.5 million “erotic” sentences and 57,000 “non-erotic” sentences. The team says they achieved a success rate in excess of 71.4%. While that may not seem like much, the team says that with a larger data set, they would expect results closer to 99.5%. Additionally, DEviaNT returned some interesting results. Again, from the study:
DEviaNT returned 28 such sentences (all tied for most likely to be a TWSS), 20 of which are true positives. However, 2 of the 8 false positives are in fact TWSSs (despite coming from the negative testing data): “Yes give me all the cream and he’s gone.” and “Yeah but his hole really smells sometimes.”
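The figures in that quote can be checked with a little arithmetic. The sketch below assumes the reported success rate is precision, meaning true positives divided by sentences returned. The “adjusted” figure, which counts the two mislabeled sentences as correct, is our own illustration and not a number from the paper.

```python
def precision(true_positives, returned):
    """Fraction of returned sentences that really are TWSS jokes."""
    return true_positives / returned


reported = precision(20, 28)      # 20 true positives out of 28 returned
adjusted = precision(20 + 2, 28)  # also count the 2 mislabeled sentences

print(f"{reported:.1%}")  # 71.4%
print(f"{adjusted:.1%}")  # 78.6%
```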
Some may dismiss this research as a lark of little value. That would be a low blow, as even the researchers describe their work as tackling a “hard natural language understanding problem.” (That’s what she said.)