Computers Can Figure Out Your Gender From Your Tweets
There is an algorithm that can figure out your gender based on your tweets, which might be a bigger deal than you think. Since Twitter doesn’t require you to specify your gender, plenty of people would find it worthwhile to have a program that sorts users into one pile or the other based on their username and the content of their tweets. Also, it’s just cool to see what the data has to say about how girls tweet compared to how guys tweet.
Researchers at the Mitre Corporation rounded up a set of Twitter users whose genders could be identified through links to other sites that list gender. They compiled the information and fed it into the algorithm. Compared with a baseline of 55% (roughly what you’d get by always guessing the more common gender), they could guess gender correctly 66% of the time from just one tweet, 77% from the username alone and 92% using all the information available.
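Assuming the 55% figure is the usual majority-class baseline, a classifier only earns credit for the accuracy it adds above that number, since always guessing the more common gender already scores 55%. Here is a minimal Python sketch of that comparison, using a hypothetical sample of 55 women and 45 men (the study’s actual counts aren’t given here); the function names are illustrative, not from the study.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always guessing the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical sample: 55 female ("F") and 45 male ("M") users.
labels = ["F"] * 55 + ["M"] * 45
print(majority_baseline(labels))  # 0.55

# A "classifier" that always answers "F" does no better than the baseline.
print(accuracy(["F"] * len(labels), labels))  # 0.55
```

Read this way, the study’s 66% from a single tweet is 11 points above doing nothing clever at all, and 92% is 37 points above it.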
The crux of this study, and this algorithm, is that certain words, expressions and formatting habits are strongly indicative of a male or female user, like the words in the wordlist above. That said, you can’t get too carried away with this. While the data suggests that if you use those words you are likely to be female, it doesn’t suggest that if you are female, you are likely to use those words. The probability only runs in one direction: a word can be used mostly by women and still be used by only a small fraction of women.
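To see why the probability only runs one way, here is a small Python sketch with invented counts (not from the study) for a single word, say “yogurt”: 1,000 female and 1,000 male users, of whom 30 women and 5 men use the word.

```python
# Hypothetical counts, invented for illustration (not from the study).
female_users = 1000
male_users = 1000
female_using_word = 30
male_using_word = 5

# P(female | uses word): of the people who use the word, how many are female?
p_female_given_word = female_using_word / (female_using_word + male_using_word)

# P(uses word | female): of the female users, how many use the word?
p_word_given_female = female_using_word / female_users

print(round(p_female_given_word, 3))  # 0.857
print(p_word_given_female)  # 0.03
```

In this made-up example, someone who uses the word is about 86% likely to be female, yet only 3% of women ever use it.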
Some other categories of speech turn up more interesting results. For instance, among possessives (phrases of the form “my_[x]”), they found that seemingly benign ones like my_zipper are strongly indicative of a male user, along with more obvious ones like my_wife. On the flip side, my_research is strongly female, as are my_bff and my_gosh.
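Features like my_zipper come from pairing “my” with the word that follows it. A minimal Python sketch of that extraction might look like this, assuming simple regex-based matching; the study’s actual tokenizer isn’t described in this article, and `possessive_features` is a hypothetical helper.

```python
import re

# Matches "my" followed by whitespace and one word, e.g. "my wife".
# Assumption: crude regex tokenization, not the study's actual method.
POSSESSIVE = re.compile(r"\bmy\s+([a-z']+)", re.IGNORECASE)

def possessive_features(tweet):
    """Return lowercased "my_[x]" features found in a tweet."""
    return ["my_" + word.lower() for word in POSSESSIVE.findall(tweet)]

print(possessive_features("Broke my zipper again, my gosh"))
# ['my_zipper', 'my_gosh']
```

Each feature can then be counted per gender, just like an ordinary word.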
It’s interesting to see this kind of information broken down into a statistical summary so that we can pore over it and be amused. Although a lot of it may seem like common sense, this data could prove extremely useful to anyone with a vested interest in determining a user’s gender, say, to sell that user birth control or Mountain Dew, which is probably where this technology will end up.
There are situations, like the whole Gay Girl in Damascus hoax, where it might be useful to have an algorithm that looks at things in a more “by the numbers” way than we humans can, but for now the technology looks pretty basic. Besides, reading this study is sort of like reading a walkthrough on how to pass as the opposite gender, at least as far as linguistic statistics are concerned. Of course, swinging too hard in that direction could easily leave you looking so stereotypical that you read as a parody.
Unquestionably, the most valuable information this study provides is that, guys, if you tweet about yogurt and use emoticons, computers might think you’re a girl. Girls, be careful about mentioning your zipper or your bros.