Telltale Tweets: Geotagged by Language

By Max Eddy Jan 10th, 2011, 5:37 pm


I hail from the Midwest, and most people can tell because I seek out a refreshing “pop” as opposed to a heathen “soda.” But a new paper filed with the Linguistic Society of America claims your location can be determined to within 300 miles based on your 140-character tweets alone.

New Scientist is reporting on some of the results of the study, which drew on tweets from 9,500 users totaling 4.7 million words.

The researchers found that if you are cool in the San Francisco area, you will probably write “koo” on Twitter, but in southern California, you write “coo.” You are “hella” tired in northern California, “deadass” tired in New York, and in Los Angeles you use an acronym for an obscenity.

On the face of it, this revelation should not be so surprising. Everyone carries a little piece of their language into their online interactions, especially in so casual a setting as Twitter. Still, it is a very impressive proof of the researchers’ methodology, if a little creepy, that they were able to determine locations so precisely.

Of course, Twitter users shouldn’t worry so much about having their current locations gleaned from their speech. A Midwestern transplant to the East Coast, for instance, might simply appear to be living somewhere in Ohio instead of SoHo.

While physical stalking is probably less of a risk, another creepy repercussion is the possible data mining of users based on their language. Can we look forward to ads tailored to our perceived desires and based on locations estimated from our tweets? Leave it to marketers; they’ll find a way.

(A Latent Variable Model for Geographic Lexical Variation via The New Scientist)
