Monday, November 30, 2009

K is for Kanye

(image from Lemur King's Folly)

Wired magazine has a cute article on geek neologisms, 11 Ways Geeks Measure the World (HT kottke). Personal favs:

Warhols (fame duration)
1 Warhol equals 15 minutes of fame, So if you’ve been famous for three years, that’s just over 105 kilowarhols. I’m going to go out on a limb and say that there’s a critical point — varying from celebrity to celebrity — where that person has outstayed their welcome, and uh … becomes synonymous with a feminine hygiene product (and the bag it came in). In keeping with nuclear physics, I’m happy for this to remain as k=1 (where ‘k’ is for ‘Kanye’).
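
The arithmetic in that first one checks out, by the way. Here is a quick sketch of the conversion, assuming a 365-day year (the function and constant names are mine, not Wired's):

```python
MINUTES_PER_WARHOL = 15  # 1 warhol = 15 minutes of fame, per the definition above


def years_to_kilowarhols(years, days_per_year=365):
    """Convert a span of fame in years to kilowarhols (assumes a 365-day year)."""
    minutes = years * days_per_year * 24 * 60
    warhols = minutes / MINUTES_PER_WARHOL
    return warhols / 1000


print(years_to_kilowarhols(3))  # 105.12
```

Three years comes to 105,120 warhols, so "just over 105 kilowarhols" is right.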

Frinks (geekiness)
I’m sure I’ll take a lot of flak for this, but take it as a suggestion, at least — a standard unit of geekiness called the frink, and that it be measured on the ‘Hoyvin-Glayvin’ scale. Simpsons fans won’t need to ask why. To figure out where you fall on the Hoyvin-Glayvin Scale, I’ve compiled a handy reference:

0 Frinks – thought the JockDad April Fool’s Prank was a good direction for this blog.
10 Frinks – believes Greedo fired first.
20 Frinks – you’re the family friend who “knows” computers.
30 Frinks – on Twitter, but only following Ashton and Oprah.
40 Frinks – you don’t hate sci-fi, but don’t have an opinion on things like Kirk vs. Picard either.
50 Frinks – You’re the family friend who actually does know computers. You probably watch the Battlestar Galactica reruns, too.
60 Frinks – Solidly geeky. Almost stereotypically so.
70 Frinks – Geeky enough to know geeks don’t like fitting into stereotypes.
80 Frinks – You’ve probably attended several cons, contemplated which dice to bring to the game, and own at least one Starfleet/Colonial Fleet/Galactic Empire uniform.
90 Frinks – It’s been a long time since you told a joke that didn’t reference C#, Linux or the Dune saga.
100 Frinks – Aren’t you Dr. Sheldon Cooper?

Sunday, November 29, 2009

On Pointiness

(screen grab from Stamp and Shout)

I've seen the Coexist bumper sticker above several times in the last week. I don't know how long it's been around, but a thought struck me the last time I saw it: there's no 'x'. All of the symbols used actually contain a version of the letter they are replacing, except the Star of David. There's no actual X figure within that symbol. Rather, it's the prevalence of pointiness that allows it to make for a suitable X replacement. I wonder if there is a different cognitive process at work? While we are reading the other letters, perhaps we are not actually reading the Star of David as an X, but rather engaging in some form of visual approximation (at least at first). Similar issues arise with text-speak like l8ter. Perhaps neuroscientist and reading expert Stanislas Dehaene has an answer in his new book Reading in the Brain. In Jonah Lehrer's review of that book, he suggests a possible answer:

One of the most intriguing findings of this new science of reading is that the literate brain actually has two distinct pathways for reading. One pathway is direct and efficient, and accounts for the vast majority of reading comprehension -- we see a group of letters, convert those letters into a word, and then directly grasp the word's meaning. However, there's also a second pathway, which we use whenever we encounter a rare and obscure word that isn't in our mental dictionary. As a result, we're forced to decipher the sound of the word before we can make a guess about its definition, which requires a second or two of conscious effort.

Perhaps this second pathway is the route needed to decipher the Star of David as X and 8 as -ate-. Just wondering out loud...

Oh, and btw, after staring at it a moment, I see that my initial reaction was wrong. There are actually six Xs in the Star of David (not four, as I first wrote; thanks Q. Pheevr!), two between each set of parallel lines. It takes a bit of magic picture blurry eye technique to see them (there's a more scientific term for that, right?). However, I doubt those Xs are recognized during the initial reading of the bumper sticker.

The Myth of 'Ghoti'

(cartoon found at Caldwell Reading)

In reviewing the new book Reading in the Brain by neuroscientist Stanislas Dehaene (do check out the cool Matrix-like book page), neuro-journalist Jonah Lehrer repeats the common claim that George Bernard Shaw coined the use of the spelling of fish as ghoti to demonstrate how weird English spelling is. I myself repeated this same claim to many students in the past, and in a few business presentations. Within linguistics, it has long been a truism. Rarely did anyone think to challenge its veracity. Until April 23, 2008 at 11:59 pm that is. Over a year and a half ago, Benjamin Zimmer debunked this claim on Language Log (see his post here). Zimmer showed not only that there is no record of Shaw having used it, but also that the use of ghoti goes back at least to "1855, a year before Shaw was born."

It remains a fun little example, mind you, just not attributable to Shaw.

BTW, if you do a Google image search on ghoti, as I just did, you will discover an underground, almost cultish devotion to the word involving Jedis, bimbos, and indie rock bands, oh my.

Saturday, November 28, 2009

Google Linguistics 2

(screen shot from WebCorp)

I have posted before about the use of Google as a linguistics search engine here. Today, I ran across WebCorp Live, which allows a user to perform some linguistically interesting searches over the web as a corpus. From their site:

WebCorp LSE is a fully-tailored linguistic search engine to cache and process large sections of the web. WebCorp LSE offers:

* enhanced sentence boundary detection
* date identification
* 'boilerplate' removal
* collocation and other statistical analyses
* grammatical tagging
* language detection
* full pattern matching and wildcard search

In spirit, this is quite similar to Mark Davies' excellent BYU Corpus resources. If I get a chance to play with it some more, I might try running some of my old dissertation searches through it. That should be a good test.

UPDATE: see my original post titled Google Linguistics which more specifically talks about using Google for research.

Friday, November 27, 2009

Online Psycholinguistics Experiments (repost)

NOTE: Given this blog's recent surge in popularity (props to Language Log, Language Hat, and something called EastSouthWestNorth blog) I decided to update and repost this because I believe in increasing the use of online methodologies for linguistic research and I hope to send some of you good folks reading this right now over to these good folks below and hopefully you will participate in their experiments. Generally it takes little of your time and the results could help further our understanding of just how the heck language works ('cause honestly, no one really knows).

I happily request submissions of other online linguistics related experiments.

Original post here.

Experimental psycholinguistics requires experimental subjects like any other empirical cognitive science. Unfortunately, researchers are often constrained by limited resources. Typically, psycholinguists use college students bribed with money or extra credit as subjects. It's not unheard of for a published psycholinguistics study to have involved as few as 12 subjects. This has been a necessary evil because there has never been a good way to collect large numbers of subjects together and provide them with a coherent experimental design.

Lately, however, researchers are turning to the web as a place to conduct experiments with large groups of subjects. Yes, there are issues regarding control (e.g., if you need native speakers of English, how can you ensure that a subject really is a native speaker?), but these issues come up in all types of experimental paradigms. I believe that good standards and practices to ensure quality online psycholinguistic experiments will emerge over time. So, I'm all for moving ahead.

With that in mind, here is a set of sites offering online psycholinguistic experiments:

  • The Colour Imaging Research Group at the London College of Communication: Color Naming.
  • CogLab2 (the Cognitive Psychology Online Laboratory)

Purplish Blue

I just completed a nifty little online color naming experiment that is being conducted by The Colour Imaging Research Group at the London College of Communication. I'm a fan of using the web for linguistic experiments so I'm always looking for these kinds of things (see a related post here). The experiment is being conducted in four languages: English, German, Greek, and Spanish (and they are adding more). Try it for yourself here.

As you see from my responses above, I'm lacking in nuanced color naming skills. Apparently my world is a giant purplish nightmare. I had two impressions from my own responses:

1. I tended to want to blend names. Partly this was my own lack of lexical items (who knew there was a color named catawba?), but it was equally due to my visual perception. I perceived the colors as blends. Now, is this because I only had a few color names and language constrained my thinking about what I was seeing? Not sure and I ain't goin' there.

2. I tended to use a basic level term like blue when I first encountered a variation, then I was forced to come up with an adjectival variant like purplish blue when I encountered the next variation. However, the original color was not necessarily what I actually think of as basic level blue when given the colors together. I could imagine a second version of this experiment where all colors are given together and visual comparisons are made. I believe I would have assigned the color names differently. I do have a sense that there is such a thing as basic level blue, but I can't make that distinction in isolation.

BTW, there are thousands of color names. Check out this extensive site of various color name dictionaries: Color-Name Dictionaries.

And here's a nice Wikipedia page on the classic work by Berlin and Kay that started a revolution in cognitive linguistics: Basic Color Terms: Their Universality and Evolution.

Thursday, November 26, 2009

Gee Wiz, Alien Language

(image of USC professor Paul R. Frommer from LA Times)

There are certain topics in linguistics that are far more interesting to non-linguists than linguists themselves. Animal language is a classic example, as well as language evolution. And third on the list is alien languages from movies (as opposed to Kirby's artificial languages). For example, for decades now people have been fascinated by Marc Okrand's Klingon (this guy took it a little too far though; isn't this child abuse?).

When people hear that someone has "invented a language," they seem shocked, shocked! to discover that such a thing occurred. As if it's a difficult feat. There seems to be a gee wiz factor. In fact, the average second year grad student in linguistics can do it, and typically they do, just for fun. Logicians are required to do it. Here, let's make up a language right now:

Language X

bbhl = /bel/, intransitive verb, 'to run', (actor)
hhli = /hla:/, transitive verb, 'to hit', (undergoer, actor)
ttrsh = /dos/, proper noun, 'Wally'
pploi = /pli/, proper noun, 'Sparky'
8_9 = /ha_mu/, particle, simple past

S --> V + N
S --> V + N + N
V --> prt+V+prt

There. Done. I just invented language X and it took all of 20 minutes. Now, which of the following sentences are grammatical in language X and what do they mean? Which rules do the ungrammatical sentences break?
  1. bbhl ttrsh
  2. ttrsh bbhl
  3. 8hhli9 pploi ttrsh
  4. ttrsh pploi
  5. 8hhli9 ttrsh pploi
  6. hhli9 ttrsh pploi
Answers below.

The latest variation of this hoopla comes to us from James Cameron's latest big budget movie Avatar. Cameron recruited a linguist from USC, Paul Frommer, to create a language for his goofy blue aliens. But an article about this from the LA Times involved a bit of an exaggeration: "USC professor creates an entire alien language for 'Avatar'" (my emphasis).

Wow! An entire language, you say? That's gotta be at least 30 or 40 thousand words and at least a couple thousand rules, right? Nope. In fact, the language only contains about 1000 words. From the article itself: "Between the scripts for the film and the video game, Frommer has a bit more than 1,000 words in the Na'vi language, as well as all the rules and structure of the language itself." It seems a tad redundant to say "rules and structure" of a language, but that's neither here nor there. As far as I can tell (after just a little bit of Googling) the Na'vi language has not been released so it's not possible to follow up on just how extensive this language is beyond the word count reported in the article. I'm sure a grammar is on the way. Sci fi fans are notoriously detail oriented. But it brings up a more serious issue: what counts as a language? Language X above certainly counts as a language in the simple sense of having a lexicon and set of rules for combining them. Heck, I even threw in some phonetics. If we want to claim that language X is not an entire language, we're gonna have to come up with some guidelines for what counts as an entire language. The logicians have their rules for formal languages, of course, but we need some natural human language guidelines. I'm sure the pidgin/creole experts have thoughts on this and this is one of the things that pidgin & creole expert Derek Bickerton ruminates on in his book Adam's Tongue. See my reviews here. He's concerned with what proto-language must have looked like when humans first used language.

Now, I do not mean to belittle professor Frommer's accomplishment. I can certainly imagine spending a lot of time and energy on creating a language. But it's not rocket science. It's closer to knitting.

  1. bbhl ttrsh = 'Wally runs'
  2. *ttrsh bbhl -- bad because all sentences in X begin with a verb
  3. 8hhli9 pploi ttrsh = 'Wally hit Sparky'
  4. *ttrsh pploi -- bad because all sentences in X must have a verb
  5. 8hhli9 ttrsh pploi = 'Sparky hit Wally'
  6. *hhli9 ttrsh pploi -- bad because past tense morpheme is not properly realized
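
For the programmers in the audience, the whole of language X fits in a few lines of Python. This is only a sketch of the rules and lexicon given above, with two assumptions of mine that the rules leave open: a bare verb with no 8_9 particle is allowed and read as non-past (that is how answer 1 treats it), and the number of nouns must match the verb's argument structure. The names parse, VERBS, NOUNS and PAST are made up for illustration.

```python
# Lexicon of language X, copied from the post.
VERBS = {
    "bbhl": ("run", 1),  # intransitive: (actor)
    "hhli": ("hit", 2),  # transitive: (undergoer, actor)
}
NOUNS = {"ttrsh": "Wally", "pploi": "Sparky"}
PAST = {"run": "ran", "hit": "hit"}  # English past forms, used only for glossing


def parse(sentence):
    """Return an English gloss, or None if the sentence is ungrammatical."""
    words = sentence.split()
    if not words:
        return None
    head, args = words[0], words[1:]

    # V --> prt+V+prt: the past particle 8_9 must wrap the verb on both sides.
    # Assumption: a bare verb is fine and is read as non-past.
    past = head.startswith("8") and head.endswith("9")
    stem = head[1:-1] if past else head
    if stem not in VERBS:
        return None  # no verb in first position, or a half-realized particle

    gloss, valency = VERBS[stem]
    # S --> V + N  or  S --> V + N + N, matched to the verb's argument structure.
    if len(args) != valency or any(a not in NOUNS for a in args):
        return None

    actor = NOUNS[args[-1]]  # the actor is always the last noun
    verb = PAST[gloss] if past else gloss + "s"
    if valency == 1:
        return f"{actor} {verb}"
    undergoer = NOUNS[args[0]]
    return f"{actor} {verb} {undergoer}"


for s in ["bbhl ttrsh", "ttrsh bbhl", "8hhli9 pploi ttrsh",
          "ttrsh pploi", "8hhli9 ttrsh pploi", "hhli9 ttrsh pploi"]:
    print(s, "=", parse(s) or "* ungrammatical")
```

Run on the six test sentences, it gives the same judgments and glosses as the answers above.
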
UPDATE: cute HTML note. My original argument structure definitions used angle brackets and I only just now realized they didn't show up in the post, because, of course, those are interpreted as HTML tags. So I used parens.

UPDATE 2: a commenter points out a more complete interview with Frommer here.

UPDATE 3: I scooped Ben Zimmer on this one (HT Language Hat), another LL scoop for me.

UPDATE 4: Ben Zimmer has posted a guest post by Frommer in which he gives a brief description of the language here.

Wednesday, November 25, 2009

Delicious Martian Fruit

(screen shot from University of Edinburgh)

I assume you'll be having some yummy neluka pie, fresh kapihu, or baked lanepi with cinnamon to finish off your Thanksgiving meal tomorrow. Personally, I can't resist a stiff vodka & mola juice cocktail (only a radish garnish will do, people, I'm a stickler for proper cocktail garnishment).

Well, maybe this is what we'd eat if we spoke the spooky Alien Language Simon Kirby et al. are growing (HT LL). The good folks across the pond at the University of Edinburgh's School of Philosophy, Psychology and Language Sciences, Department of Linguistics and English Language, Language Evolution and Computation Research Unit (takes a breath) have been trying to discover how languages evolve. To further this, they have been conducting some interesting experiments with artificial (aka 'alien') languages that begin small (e.g., with just a few fruit names), but which are then grown via cultural transmission of subsequent participants.

What they are finding, not unlike Mark Changizi in some ways (see here), is that "language has adapted to be good at being learned by us. This can happen because language evolves culturally through being repeatedly learned and used by generations of individuals."

They have also posted online what they call "an early version of an online cultural evolution experiment game relating to this work." However, it seems to be, at first at least, a version of the classic toy/game Simon (a sort-of prehistoric PlayStation) where players have to repeat a series of sound/color stimuli. Unfortunately, unlike the familiar kid's toy, this one starts out at a fairly difficult level. No easy warm up period (hmmm, much like babies learning language???). In any case, I found it frustrating and my gaze was quickly distracted by milk and cookies...well, beer and cookies (I'm saving the vodka molas for tomorrow).

Tuesday, November 24, 2009

Abracadabra! I Win!

(image from

I tend to avoid Slate these days because, frankly, I typically find myself scoffing at some idiot article they've published that promotes such a ridiculous mis-reading of academic research that it's hardly worth finishing... like this one from today: A Better Way to Fight With Your Husband which linked to this article: The Healthiest Way To Fight With Your Husband. It's a classic piece of idiot journalism worthy of a Full Liberman* if only it weren't so trivial and obvious as to be beneath the man, so I'll take a crack at it.

The big point is that fabulous new research from real life scholars (psychologists nonetheless, and they're almost like scientists) proves that women should use particular words when yelling at their husbands (the experiment used heterosexual married couples). Pretty awesome, ain't it! Just use the right words, and like a magic key you can unlock the mysteries of the brain and make it do what you please (okay, I'm starting to exaggerate, but less than you might think).

First let's look at the way the academic article is summarized in the puff piece that Slate linked to:

A new study of married couples, however, has found physiological evidence for one technique to diffuse tension: choosing the right fighting words. Couples who used analytical language, such as “think,” “understand,” “because,” or “reason,” during heated arguments were able to keep important stress-related chemicals in check, according to research published in the latest issue of the journal Health Psychology. Cytokines are inflammatory chemicals that spike during periods of prolonged tension and can lower your immunity and lead to early frailty, Type 2 diabetes, arthritis, and some cancers. The authors noted a curious gender twist in their results. Husbands benefitted from their wives’ measured language, but a man’s carefully chosen words had little effect on a woman’s cytokine balance.

To be fair, here is a passage from the authors' abstract of the original article:

Effects of word use were not mediated by ruminative thoughts after conflict. Although both men and women benefited from their own cognitive engagement, only husbands' IL-6 patterns were affected by spouses' engagement. Conclusion: In accord with research demonstrating the value of cognitive processing in emotional disclosure, this research suggests that productive communication patterns may help mitigate the adverse effects of relationship conflict on inflammatory dysregulation.

And here is a passage from this interview with the first author, Jennifer Graham, Penn State assistant professor of biobehavioral health:

"We specifically looked at words that are linked with cognitive processing in other research and which have been predictive of health in studies where people express emotion about stressful events," explained Graham. "These are words like 'think,' 'because,' 'reason' (and) 'why' that suggest people are either making sense of the conflict or at least thinking about it in a deep way."

For the study, the 42 couples made two separate overnight visits over two weeks.

"We found that, controlling for depressed mood, individuals who showed more evidence of cognitive discussion during their fights showed smaller increases in both Il-6 and TNF-alpha cytokines over a 24-hour period," said Graham, whose findings appear in the current issue of Health Psychology.

During their first visit, couples had a neutral, fairly supportive discussion with their spouse. But during the second visit, couples focused on the topic of greatest contention between them.

"An interviewer figured out ahead of time what made the man and woman most upset in terms of their relationship, and we gave each person a turn to talk about that issue," said Graham.

Researchers measured the levels of cytokines before and after the two visits and used linguistic software to determine the percentage of certain types of words from a transcript of the conversation. (my italics)

The researchers' results suggest that people who used more cognitive words during the fight showed a smaller increase in the Il-6 and TNF-alpha. Cognitive words used during the neutral discussion had no effect on the cytokines.

When they averaged the couples' cognitive words during the fight, they found a low average translated into a steeper increase in the husbands' Il-6 over time. There were no effects on the TNF-alpha. However, neither couple's nor spouse's cognitive word use predicted changes in wives' Il-6, or TNF-alpha levels for either wives or husbands.

Graham speculates that women may be more adept at communication and perhaps their cognitive word use had a bigger impact on their husbands. Wives also were more likely than husbands to use cognitive words.

Well, thank gawd they used fancy computers to count cognitive words! After reading these three descriptions, it was clear to me that the original work is likely flawed. I don't have access to the original study, unfortunately, but taken together, the abstract and first author's interview suggest to me that it makes the same mistake most non-linguists make: they assume the linguistics part is easy and don't put enough effort into it. Dr. Graham's initial claim in the interview jumps out at me: "We specifically looked at words that are linked with cognitive processing in other research..."

Hmm? Words that are "linked with cognitive processing?" What does this mean? I would love to see the references page to follow up on this "other research." Graham later refers to these as "cognitive words." They are alternately referred to as analytical language, measured language, conflict-resolution words, and cerebral words. From the puff piece and the interview we have five examples:
  1. because
  2. reason
  3. why
  4. think
  5. understand
Huh? One conjunction, one interrogative, and three verbs of cognition. Hmmm. Is there any intuitive reason to believe that "because" is "linked with cognitive processing" in some special way that other words are not? Is it the fact that it grammatically links clauses? Many words do this. Are the verbs on the list simply because they are verbs of cognition? Are run and jump less "linked with cognition" because they are verbs of motion? I would have to speculate on what this "other research" discovered about the magical properties of the special words that make them the key to brain chemicals. Abracadabra! Poof! Also, it's not at all clear to me why they averaged the couples' frequency count. What is this average supposed to tell us?
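
To make the method concrete, here is roughly what "determine the percentage of certain types of words from a transcript" amounts to. This is a minimal sketch under loud assumptions: the word list is only the five examples quoted above (the study's actual list is not given in anything I've read), the function name is mine, the two utterances are invented for illustration, and I have no idea what software the researchers really used.

```python
import re

# Hypothetical word list: just the five examples quoted in the puff piece and
# the interview. The study's real category list is not given in any of them.
COGNITIVE_WORDS = {"because", "reason", "why", "think", "understand"}


def cognitive_word_percentage(transcript):
    """Percentage of word tokens in the transcript that are on the list."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in COGNITIVE_WORDS)
    return 100.0 * hits / len(tokens)


# Made-up utterances, not data from the study.
husband = "I think you are late because you never check the time"
wife = "Why do you think that? I understand the reason, I just don't agree"

print(cognitive_word_percentage(husband))
print(cognitive_word_percentage(wife))
```

Note what a count like this cannot see: it treats "I think you're an idiot" and "I think I understand your point" the same way, it misses "thinking" and "reasons" unless someone adds them to the list, and it says nothing about why the word was uttered.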

However, the puff piece makes the leap into idiotsville all by itself:

"The study is significant because it’s one of the first to link language with biological markers and show what kinds of words help sparring couples rather than just recommending they “communicate more,” explains James Pennebaker, chair of the department of psychology at the University of Texas-Austin, who has studied the role of language on relationships." (my italics).

Nope. No link. Just a transcript. Given the study's methodology of counting words in a transcript, at no point could they possibly have been able to show any causal relationship between a particular word's utterance and the levels of a particular chemical in a person's brain.

The puff piece authors pull the classic journalist's trick of "being fair" by adding actual linguist Deborah Tannen's skepticism of the "link" between particular words and particular chemicals, but they abandon all skepticism just a few sentences later and end with a bang! "Even when it seems like he is ignoring you, your words may be having an effect—at least on a chemical level,” says Graham"


*I'm going to start using the term "The Full Liberman" to refer to Mark Liberman's excellent manner of debunking bad journalism (see here and here for examples).

UPDATE (11/28/09): A nice summary of Full Libermans at LL here.

Sunday, November 22, 2009

Are All Writing Systems Alike?

(image from The Topography of Language)

Just started reading an interesting article by the evolutionary biologist Mark Changizi who claims in The Topography of Language that all the world's writing systems utilize the same set of shapes because these shapes were selected for during the evolution of our visual system (or something like that). More as I digest this interesting claim.

Money quote:
Amongst both non-linguistic and linguistic signs, some visual signs are representations of the world­ e.g., cave paintings and pictograms, respectively­ and it is, of course, not surprising that these visual signs look like nature. It would be surprising, however, to find that non-pictorial visual signs look, despite first appearances, like nature. Although writing began with pictograms, there have been so many mutations to writing over the millenia that if writing still looks like nature, it must be because this property has been selectively maintained. For non-linguistic visual signs, there is not necessarily any pictorial origin as there is for writing, because amongst the earliest non-linguistic visual signs were non-pictorial decorative signs. The question we then ask is, Why are non-pictorial visual signs shaped the way they are?

HT: Stanislas Dehaene (via The Daily Dish)

Wednesday, November 18, 2009

Random Linguistics

(randomly discovered blog miresua conlang)

For reasons that are not entirely clear to me, there is a remarkable prevalence of what I'll call quasi-linguistics blogs on Blogger. Try, as I just did, using the "Next Blog" button above at the top left of this page ten or more times. Each time it will take you to a randomly selected blog within the blogger network of blogs (No, I'm wrong here. see update below). It's pretty cool. Almost as good as StumbleUpon. But I suspect you'll find, as I did, a preponderance of language/linguistic related blogs. My rough estimate was 60% of the blogs were language related. Now, this was driven up a bit by many ESL sites, but that counts, as far as I'm concerned. Unfortunately, the quality of these blogs was poor, at best (e.g., see the tiresome anti-passive voice post here).

Why are so many bloggers blogging about language issues? Maybe Geoff Nunberg is right and "the Internet turns everybody into a linguist" (see here).

UPDATE: Commenter MPJ cleared up the mystery. Blogger's Next Blog button is NOT random (it used to be). Blogger's explanation here (HT The Real Blogger Status). Money quote:

We've made the Next Blog link more useful, by taking you to a blog that you might like. The new and improved Next Blog link will now take you to a blog with similar content, in a language that you understand. If you are reading a Spanish blog about food, the Next Blog link will likely take you to another blog about food. In Spanish!

I'd be interested to know if they're using the same technology as their AdSense product to detect "similarity." How do they determine the anchor blog?

Also, I think I can still make a similar claim to my original one: of the blogs that are related to language, most are prescriptivist. Fair?

Crowdsourcing Annotation

(image from Phrase Detectives)

Thanks to the LingPipe blog here, I discovered an online annotation game called Phrase Detectives designed to encourage people to contribute to the creation of hand annotated corpora by making a game of it. It was created by the University of Essex, School of Computer Science and Electronic Engineering. Of course, they have a wiki, Anawiki. I'm not crazy about the cutesy cartoon mascot (they've given it a name: Sherlink Holmes. Ugh. I guess Annie would be a bit too obvious?). I've wondered aloud about this kind of thing before, so I'm glad to see it coming to fruition.

I haven't started playing the game yet, but I'm looking forward to it. For now, here is the project description:

The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words.

However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).


Wednesday, November 11, 2009

The Right to Write

Last Friday, one of the world's most articulate and brave bloggers, Yoani Sánchez, was brutally beaten and kidnapped by her own government. Read her description of the events here A gangland style kidnapping.

Read her blog Generation Y.

Thankfully, she is recovering and remains resolute as a blogger and dissident. In her own words:

"Thank you to friends and family who have looked after and supported me, the effects are fading, even the psychological ones which are the hardest. Orlando and Claudia are still in shock, but they are incredibly strong and also will overcome it. We have already begun to smile, the best medicine against abuse. The principal therapy for me remains this blog, and the thousands of topics still waiting to be touched on."

Sunday, November 8, 2009

Infrequently Asked Questions

A nice example of a linguistic construction is Frequently Asked Questions because, as far as I can tell from the lists of questions on most of these pages, they are almost certainly NOT frequently asked at all. I've never once seen a page that lists the number of times a particular question has been asked nor any discussion of the method of counting said frequency. It simply goes without saying that "Frequently Asked Questions" are simply those that the creator of the page either a) perceives as important or b) wants readers to think about (some are clearly designed by marketers to push certain points of view).
