Sunday, February 28, 2010


Are gestures and words all the same to the brain? According to this article, yes. I haven't had time to review it yet, but it's a tantalizing morsel. Of course, the fact that it's a Business Week article does not bode well. We'll see. Money Quote (for now):

But new research, co-authored by Patrick J. Gannon, a physical anthropologist and chairman of basic science education at Hofstra University School of Medicine, suggests that the brain doesn't really care how it receives information. A waving hand up in the air to summon a waiter for "check please" works just fine. The language areas of the brain -- the highly evolved frontal and temporal lobes -- process simple gestures with the same snippet of tissue that's used to hear the prose of Shakespeare, according to Gannon's study.

City of Languages Game

The Goethe Institute gets all 21st century on our asses: City of Languages (and yes, I'm alluding to It's Always Sunny in Philadelphia).

HT: Giselda dos Santos (via Twitter #linguistics)

Friday, February 26, 2010

"Welfare" vs "Aid to The Poor"

More discussion of how wording affects polling results. Unfortunately, as Liberman has pointed out, none of this addresses the fundamental question of why. Why do the words "homosexual" and "welfare" cause more negative polling results than "Gay Men & Lesbians" and "aid to the poor"? My own weak attempt at a first-pass answer (in the LL comments) is that "in both cases cited ("Homosexuals" vs. "Gay Men & Lesbians" & "welfare" vs. "caring for the poor"), the first, seemingly more controversial term is a single word and the second is a phrase. It may be the case that we silly humans find it easier to attach strong emotional semantics to a single lexical item. One could imagine a study that looked at the role syntactic heaviness plays in survey response."

It may also be the case that longer phrases are harder to categorize.

Tuesday, February 23, 2010

The Audrey Fino Failure

Steven Levy has a new article out on Google's search algorithm (HT Boing Boing). It includes a brief discussion of the problem of parsing n-grams (e.g., how do you know what Times goes with in "New York Times" vs. "New York Times Square"?). I spent a brief time working with a person-name parsing group, and they were just branching out into business-name parsing while I was there, so I know how challenging this is (you noticed how I just helped you with italics, right, hehe). Unfortunately, Levy's article is actually quite a lightweight puff piece of the "gee whiz, ain't Google swell" variety. Anyone who has spent some time in a morphology class or a Computational Linguistics 101 course will likely find it simplistic at best.
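To make the "New York Times Square" problem concrete, here is a minimal Python sketch of the most naive approach: greedy left-to-right longest match against a list of known multiword names. The tiny lexicon and the `greedy_chunk` function are hypothetical, made up for illustration; this is not how Google (or the group I worked with) does it. It gets "New York Times" right and, predictably, mangles "New York Times Square" into "New York Times" + "Square".

```python
# Hypothetical toy lexicon of known multiword names (lowercased token tuples).
LEXICON = {
    ("new", "york"),
    ("new", "york", "times"),
    ("times", "square"),
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def greedy_chunk(tokens):
    """Greedy left-to-right longest match: at each position, take the longest
    lexicon entry starting there; otherwise emit a single token."""
    chunks = []
    i = 0
    while i < len(tokens):
        match = 1  # default: a single token on its own
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in LEXICON:
                match = n
                break
        chunks.append(" ".join(tokens[i:i + match]))
        i += match
    return chunks


print(greedy_chunk("New York Times".split()))         # ['New York Times']
print(greedy_chunk("New York Times Square".split()))  # ['New York Times', 'Square']
```

A real system would need to score competing segmentations (e.g., with n-gram statistics) rather than commit to the first long match it sees.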

"Gay" vs "Homosexual"

Chris Good at The Atlantic contributes to the discussion of how strongly American opinion poll results about DADT depend on the wording used to describe the sexual orientation of the individuals affected. Money quote:

Marc has noted that there's a nomenclature issue at play: gays in the military poll a lot better as "gays" in the military, while people don't seem to like "homosexuals" serving as much. The above phenomenon in CNN's results probably furthers that point--personal opposition to "homosexual relationships" doesn't mean opposition to letting "people who are openly gay or lesbian" serve--but it's hard to see CNN's results not expressing a willingness, on the part of some, to put aside personal moral feelings in their support of a Don't Ask, Don't Tell repeal.

Language Log recently discussed this same issue: Words and opinions.

Nate Silver has also discussed the issue: Republicans are Conservative -- but are they this Conservative?

The Linguistics Of Urine

A nice discussion of the origin of the phrase piss poor over at The Grammarphobia blog. Money Quote:

The word "piss" here is "an intensifier, usually implying excess or undesirability," according to the Oxford English Dictionary. The usage originated in the United States in the mid-20th century.

Sunday, February 21, 2010

Who Dat in Maryland

Is the University of Maryland the hottest linguistics school in the US? I started thinking about this after reading that Hal Daume will be joining the faculty. We don't normally talk about schools this way, but we talk about sports teams like this every day. So I'm gonna play a little game and cast a few linguistics departments as contemporary NFL teams. While goofing off on this, I was surprised by how similar some of the schools are to their local NFL teams.

  • U. Maryland = NO Saints. Spent the last few years quietly building a top team and now everyone sees how good they are. All around quality in all positions and solid special teams. Depth and breadth in one team. Still adding new skill players, they're looking to the future. Tough to beat. Fun to watch. Can they repeat?
  • MIT = NE Patriots. They still get a respectable number of wins, but the dynasty is over and no one fears them any more. Not likely to be a factor in the near future. Who will replace Brady?
  • Penn = Philly Eagles. Always in the playoffs. Always tough. Lots of weapons. McNabb scares everyone. Fearsome reputation. But the Lombardi trophy haunts them.
  • SUNY Buffalo = Buffalo Bills. Flashes of greatness here and there, but you can't win the big one on special teams alone. Lots of talent has come through, but too many top players have come and gone without staying. Loyal fans, but still longing for the good old days. They need to retain players and show they can upset the big dogs to regain their reputation.
  • Stanford = Indy Colts. Always a factor. Always a threat to win it. Too many great players not to be pre-season #1. 
  • UC Santa Barbara = Oakland Raiders. Still got some big names. Still can make the big play. But the brash boldness of its reputation doesn't carry as much weight these days. Who's the next Howie Long?
  • UT Austin = Dallas Cowboys. Dangerous team. Some scary weapons. They can beat anybody on any given day. But they can be beaten on any given day too. Need a spark to be seen as a top dog.
  • Harvard = Cleveland Browns. Umm ... they still have a team?
  • UC Berkeley = SF 49ers. My sentimental pick. I've been a fan for too long to give up on you, but the glory days are fading fast. The 80s ended 20 years ago and you're still looking for Joe's replacement. Your Hall of Fame is impressive, but what have you done for me lately?

Saturday, February 20, 2010

senses and metaphors

NLP guru Hal Daume (who just announced he's taking a new position at U Maryland) has a nice post on senses vs metaphors with interesting comments as well. Money quote:

But I can imagine a system roughly like the following. First, find the verb and it's frame and true literal meaning (maybe it actually does have more than one). This verb frame will impose some restrictions on its arguments (for instance, drive might say that both the agent and theme have to be animate). If you encounter something where this is not true (eg., a "car" as a theme or "passion" as an agent), you know that this must be a metaphorical usage. At this point, you have to deduce what it must mean. That is, if we have some semantics associated with the literal interpretation, we have to figure out how to munge it to work in the metaphorical case. For instance, for drive, we might say that the semantics are roughly "E = theme moves & E' = theme executes E & agent causes E'" If the patient cannot actually execute things (it's a nail), then we have to figure that something else (eg., in this case, the agent) did the actual executing. Etc.

Sounds like a job for FrameNet (if FrameNet were better... and if the page actually loaded; you may have to settle for the Wikipedia entry). My own review of a sense disambiguation hypothesis is here.
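The first half of Hal's sketch, spotting a selectional-restriction violation, is easy to prototype. Below is a minimal Python sketch under loudly stated assumptions: the `FRAMES` and `FEATURES` tables are hypothetical toy entries (a real system would draw on something like FrameNet for frames and WordNet for noun features), and it only flags candidate metaphors. It makes no attempt at the harder second step of working out what the metaphorical usage means.

```python
# Hypothetical verb frames: the semantic feature each role requires.
# Following the quoted example, "drive" wants an animate agent and theme.
FRAMES = {
    "drive": {"agent": "animate", "theme": "animate"},
}

# Hypothetical noun features; a real system might derive these from WordNet.
FEATURES = {
    "farmer": {"animate", "human"},
    "cattle": {"animate"},
    "car": {"artifact"},
    "nail": {"artifact"},
    "passion": {"abstract"},
}


def restriction_violations(verb, args):
    """Return the roles whose filler lacks the feature the verb's frame requires."""
    frame = FRAMES[verb]
    return [
        role
        for role, noun in args.items()
        if role in frame and frame[role] not in FEATURES.get(noun, set())
    ]


def is_metaphorical(verb, args):
    """Flag a usage as a candidate metaphor if any restriction is violated."""
    return bool(restriction_violations(verb, args))


print(is_metaphorical("drive", {"agent": "farmer", "theme": "cattle"}))  # False
print(is_metaphorical("drive", {"agent": "farmer", "theme": "car"}))     # True
print(restriction_violations("drive", {"agent": "passion", "theme": "farmer"}))  # ['agent']
```

Following the quoted example, drive demands an animate theme, so driving cattle passes while driving a car gets flagged, which shows how much rides on getting the restrictions right in the first place.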

affect effect its it's dolphin

I find this search query remarkably disturbing. I just don't understand what this person could have possibly been searching for. I want searching to be more... well... rational. I may not get to sleep tonight.

Thursday, February 18, 2010

Paper Is The Enemy Of Words

Thanks to the Twitter hashtag #linguistics, I discovered 5 Must-See TED Talks On Language. It's an interesting collection of short videos from past TED talks (still waiting for most of the 2010 TED talks to be available).

I found Pinker's 2005 talk enjoyable, if a bit conventional for anyone who has spent time in a linguistics department, that is. He runs the gamut of ditransitive/direct object alternation, Gricean maxims, game theory, etc. His key point is that language is a way of negotiating relationships.

But the real gem by far is the 2007 TED talk by Erin McKean, then Editor-in-chief of the Oxford American dictionaries. She is one of those rare people whose enthusiasm and bright personality are infectious and delightful. Highlights of her talk:

  • Dictionaries are compiled, not carved.
  • Lexicographers get to say fun words like lexicographical, a double dactyl like Higgledy Piggledy.
  • Lexicographers are not linguistic traffic cops, they're fishermen.
  • The idea of the dictionary was fixed in the 1800s by the OED (this is bad).
  • "Dictionaries are Victorian design merged with modern propulsion".
  • OMG! She references steampunk at TED (3:47 mark). This is awesome!
  • Bad online dictionaries take away serendipity -- this is bordering on brilliant.
  • She ascends into sublime genius as she explains the ham-butt problem with dictionaries (5:01 mark). 
  • Don't hate bad words, hate bad dictionaries.
  • Paper is the enemy of words (6:12 mark).
  • Interesting analogy: what if biologists only studied cute animals?
  • How do you know if a word is real? Not because it's in a dictionary; rather, a word is real because people use it.
  • Worry less about control, more about description.
  • Undictionaried words. Brilliant.
  • Asking for help is good.
  • "We're missing California from American English." (11:55 mark)
  • "If we can find comets without a telescope, shouldn't we be able to find words?" Preach it sistah!
  • "The internet is made up of words and enthusiasm."
  • Nice point: a word without its context is pretty... pretty useless.
  • In which she uses a word with which I am not familiar, and as yet am unable to discover: synochdocaly or signicdocically or cynicdocically...
  • Right now, dictionaries are imperfect samples, but we could make THE dictionary with ALL the words.
  • Web dictionaries mean we can discard the artificial distinction between good words and bad words.
  • I love this woman.

Wednesday, February 17, 2010

A Constraint Based Approach To Figure Skating

While perhaps not quite a pure crash blossom, this headline caught me off guard:

Honestly, my first reaction was to wonder if there was a new scoring system (yes, there is) and what was wrong with the old one (bias and collusion). In other words, what was broken and how was it improved? Of course, there's another meaning of fixed: 'to cheat.' In other words, are figure skating outcomes rigged? Were this headline from any publication other than the increasingly dumbed-down Slate, I'd assume the ambiguity was intentional, but with Slate these days, you just never know. Note that there are at least two other senses of the word fixed: to spay or neuter a pet, and to have a sufficient amount of something, like money (British English, as in 'You Kev mate, you fixed for goin' out later?'; HT Urban Dictionary). With at least four senses to choose from, no wonder I was a tad confused.

But how did my super duper human language processing system resolve this?
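The post doesn't spell out an answer, but one classic baseline for this kind of sense selection is simplified Lesk: choose the sense whose signature words overlap most with the surrounding context. The Python sketch below swaps in that technique purely for illustration; the signature word lists for the four senses of fixed are invented, and the test sentences are hypothetical, not the Slate headline. A bare headline offers very little context to overlap with, which is part of why it tripped me up.

```python
# Invented signature words for the four senses of "fixed" discussed above.
SENSES = {
    "repaired": {"repair", "broken", "improve", "new", "system"},
    "rigged": {"cheat", "rigged", "bribe", "collusion", "judges", "result"},
    "neutered": {"pet", "dog", "cat", "vet", "spay", "neuter"},
    "supplied": {"money", "enough", "going", "out", "tonight"},
}


def disambiguate(context):
    """Simplified Lesk: return the sense whose signature shares the most words
    with the (whitespace-tokenized, lowercased) context. max() keeps the first
    sense listed when scores tie, including when nothing overlaps."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))


print(disambiguate("the vet says the dog is fixed"))
print(disambiguate("judges accused of collusion say the result was fixed"))
```

The tie-breaking rule is the weak spot: with zero overlap it silently falls back to the first sense, which is roughly what a headline reader does with no context at all.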

The World's Lousy Fart

Dear gawd I love Sitemeter. The Brits will never get over their love of fart jokes, will they?

Inuktitut's Millionth Word!!

For some time now, the English-speaking linguistics world has anxiously awaited the arrival of our millionth word in English (see here and here). I have a bottle of Freixenet permanently on ice just for that wondrous day. But alas! It appears that Inuktitut has beaten my native language to the prize of all prizes. According to this story about Microsoft's Inuit language software,

More than one million words have been programmed in Inuktitut through the collaboration, about 5,000 of which are new Inuktitut words (emphasis added).

I'm a gracious loser. Congratulations Inuktitut. See you at the 2 millionth mark.

Tuesday, February 16, 2010


Generally I'm not a fan of new journals. Too much academic fluff is getting published already; I see no reason to fluff even more. However, this new journal struck me as having a novel and valuable mission behind it: The Journal of Serendipitous and Unexpected Results (JSUR).

An important component of scientific discovery is a disciplined examination of research results that contradict or negate extant hypotheses. Indeed the history of science is rife with examples of important discoveries arising from such results. However, there is a distinct lack of a forum in which such results can be presented and discussed in any meaningful way. We believe a forum for and dialogue on serendipitous and unexpected results will provide valuable insight and inform modern research practices (emphasis added).

It's like they created a whole journal just for Dan Everett! My first reaction was to double-check that this wasn't coming from The Onion, but it appears to be legit. Jonah Lehrer recently made a similar point (see here) about the value of failure in science. In fact, there are informal forums for this kind of discussion; namely, meetings with advisors and lab meetings (as Lehrer points out). But rarely does this discussion get formalized and published. To pique the imagination of researchers, the journal editors pose a serious series of question templates. Which of the following are relevant to linguistics?

Can you demonstrate that:
  • Technique X fails on problem Y.
  • Hypothesis X can't be proven using method Y.
  • Protocol X performs poorly for task Y.
  • Method X has unexpected fundamental limitations.
  • While investigating X, you discovered Y.
  • Model X can't capture the behavior of phenomenon Y.
  • Failure X is explained by Y.
  • Assumption X doesn't hold in domain Y.
  • Event X shouldn't happen, but it does.
(HT Boing Boing)

Sunday, February 14, 2010

Having Reason To Discourse Upon The Particle -soever

Having spent the better part of this weekend reading Thomas More's Utopia for Monday's book club meeting, for truly no more suitable exercise of mind fits me than a quiet afternoon's reading, I'm naturally predisposed to write in a style more favorable to the musty halls of libraries, once the repositories of great and wonderful learning, now the lodgings of vagabonds and stools of too too solid a material, than this the new and vast tubular nebula...(shakes it off).

I discovered in the free PDF version I downloaded from HERE* a use of the particle -soever that I found odd. In my dialect (Northern Californian American English), there is one and only one acceptable use of -soever: 'whatsoever.' All other uses sound awkward or flat ungrammatical. But in this book, I discovered five distinct uses:
  • 12 - whatsoever
  • 8 - 'how X soever'
  • 1 - whichsoever
  • 1 - whithersoever
  • 1 - 'as X soever'
The 'how X soever' construction first jumped out at me as surprising; then I noticed the other uses. For me, 'whichsoever' is flat ungrammatical and 'whithersoever' is clearly archaic (whither anything sounds archaic to me). I decided to do just a tiny bit of research on these constructions to see what I could find (in a short time, using freely available resources).

What I discovered was ...
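For anyone who wants to tally these uses in their own copy of the text, a few regular expressions will do most of the work. This is a minimal Python sketch, not the method behind the counts above (those came from reading); the patterns are my assumptions, in particular that the 'how X soever' and 'as X soever' constructions have exactly one word in the X slot, and it ignores forms like whosoever and wheresoever.

```python
import re

# Assumed patterns for the five -soever uses listed above. The two
# "X soever" patterns allow exactly one word between the opener and "soever".
PATTERNS = {
    "whatsoever": r"\bwhatsoever\b",
    "how X soever": r"\bhow\s+\w+\s+soever\b",
    "whichsoever": r"\bwhichsoever\b",
    "whithersoever": r"\bwhithersoever\b",
    "as X soever": r"\bas\s+\w+\s+soever\b",
}


def count_soever(text):
    """Count case-insensitive, non-overlapping matches of each pattern."""
    return {
        label: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for label, pattern in PATTERNS.items()
    }


# A made-up sentence, not a quotation from Utopia.
sample = "He gave nothing whatsoever, how great soever the need, and went whithersoever he pleased."
print(count_soever(sample))
```

Running it over a plain-text copy of the book (e.g., `count_soever(open("utopia.txt").read())`, with a hypothetical filename) gives a rough tally; a multiword X slot like 'how very great soever' would be missed.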

Tumblr, Flickr, rrrrrrrrrrrrrrrrr

After considering a post on names like Tumblr and Flickr, I discovered that linguistic mystic was a couple of years ahead of me, having posted on the use of syllabic consonants in Web 2.0 apps HERE. Money quote:

...people seem to be recognizing the syllabicity of these final consonants, and skipping the written vowels altogether when creating their site names. The flickr -r may well have started the game, but now completely unrelated sites are becoming Web 2.0 by not including the written vowel in words with syllabic endings. Pooln chose its site name over “Poolin” or “Poolen”, tumblr over “tumbler”, and I suspect it’s only a matter of time before the first sites ending in /l/ pop up (at the time of writing, rumbl, tumbl and bumbl were already reserved). Interestingly, I’m yet to see a syllabic M site (perhaps because we generally just write the m with now vowel, as in “chasm” or “orgasm”). Who knows, though, maybe “phantm” is the next Web 2.0 ghost hunting site.
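The spelling trick itself is simple enough to mimic. Here is a hypothetical Python sketch that drops a written vowel before a word-final r, l, n, or m when a consonant letter precedes it. It works on spelling alone, so it is only a rough proxy for syllabicity: it turns tumbler into tumblr, but it would also mangle begin into begn (whose final syllable is stressed and has no syllabic consonant), and it leaves tumble alone because of the silent e.

```python
import re

# A written vowel before a word-final r, l, n, or m, preceded by a consonant
# letter: a spelling-only stand-in for "syllabic final consonant".
_SYLLABIC_ENDING = re.compile(r"(?<=[^aeiou])[aeiou]([rlnm])$")


def web2_name(word):
    """Drop the vowel letter before a final r/l/n/m, Flickr-style."""
    return _SYLLABIC_ENDING.sub(r"\1", word.lower())


for word in ["tumbler", "flicker", "poolin", "phantom", "chasm"]:
    print(word, "->", web2_name(word))
```

Doing it properly would mean working from pronunciations (e.g., a pronouncing dictionary) rather than letters, which is exactly the point the quoted post makes about the syllabic consonant being heard, not written.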

Thursday, February 11, 2010

A Brief History of 'Snowmageddon'

Following a lead from a Facebook response I saw on a friend's comment, I thought I had discovered the origin of the term Snowmageddon in a 2008 storm in Minnesota HERE. However, being a linguist, I decided to follow up a bit. Of course, I started with Mark Davies' BYU Corpora, but had no luck finding the term. Then I did some Googling/Binging. In fact, the earliest instance of the term I could find comes from that distant year 2007 HERE. 2008 seems to have been a banner year for the term across the whole country. Numerous examples follow:


(screen grab from The Daily Show)

The Twitter world is abuzz with snowmageddon fever, and new synonyms are being coined at a rapid pace. Here's a modest list of known hashtags referring to the recent storms hitting the East Coast of the US (personal fav = #KaiserSnowze)

Twitter Hashtags


Tuesday, February 9, 2010

Snowmageddon 2010!!!

(image from AP)
As winter's fury descends yet again on the Metro DC area (and my personal list of words for snow grows even larger), two words are competing for the right to name this bloody awful event: Snowmageddon and Snowpocalypse. So which is it to be? As of right now, Snowmageddon is leading the Google/Bing frequency counts. I'm not sure if Bing always gives higher counts, but my faith, what little there ever was, in Google counts is all but gone (see here, here, here for relevant discussion).

Snowmageddon = 801,000/1,880,000
Snowpocalypse = 375,000/1,060,000

UPDATE (02/13/2010): Snowmageddon maintains its lead.
Snowmageddon = 855,000/2,280,000
Snowpocalypse = 791,000/1,350,000
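Since raw hit counts for two terms on two engines are hard to eyeball, here is a small Python calculation of how many times more hits Snowmageddon gets than Snowpocalypse on each engine. The numbers are copied from the post, with the garbled first-date Bing figure for Snowpocalypse read as 1,060,000 (an assumption); the usual caveats about search engine counts apply.

```python
# Hit counts copied from the post: (Google, Bing) for each term.
counts = {
    "Feb 9": {"Snowmageddon": (801_000, 1_880_000), "Snowpocalypse": (375_000, 1_060_000)},
    "Feb 13": {"Snowmageddon": (855_000, 2_280_000), "Snowpocalypse": (791_000, 1_350_000)},
}


def lead_ratios(snapshot):
    """Snowmageddon hits divided by Snowpocalypse hits, per engine."""
    mag = snapshot["Snowmageddon"]
    poc = snapshot["Snowpocalypse"]
    return tuple(round(m / p, 2) for m, p in zip(mag, poc))


for date, snapshot in counts.items():
    google, bing = lead_ratios(snapshot)
    print(f"{date}: Google {google}x, Bing {bing}x")
```

It prints `Feb 9: Google 2.14x, Bing 1.77x` and `Feb 13: Google 1.08x, Bing 1.69x`: Snowmageddon leads on both engines on both dates, but its Google lead shrank considerably by the update.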

For what it's worth, I personally prefer Snowmageddon because the w-m transition seems more natural (i.e., in accord with English phonotactics) than the w-p transition. Diphones are the backbone of concatenative speech synthesis systems. Surely someone has published frequencies of diphone transitions, right? I found one paper referencing frequency counts, but I haven't found the data.

Kuperman, V., Ernestus, M., and Baayen, R. H. (2008). Frequency distributions of uniphones, diphones and triphones in spontaneous speech. The Journal of the Acoustical Society of America, 124(6), 3897-3908.
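Until I track down that data, counting diphone transitions myself is straightforward once you have phone-level transcriptions. A minimal Python sketch, treating a diphone simply as an adjacent pair of phones, which covers the same transitions that synthesis diphones span. The ARPAbet-style transcriptions below are my own rough, hypothetical renderings; in them the junction I called w-m vs. w-p shows up as OW followed by M or P, since the w of snow is part of the diphthong.

```python
from collections import Counter


def diphone_counts(utterances):
    """Count adjacent phone pairs across phone-transcribed utterances.

    Each utterance is a list of phone symbols; pairs are not counted
    across utterance boundaries.
    """
    counts = Counter()
    for phones in utterances:
        counts.update(zip(phones, phones[1:]))
    return counts


# Rough, hypothetical ARPAbet-style transcriptions.
snowmageddon = ["S", "N", "OW", "M", "AH", "G", "EH", "D", "AH", "N"]
snowpocalypse = ["S", "N", "OW", "P", "AA", "K", "AH", "L", "IH", "P", "S"]

counts = diphone_counts([snowmageddon, snowpocalypse])
total = sum(counts.values())
for pair in [("OW", "M"), ("OW", "P")]:
    print(pair, counts[pair], counts[pair] / total)
```

Over a real transcribed corpus, those relative frequencies for (OW, M) and (OW, P) are exactly the numbers I'd want to compare.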

Math Rocks

(image from NYT)

The post title is intentionally ambiguous. In this case, rather than it being a full clause where math is the subject and rocks is the intransitive verb, it is a simple NP where math modifies the plural noun rocks. This is the better reading in relation to this post simply because I am referring to the second installment of Steven Strogatz's excellent NYT series wherein he explains the elements of mathematics to a lay audience. His first topic was the value of abstractness. His second, the value of rocks (or rather, the value of concrete teaching methods like using groups of rocks to demonstrate the meaning of squares, primes, odd vs. even numbers, etc.). This series is fast turning into a must-read. In case anyone wonders why a linguist is referencing a math blog, read THIS.
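The rock examples translate neatly into code. In this minimal Python sketch, a number's "shape" is the set of complete rectangles its rocks can be laid out in: a square number has a rectangle with equal sides, a prime (greater than 1) only fits in a single row, and an even number splits into two equal rows. The function names are mine, and this is my paraphrase of the idea, not code from Strogatz's column.

```python
import math


def rectangles(n):
    """All full rectangles (rows, cols) with rows <= cols that n rocks can form."""
    return [(r, n // r) for r in range(1, math.isqrt(n) + 1) if n % r == 0]


def is_square(n):
    """Square: some rectangle has equal sides."""
    return any(rows == cols for rows, cols in rectangles(n))


def is_prime(n):
    """Prime: more than one rock, and the only rectangle is a single row."""
    return n > 1 and rectangles(n) == [(1, n)]


def is_even(n):
    """Even: the rocks split into two equal rows with none left over."""
    return n % 2 == 0


for n in [7, 9, 12]:
    print(n, rectangles(n), is_square(n), is_prime(n), is_even(n))
```

Nothing here is more than trial division, which is rather the point: the concrete picture (rows of rocks) and the abstract definition (divisors) are the same thing.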

Sunday, February 7, 2010

Dolphin-Bikes and The Iconicity Effect

Since the journal Cognition typically allows free online access to its current volume, I was able to read a recent paper on a topic that I've always found interesting: the role of embodied experience in language processing. The basic question is, how do our size and shape and orientation as human beings affect our language? Think about a creature that's physically very different from us, like jellyfish or bacteria or dolphins. Now imagine those creatures magically had the same cognitive capacity that we do.

Would our language system work for them or would it necessarily have to be different? 

Speaking in Tongues

A couple of good blog posts on neurolinguistic research on the phenomenon of glossolalia (aka speaking in tongues). The takeaway message seems to be that yes, there is some curious brain activity correlated with speaking in tongues; it's just not clear what it means, and there's so little data that not much can be confirmed or denied. But as Brain Blogger put it, the studies point to "the act of speaking in tongues as a verifiable language phenomenon that invites further study."

Friday, February 5, 2010

My Many Words for Snow

As the snow descends upon Northern Virginia in the latest winter storm, and as DC's elite line up at their local Whole Foods and Trader Joe's clutching their reusable bags filled with heavily packaged prepared meals, cardboard-container salads, 6 bottles of wine, and one bottle of water ('cause, ya know, it's an "emergency"), I am struck by the fact that the great Eskimo vocabulary hoax (pdf) is no hoax at all! It turns out that I too have a great many words for snow. This evening, while running a few modest errands before the night's predicted 20-inch snow drop, I meticulously recorded the various terms I uttered as synonyms for the fluffy white stuff which descended, rather gracefully, upon the landscape.

A few choice examples (NSFW):

Thursday, February 4, 2010


I'm not normally much of a pun guy, but this one got me giggling. Speaking about the much-discussed Belgian patient in a vegetative state who recently showed surprising brain activity, Dr. Allan H. Ropper, a neurologist at Brigham and Women’s Hospital in Boston, warned against equating neural activity and identity. “Physicians and society are not ready for ‘I have brain activation, therefore I am,’ ” Dr. Ropper wrote. “That would seriously put Descartes before the horse” (original here).

UPDATE (02/14/2010): hehe, still makes me giggle 10 days later...


Wednesday, February 3, 2010

100 Years and Counting

(image from The MacGuffin)

Neuroblogger and all-around skeptic The MacGuffin has a nice review of the remarkable relevance of Brodmann's 100-year-old cytoarchitectonic map of the brain HERE. Money quote:

Brodmann's work helped to revolutionize modern neuroscience. While many other maps have followed Brodmann's, and even though contemporary research has shown that "his map is incomplete or even wrong in some of the brain regions," many of the areas do correlate very well with various functional areas of the cortex, which is why his work still has relevance 100 years later. 

Tuesday, February 2, 2010

Good For Them

NPR just did a story, titled Software Company Helps Revive 'Sleeping' Language, on software-based revitalization efforts for Chitimacha, a dead language once spoken by the Chitimacha tribe in Southern Louisiana. According to the story, "the last native speaker died in 1940," so the revitalization efforts utilize "hundreds of hours of scratchy recordings on wax cylinders, along with extensive notes from linguist Morris Swadesh." Since I did my graduate work at a linguistics department steeped in descriptive field linguistics, the name Swadesh is well known to me (I've actually used the Swadesh lists). He was crucial to 20th-century efforts to classify the indigenous languages of North America.

But the story really piqued my interest when they noted that Rosetta Stone, which is creating the software package, will not own the final product. Rather, the Chitimacha tribe will, and they will have the right to distribute it for free (or charge, whatever they want; they'll own it). Rosetta Stone has a web page describing their revitalization and preservation efforts here. They appear to work with communities to procure funding through government and private foundation grants. I was impressed with the description of their process:

You select the team of language experts, teachers, and speakers from your community. Rosetta Stone provides the language teaching template, training, technology, recording and photography services, and project planning. Rosetta Stone turns your knowledge into the final user-ready software.

After 5 years in industry, I have come to respect the value of smart leadership at the project planning level. It sounds like Rosetta Stone is leveraging their considerable skills and resources at the project planning and execution level to help small communities realize their language and culture related goals. Good for them.

(PS: just to be clear, I have absolutely no connection, professional or otherwise, to Rosetta Stone. I've never even used any of their products; this just struck me as a good example of corporate responsibility).

Unreasonable Effectiveness

Let's be honest, many of us find math intimidating. But it need not be. I recently explained why linguists should study math; now, over the next several weeks, Steven Strogatz, professor of applied mathematics at Cornell, will be blogging an informal introduction to the basic concepts of mathematics from pre-school to grad school. He starts with Sesame Street and counting fish to explain the basic idea that numbers are abstractions:

The creative process here is the same as the one that gave us numbers in the first place. Just as numbers are a shortcut for counting by ones, addition is a shortcut for counting by any amount. This is how mathematics grows. The right abstraction leads to new insight, and new power.

This is a NYT blog, so let's hope they don't put it behind their new paywall.

HT kottke
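Strogatz's point that addition is a shortcut for counting by ones can be checked in a few lines. A minimal Python sketch (the function name is mine, and it assumes b is a non-negative whole number): do addition the long way, one count at a time, and confirm it agrees with the built-in shortcut.

```python
def add_by_counting(a, b):
    """Compute a + b the long way: start at a and count up by ones, b times.

    Assumes b is a non-negative whole number.
    """
    total = a
    for _ in range(b):
        total += 1  # one more fish
    return total


# The shortcut (built-in +) agrees with counting by ones on every pair tried.
assert all(add_by_counting(a, b) == a + b for a in range(20) for b in range(20))
print(add_by_counting(3, 4))
```

The loop takes b steps while `a + b` takes one, which is the "new power" the right abstraction buys.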
