Monday, April 26, 2010

On The Campus Frame

(UT Austin's Main Building)

On Saturday morning, I found the above sign pragmatically odd.

Wandering down Guadalupe that morning after my latte at The Hideout (and wishing I'd known about the Texas Round-Up 5k ahead of time so I could have run), I decided to check out the UT Austin campus. The morning was a gloriously sunny 70 degrees, no clouds or wind, and I love exploring college campuses. UT has a nice, almost stereotypical layout with large academic buildings, rolling hills, stone staircases, the large football stadium to the West, and the UT Austin Tower ominously presiding over all. My meandering tour brought me up a series of stairs to the face of the tower's building. Academic buildings tend to be named after people (e.g., the building next to the tower is called the Dorothy L. Gebauer Building). But when I walked up to the tower building's sign, all I found was a pragmatics puzzle: Main Building.

I snapped the pic above and strode over to Caffé Medici to ruminate on why I found this sign so pragmatically odd. It is, in fact, less obscure than Dorothy L. Gebauer, right? Quite straightforward. This is one building amongst many which serves as some sort of center point for activity. First among equals, to borrow a term from the political realm. This should be a perfect instantiation of FrameNet's Locale_by_use frame (of which campus is in fact a lexical unit) whereby the NP Main Building evokes a Constituent_part ("Salient parts that make up a Locale") of a Locale ("A stable bounded area"). But why did I find it odd?

After lunching at Veggie Heaven (and escaping a near death experience crossing Lavaca), I could only come up with the suspicion that the high frequency of person names for academic buildings trumps the logic of the frame model. In other words, I accept that there probably exists some cognitively real conceptual object roughly equivalent to a frame, and our human language system uses frames in some way to build a semantic representation of an input like Main Building in order to draw inferences about the role of that object in some state-of-affairs; nonetheless, if objects within that state-of-affairs have a statistically significant tendency to be named using highly specific non-functional terms, then a building with a general and functional name will stand apart as somehow not a proper member of the state-of-affairs. Membership in the group is NOT determined by its role in a frame, but rather by its similarity to other members of the group.

I'm reminded of the beer from Repo Man:

(image from

This generic BEER (which was, ever so briefly, a real product in American stores in the early 1980s) never quite took hold. It just didn't fit. I suspect BEER is a nice example of monopolistic competition. They flouted the need to distinguish their nearly identical product in a tough competitive market, hoping their floutestation alone would distinguish it (yep, I made that word up and I'm sticking with it). It would, however, take some logical flips and leaps to make the connection to the Main Building example (not saying there ain't a cognitive connection, just sayin I'm a lazy blogger). Phew! That took a lot of words to state the obvious...and explaining the card game frame necessary to understand my use of trumps is another post entirely.

NOTE: Yes, I challenged myself to include as many Austin sites as possible in this post. Just 'cause I've been spending the last few weekends in Austin. But rest assured, my morning followed almost exactly this story.

BTW: What the hell is that image on the banner of UT Austin Linguistics homepage? Is that an FSA leading into a spectrogram? Huh? If yes, shouldn't the nodes have state labels and the arcs have transition labels? And why does the final node transition to the little stop image? 

Oh yeah, and I really hate this: (hint, see source for HTML code).

Text Messaging and Language Use Survey

Brennan Gamwell, a student at Georgetown, has posted an online survey for language and text messaging HERE.

Sunday, April 25, 2010

When Is Bilingualism Bad?

When it's a litmus test for Supreme Court nominees, and Canada might go there: Linguistics above knowledge.

Money quote:

If the Senate does not defeat it, Bill C-232 will amend the Supreme Court Act to insist that all future appointees to our highest court be fluently bilingual, and not just fluent in conversational French and English, but in both official legalistic languages. It will make it a prerequisite for justices to be able to hear all cases without the aid of translation.

In practical terms, the bill will restrict appointment to a very small number of bilingual legal scholars and lower-court judges. It will make it difficult for Canadians outside a narrow strip from Ottawa, through Montreal and Quebec City, and into Moncton, to ever be appointed to the court that has the final say over how the Charter will be interpreted and what rights we may have.

I don't know what the chances are that this Canadian bill passes, but the article suggests it's highly likely.

(HT morsmal via Twitter #linguistics)

Saturday, April 24, 2010

Syntactic Structures of the World's Languages

A new free online resource for linguists: Syntactic Structures of the World's Languages. I haven't had time to play around with it, but the list of contributors is impressive. Money quote:

SSWL is a searchable database that allows users to discover which properties (morphological, syntactic, and semantic) characterize a language, as well as how these properties relate across languages. This system is designed to be free to the public and open-ended. Anyone can use the database to perform queries.

Emphasis added (yes, that's for you LDC, haha).

(HT WordAficionada via Twitter #linguistics)

Thursday, April 22, 2010

Boring Volcanoes

While debating the pronunciation of Eyjafjallajökull has been all the rage in the blogosphere (see here), a more ominous threat has emerged, the imminent eruption of the great and powerful Katla! ...yeah, my reaction too. Somehow, the pronunciation difficulty of Eyjafjallajökull added to its pop cultural cachet. I fear Katla, regardless of the might of its wrath, will suffer a sort of pop cultural Marcia Marcia Marcia syndrome.

For what it's worth (not much), Wikipedia's pronunciation is here.

Tuesday, April 20, 2010

Word Frequency Lists

Mark Davies and company over at BYU have released quite a collection of English word frequency data HERE.

Here's a taste:

Our data is based on the only large, genre-balanced, up-to-date corpus of American English -- the 400 million word Corpus of Contemporary American English. You can be sure that the words in these lists and in this dictionary -- sorted from most to least frequent -- are really the most common ones that you will encounter in the real world.

The frequency data comes in a number of different formats:
  • An eBook containing up to the 20,000 most frequent words, along with the 20-30 most frequent collocates (nearby words) and the synonyms for each word -- which provide valuable insight into meaning and usage.
  • A printed book (from Routledge) with the top 5,000 words (including collocates) and thematic lists.
  • Lists with the top 200-300 collocates for each of the 20,000 words, giving more than 4,300,000 node word / collocate pairs
  • Simple word lists of the top 10,000 or 20,000 words, but without collocates or synonyms.
  • A free word list -- top 5,000 words, but no collocates or synonyms.
  • N-grams: more than 155 million trigrams, which can be queried by word form, lemma, part of speech, etc

Saturday, April 17, 2010

and a thousand new dissertations were born...

The U.S. Library of Congress will be creating "a digital archive of Twitter as a historical record." Money quote:

In an extraordinary agreement with Twitter's founders, the Library of Congress – the world's largest library and America's oldest federal institution – is to create a digital archive of the several billion tweets publicly posted on the social networking site since its inception in 2006.

Sounds like one deeeeeeelicious linguistic corpus to me. Me want.

Thursday, April 8, 2010

Tweeting Kluges

The Twitter hashtag #linguistics is ablaze with links to this Scientific American article about Gary Marcus' claim that language is far from "optimal." It's a pretty short and simple article, not much meat, but it has a lot of links (maybe too many?).  Money quote:

Visual abilities have been developing in animal predecessors for hundreds of millions of years. Language, on the other hand, has had only a few hundred thousand years to eke out a place in our primate brain, he noted. What our species has come up with is a "kluge," Marcus said, a term he borrows from engineering that means a solution that is "clumsy and inelegant, but it gets the job done." 

Tuesday, April 6, 2010

John’s grandmother feeds the monkey every morning

There's a brief and shallow puff piece out discussing new research about differences in how the brain processes word order versus inflection with the absurd title Languages use different parts of the brain. Even if you know nothing about linguistics you can quickly determine that the title is absurd because the article itself admits that the study involved only ONE language! This was not a cross-linguistic study. It says nothing about what parts of the brain different languages use. The author makes the leap of logic assuming that (A) because languages can be typed according to their morphology (fusional, agglutinating, etc.) that (B) therefore languages that are predominantly agglutinating must be processed differently than fusional languages. Nope. The study did not show this.

The research paper which spawned this puff piece is "Dissociating neural subsystems for grammar by contrasting word order and inflection" by Aaron J. Newman, Ted Supalla, Peter Hauser, Elissa L. Newport, and Daphne Bavelier, but it's behind a paywall, of course. As far as I can tell from the abstract, the researchers used sign language stimuli to discover that sentences which relied on word order to convey case information activated different patterns in the brain than sentences using inflections (which the puff piece quaintly calls "tags"). From the abstract:

During functional (f)MRI, native signers viewed sentences that used only word order and sentences that included inflectional morphology. The two sentence types activated an overlapping network of brain regions, but with differential patterns. Word order sentences activated left-lateralized areas involved in working memory and lexical access, including the dorsolateral prefrontal cortex, the inferior frontal gyrus, the inferior parietal lobe, and the middle temporal gyrus. In contrast, inflectional morphology sentences activated areas involved in building and analyzing combinatorial structure, including bilateral inferior frontal and anterior temporal regions as well as the basal ganglia and medial temporal/limbic areas. These findings suggest that for a given linguistic function, neural recruitment may depend on the cognitive resources required to process specific types of linguistic cues. (emphasis added).

The final sentence of the abstract is compelling as it makes a claim about neural recruitment and cognitive resources. NOT about different languages using different parts of the brain! There are some respected linguists on the author list, so I suspect the paper is worth reading (if they would let me, that is!). But the original puff piece did provide two of the stimuli:
  • John’s grandmother feeds the monkey every morning.
  • The prison warden says all juveniles will be pardoned tomorrow.
Psycholinguistics stimuli are often funny because they need to be constructed to contain very specific features, so I can forgive them these awkward sentences, but really? They couldn't have gramma feeding a dog? It had to be a monkey? Hmmmmm. Probably has something to do with the inflections for nouns, but c'mon, a monkey? Sounds downright lewd.

Saturday, April 3, 2010

It's My Bar Of Chocolate!

I'm having a Veruca Salt moment. All I want is to read a paper in Cognition, but the dirty bastards at Elsevier have locked it up behind a big dirty wall. Having left the sweet comfort of The University, my greatest frustration is not having access to papers and data that I used to take for granted. This is the 21st Century, people. There's lots of free linguistics stuff out there (just look at my own most excellent list of resources to the right). Everything is supposed to be free. Google said so, and I believe them. This goes for you too, LDC, with all that sweet delicious data locked up behind $$ signs. Now give me everything I want right now. To quote my hero:

I want the works
I want the whole works
Presents and prizes and sweets and surprises
Of all shapes and sizes
And now
Don't care how
I want it now
Don't care how
I want it now

Friday, April 2, 2010

On Statistical Anomalies

(the table lists Hand #, Table Name, My Hole cards, Winner, Pot)

This has nothing to do with linguistics, but I challenge my fellow online poker player Nate Silver to walk through the probability that I would be dealt pocket 22, 33, 44 successively in NLHE. I have proof positive that it happened (see image above). And I note that the probability of being dealt any three specific pairs in a row (say, 22, 99, KK) should be the same as the probability of being dealt three consecutive pairs; it's us silly humans who care about the difference between 22 and KK, not the poker gods.
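
For the curious, here's a rough back-of-the-envelope sketch of that walk-through in Python. It assumes a standard 52-card deck and a fresh, independent shuffle for every hand; the variable names are mine, made up for illustration. It computes the chance of one named pocket pair, the chance of three named pairs in a row (22, 33, 44 or any other named sequence), and, for contrast, the chance of being dealt just any pair three hands running.

```python
from fractions import Fraction
from math import comb

# Assumption: standard 52-card deck, each hand dealt from a fresh shuffle,
# so consecutive hands are independent.
TOTAL_HOLE_HANDS = comb(52, 2)   # 1326 possible two-card starting hands
COMBOS_PER_PAIR = comb(4, 2)     # 6 ways to hold a given pair, e.g. 22

# One specific pocket pair (22, or KK, the deck doesn't care which).
p_specific_pair = Fraction(COMBOS_PER_PAIR, TOTAL_HOLE_HANDS)

# Any pocket pair at all: 13 ranks, 6 combos each.
p_any_pair = Fraction(13 * COMBOS_PER_PAIR, TOTAL_HOLE_HANDS)

# Three named pairs in a row, e.g. 22 then 33 then 44.
p_three_named_pairs = p_specific_pair ** 3

# Three pairs in a row where the ranks don't matter.
p_three_any_pairs = p_any_pair ** 3

print("one specific pair:        ", p_specific_pair)
print("any pair:                 ", p_any_pair)
print("22, 33, 44 in a row:      ", p_three_named_pairs)
print("any three pairs in a row: ", p_three_any_pairs)
```

Under those assumptions a named pair is 1 in 221, so a named run of three is 1 in 221 cubed, no matter which three ranks you name. Three pairs of whatever rank is a far likelier event, 1 in 17 cubed.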

Thursday, April 1, 2010


Thanks to a desperate need to brush up on my German (i.e., was thoroughly embarrassed at a German meet-up in NOVA), I just discovered that the German farewell tschüß is a cognate of French adieu (I know, right?). Wiktionary's explanation: From Low Saxon, from Walloon adjüs (the equivalent of adieu in French).

NLPers: How would you characterize your linguistics background?

That was the poll question my hero Professor Emily Bender posed on Twitter March 30th. 573 tweets later, a truly epic thread had been cre...