Wednesday, February 23, 2011

the linguistics of 404 FILE NOT FOUND

A cute site providing humorous translations of the world's most frustrating search result. Personal favs:
  • American South - Ah cain't find th' page yer lookin' fer.
  • Australia - Strewth mate yer bloody page has shot through.
  • Blond - like omg! ur file has not been found, go paint ur nails and try back later, lol^^....I FOUND A QUARTER!
  • Cockney - No chance luv, carrnt find it neever.
  • Pirate - Haaarr, Lubber! I've sailed yon seas with toil and trial, and yet I cannot find ye file!
  • Pittsburghese - This page needs fixed n'at... it's all caddywhompus! Yinz needs look somewheres else.
  • Zombie - Arrgrg 404 BrAiNs aAAArrggh No ggrrgrh page brAiNz heRe BrAAAAIIINNSSSS!

fuck C++

Andrew Vos provides us with valuable data analysis of the correlation between programming languages and profanity:

The plan was to find out how much profanity I could find in commit messages, and then show the stats by language. These are my findings: Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words).



Oh, Python, beautiful Python ... no wonder the NLTK guys chose it as their NLP language of choice.
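
For the curious, here is a minimal sketch of the kind of counting Vos describes. It is not his code. The `count_hits_by_language` function, the `(language, message)` input shape, and the sample messages are all my own assumptions, and the word list is a polite hypothetical stand-in for Carlin's seven.

```python
import re
from collections import Counter

# Hypothetical stand-in word list; the original analysis used George Carlin's
# seven dirty words, which are not reproduced here.
WORD_LIST = {"darn", "heck", "drat"}


def count_hits_by_language(commits, word_list=WORD_LIST):
    """Count word-list hits per language.

    `commits` is an iterable of (language, message) pairs; that shape is an
    assumption, not the format of the original data set.
    """
    hits = Counter()
    for language, message in commits:
        # Lowercase and split into alphabetic tokens so "Darn!" matches "darn".
        tokens = re.findall(r"[a-z']+", message.lower())
        hits[language] += sum(1 for token in tokens if token in word_list)
    return hits


# Made-up sample data, for illustration only.
sample = [
    ("C++", "darn template errors, fixed the heck out of them"),
    ("Python", "tidy up imports"),
    ("C++", "drat, missed a semicolon"),
]
print(count_hits_by_language(sample))
```

Raw counts like these would still need to be divided by the number of commits per language before any two languages could be fairly compared.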

Sunday, February 20, 2011

economists are bad linguists

Dominik Lukes at Metaphor Hacker has a thorough discussion of Harvard economist Ed Glaeser's misuse of metaphor theory in trying to use NYC restaurants as a metaphor for schools. Lukes teases out the mis-mappings that Glaeser fails to recognize. Money quote:

[Restaurants] also use a number of tricks to make the dining experience better – cheat on ingredients, serve small portions on large plates, etc. They rely on ‘secret recipes’ – the last thing we want to see in education. And this is exactly the experience of schools that compete in the market. They fudge, cheat and flat out lie to protect their competitive advantage. They provide the minimum of education that they can get away with to look good. Glaeser, as he conveniently forgets, there is a huge amount of centralized oversight of New York restaurants – much more, in some ways, than on charter schools.

The full discussion is thorough and well worth reading.

Friday, February 18, 2011

evolution = chaos?

Kottke points to a graphical variation of the Chinese whispers game whereby an original sign (in this case, a line drawn by a human) is rapidly degraded by multiple repetitions (the more people try to repeat the original line, the less line-like it becomes, eventually degrading into chaos).


A Sequence of Lines Traced by Five Hundred Individuals from clement valla on Vimeo.

Kottke marvels that "The lines get really messy surprisingly fast [...] this is a nice demonstration of evolution."

But is it? Is it the case that evolution leads to chaos*? I don't think so. Evolution leads to variation and change, sure, but chaos? The difference between evolution and this line transformation, I think, is pressure. In evolution there are pressures that greatly affect which changes last more than one generation and hence become permanent and stable. But in this game, there are no pressures, as far as I can tell. There is no survival of the fittest because each turn gets to survive for exactly one generation, with no pressure to be fitter than another in order to persist beyond that. So this exercise, cute as it may be, does not resemble evolution at all, I don't think.

*or messiness in Kottke's phrasing
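
To make the point about pressure concrete, here is a toy simulation, strictly a sketch under my own assumptions. A "line" is a list of 20 heights, each copy adds Gaussian noise, and "fitness" simply means staying close to the original flat line. The names `copy_with_noise`, `messiness`, and `trace`, and the `brood` parameter are all invented for illustration. None of this models the actual tracing experiment or real biology.

```python
import random


def copy_with_noise(line, rng, noise=1.0):
    """Return an imperfect copy: every point is nudged by Gaussian noise."""
    return [y + rng.gauss(0.0, noise) for y in line]


def messiness(line):
    """Root-mean-square distance of the points from the original flat line (y = 0)."""
    return (sum(y * y for y in line) / len(line)) ** 0.5


def trace(generations, rng, brood=1, points=20):
    """Copy a flat line `generations` times.

    With brood=1 each copy simply becomes the next original (no pressure).
    With brood>1 several copies are made each turn and only the least messy
    one survives, a crude stand-in for selection pressure.
    """
    line = [0.0] * points
    for _ in range(generations):
        copies = [copy_with_noise(line, rng) for _ in range(brood)]
        line = min(copies, key=messiness)
    return line


rng = random.Random(2011)  # fixed seed so runs are repeatable
no_pressure = messiness(trace(500, rng, brood=1))
with_pressure = messiness(trace(500, rng, brood=10))
print(f"no pressure:   {no_pressure:.1f}")
print(f"with pressure: {with_pressure:.1f}")
```

Without pressure the copying errors pile up like a random walk. With even this crude pressure the line should stay recognizably line-like, which is roughly the distinction I am drawing.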

Tuesday, February 8, 2011

Linguist List FAIL

I've been kicked around a few NLP blocks in my time so I've developed a sixth sense about what employers are looking for when they post job announcements. When I read this one from Intelius on The Linguist List today, my reaction was clear, concise, and unconditional: This is NOT for linguists.

This posting says "engineers only" to me! There's nothing wrong with that, but why use The Linguist List's job postings board for a job that no actual linguist will be considered for? My reaction is based on what I consider to be engineering dog-whistles that are designed to encourage the "right" people to apply (i.e., engineers) and the wrong people to go away (i.e., linguists).

A quick breakdown of their rhetorical dog-whistles:
  • The Data Research Group is a team of scientists at Intelius... Much as I would like linguists to be considered scientists, the truth is, in the "real world" of job announcements, they are not. This is a red flag.
  • Team members have published papers in top research conferences... Ah hah, not "conferences" per se, but "research conferences". This means ACL.
  • Mentors will include Dr. Vitor R. Carvalho and Dr. Andrew Borthwick (diss PDF)... NOT linguists.
  • Required Skills: Strong hands-on skills in Java and/or Python... i.e., we assume you lie awake at night worrying about arrays and functions, not unaccusative marking and tone sandhi.
  • Required Skills: Self-motivated, creative, and independent researching skills ... we will teach you nothing. You are on your own. Your teachers are gone. What can you give us?
FYI: Recently, bulbul has quite rightly taken me to task for being a tad hypocritical in arguing two seemingly contradictory points: (1) that 21st Century linguists should study math and (2) that the time-consuming effort of learning computational tools is a deterrent to being a linguist. I can imagine this post falling victim to that same complaint. My pre-defense is that I believe there is a skill set distinct to linguists that is valuable and worthy of investment by NLP capitalists, and that it has been largely ignored.

Engineers alone will not solve the critical language issues necessary to create the great products of the next generation of NLP tools. I believe in team building where linguists and engineers work together as equals.

our foundational tongues?

A commentator at The Daily Dish writes: I recently learned that in our foundational tongues of Latin, Greek, and Hebrew the words for breath and spirit are one and the same: spiritus, pneuma, and ruach [emphasis added].

I'm not sure what the author had in mind for "our foundational tongues." Assuming the author is referring to English, then Latin, okay sure, Romance languages have had an important influence on English. Greek, less so. But Hebrew??? What's most striking is the notable lack of Germanic languages as "foundational." This author needs a Ling 101 class.

And as for the author's claim about words for breath and spirit being the same, there is a related poetic pairing common to good ol' fashioned English. The word breath is often used as a metonym for life or spirit. Here are a few choice examples:

The Bard
Henry V -- King Henry's "Once more unto the breach, dear friends" speech (III, 1):

Now set the teeth and stretch the nostril wide,
Hold hard the breath and bend up every spirit
To his full height. On, on, you noblest English.
Whose blood is fet from fathers of war-proof!

In my reading of this line, King Henry pairs holding of the breath with spiritual courage to draw a parallel between the two.

Hamlet -- Hamlet's mother, Queen Gertrude, whilst arguing with her tortured son (III, 4):

Be thou assured, if words be made of breath,
And breath of life, I have no life to breathe
What thou hast said to me.

Prior to this line, Hamlet prods his mother to stop sleeping with his uncle/king and to "break your own neck down." In my reading of her lines, Gertrude connects the dots between words, breath, and spirit because of her son's harsh words. She is saying it is not in my spirit to do what you are asking of me.

And here is a really nice 2009 discussion of poetry and breath by Melissa Zeiger: Grace Paley's Poetics of Breath. Money quote:

The Romantic poets reemphasized breath as a force in poetry, liking to imagine that poetic breath mediated between the human and the transcendent, as, famously, in Coleridge's “The Eolian Harp,” where the wind joins breath to participate in “one Life within us and abroad,/ Which meets all motion and becomes its soul”

And this trope is not limited to Western literature either. The traditional Chinese concept of Qi is deeply rooted in an analogy of breath = life. From the Wikipedia page:

Qi is frequently translated as "energy flow". Qi is often compared to Western notions of energeia or élan vital (vitalism), as well as the yogic notion of prana, meaning vital life or energy, and pranayama, meaning control of breath or energy. The literal translation of "qi" is air, breath, or gas. Compare this to the original meaning of the Latin word "spiritus", meaning breathing; or the Koine Greek "πνεῦμα", meaning air, breath, or spirit; and the Sanskrit term "prana", meaning breath.

What this suggests to me is that there is something deeply natural to our cognitive perceptions about this analogy between breath and life. It is natural for humans to perceive breathing and thinking to be related somehow. Without breath, you cannot think. Fair enough. But this might be a deeply human logic insofar as ants or dolphins may not conceive of this relationship in the same way. I blogged about this last year in Dolphin-Bikes and The Iconicity Effect. I'm still waiting for a dolphin bike.

Sunday, February 6, 2011

why we need good tools...

Because we're not all interested in being R experts. By far, the single most frustrating part of my own graduate linguistics experience was the fact that in order to study the kinds of linguistic phenomena I wanted to, I had to spend most of my time learning tools that I didn't actually care about, like Tgrep2, Perl, Python*, R, etc. As a linguist, I don't really give a damn about any of those things. They were all obstacles in my way. The more time I spent learning tools, the less interested in linguistics I became. I respect the hell out of engineers who build great tools that are valuable to linguists, but if those tools are not user-friendly, I might as well scream into the darkness.

Which is why I am impressed with The Stanford Visualization Group's recent Visualization Tool for Cleaning Up Data:

Another thing I often hear is that a large fraction of the time spent by analysts -- some say the majority of time -- involves data preparation and cleaning: transforming formats, rearranging nesting structures, removing outliers, and so on. (If you think this is easy, you've never had a stack of ad hoc Excel spreadsheets to load into a stat package or database!).

Yes, more help please.

HT LingFan1

*Mad props to the NLTK!

Wednesday, February 2, 2011

Neuro-blogger Bradley Voytek posts a nice discussion helping us all understand how to consume neuroscience in the news:

In this post, I will teach you all how to be proper, skeptical neuroscientists. By the end of this post, not only will you be able to spot "neuro nonsense" statements, but you'll also be able to spot nonsense neuroscience questions.

Well worth the read.

Tuesday, February 1, 2011

my classic snowclone rant

As yet another winter storm threatens the US, lingo-tweeter cum lingo-grad student Lauren Ackerman marvels at the media's lust for snowmageddon and terms of its ilk, and I was reminded of my own ruminations on the many words for snow in my own peculiar dialect (it helped that I spent 6 hours in near-motionless traffic a few days ago while the DC metro region was castrated by a vicious and sudden sleet storm that halted traffic as well as sanity). So I offer this re-post from February 5, 2010:

As the snow descends upon Northern Virginia in the latest winter storm, and as DC's elite line up at their local Whole Foods and Trader Joe's clutching their reusable bags filled with heavily packaged prepared meals, cardboard-container salads, 6 bottles of wine, and one bottle of water ('cause, ya know, it's an "emergency"), I am struck by the fact that the great Eskimo vocabulary hoax (pdf) is no hoax at all! It turns out that I too have a great many words for snow. This evening, while running a few modest errands before the night's predicted 20-inch snow drop, I meticulously recorded the various terms I uttered as synonyms for the fluffy white stuff which descended, rather gracefully, upon the landscape.

A few choice examples (NSFW):

