Monday, November 26, 2007

"X Collar Job"

The Polyglot Conspiracy blogs about a construction which may count as a new snowclone.

white collar = professional class job

blue collar = working class job

green collar = eco-friendly job

pink collar = a job that is traditionally performed by women

Wednesday, November 21, 2007

YIKES! or The New Information Extraction

The term information extraction may be taking on a whole new meaning in the wider world, one quite different from what computational linguists mean by it. As someone working in the field of NLP, I think of information extraction as in line with the Wikipedia definition:

information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents.
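To make that definition concrete, here is a minimal sketch of the idea: pulling structured records out of unstructured text. The sentence template, the sample text, and the function name `extract_hires` are all assumptions made up for illustration; real IE systems are far more robust than a single regular expression.

```python
import re

# Toy template (an assumption for illustration): "<Person> joined <Org> in <Year>"
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) joined "
    r"(?P<org>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*) in (?P<year>\d{4})"
)

def extract_hires(text):
    """Turn free text into a list of structured records, one dict per match."""
    return [m.groupdict() for m in PATTERN.finditer(text)]

# Invented sample text, not real data.
sample = "Ana Lopez joined Acme Labs in 2005. Bo Chen joined Initech in 2006."

for record in extract_hires(sample):
    print(record)
```

Each match becomes a record with `person`, `org`, and `year` fields, which is the "structured information" half of the definition; the free-running sentence is the "unstructured" half.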

But my colleague pointed out a whole new meaning to me a couple weeks ago, the day after an episode of the NBC sitcom My Name Is Earl aired (11/1/2007: Our Other Cops Is On!). Thanks to the wonders of The Internets, I managed to find a reference to the sitcom’s usage at TV Fodder.com:

Information extraction in a post-9/11 world involves delving into the nether regions of suspected terrorists....

In other words: TORTURE! The law of unintended consequences has brought the world of NLP and the so-called War on Terror into sudden intersection (yes, there are "other" intersections... shhhhhhh, we don't talk about those). Perhaps the term IE is obsolete in CL anyway. Wikipedia describes it as a subfield of IR. Manning & Schütze’s new book on the topic is called Introduction to Information Retrieval, not Introduction to Information Extraction. They define IR, on the link above, essentially as finding material that satisfies information needs (note: I'm not quoting directly because the book is not yet out).

Quibbling over names and labels of subfields is often entertaining, but it’s ultimately a fruitless endeavor. I defer to Manning & Schütze on all things NLP. Information Retrieval it is.

Oh, you fools!

Geoffrey Pullum has a cute post over at Language Log today about the uses of language, the least of which, he declares, is to inform:

I'm sorry, I don't want to sound cynical and jaded, but language is not for informing.

His whole post is worth the read, but this sparked my memory about a paper I wrote many years ago. In my life previous to linguistics, I was a damned filthy English major but I took a course once that had something to do with discourse and conversation analysis (but, ya know, utterly vacuous in the way only English department courses can be) and I recall being frustrated by the assumption in the literature that communication was fundamentally "cooperative". Being the damned filthy English major that I was, I wrote an entire seminar paper without doing any empirical research at all, not even a Liberman-esque Breakfast Experiment; rather, I argued from my gut (as Colbert might say) that human communication was fundamentally competitive with each participant trying to "win" something, or at least in some sense trying to outperform the other. Unfortunately, that's about all I can remember of the whole event.

Monday, November 19, 2007

This goes without saying...

I got this link from Polyglot Conspiracy. Too frikkin funny.

yeah right

There are 3 interpretations of “yeah, right” in American English, but I only have two of them in my dialect (I’m originally from California). I’m in my late 30s and I hear the third version from younger folks a lot (I can imagine my teenage niece saying it this way), but I’ve also heard it from a 30-ish father of 3, so I’m not sure what generation it’s most closely associated with (perhaps I just missed it). The three interpretations I know of are as follows:

1) Normal (factual agreement): yeah right = ‘yes, that is correct’
2) Sarcastic (opposite meaning): yeah right = ‘no way in hell’
3) Back-channel (sentiment agreement): yeah right = ‘mm-hmm’

Thanks to the influence of Seinfeld and Friends throughout the 90s, (2) sarcastic is probably the default use these days, but it is the 3rd use that I don’t have in my dialect. I would say that (3) is in the same class of back-channel expressions as “you go girl!”

These three interpretations all involve different prosodic realizations; roughly, they have different tones. I’ll dig deep into my past when I studied the tone languages Mandarin Chinese and Cantonese (12 years ago) and when I actually took a phonetics course (10 years ago) to see if I can offer a plausible hypothesis about the F0 differences.

1) Normal: yeah = falling mid-low; right = falling mid-low
2) Sarcastic: yeah = rising low-high; right = rising low-high
3) Back-channel: yeah = steady mid-mid; right = rising low-high

I have little confidence in my intuitions about the prosodic properties of (1), but I feel (2) and (3) are pretty good guesses.
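For what it's worth, the hypothesis can be written down as a small lookup table. This is only a sketch of my guesses above, not a model of real F0 data: the three-step pitch scale and the names `LEVELS`, `CONTOURS`, `describe`, and `guess_reading` are assumptions invented for illustration.

```python
# Rough three-step pitch scale (an assumption; real F0 is continuous).
LEVELS = {"low": 1, "mid": 2, "high": 3}

# (start, end) pitch levels for "yeah" and "right", per the hypothesis above.
CONTOURS = {
    "normal": (("mid", "low"), ("mid", "low")),
    "sarcastic": (("low", "high"), ("low", "high")),
    "back-channel": (("mid", "mid"), ("low", "high")),
}

def direction(start, end):
    """Label a contour as rising, falling, or steady."""
    diff = LEVELS[end] - LEVELS[start]
    if diff > 0:
        return "rising"
    if diff < 0:
        return "falling"
    return "steady"

def describe(reading):
    """Return the contour directions for ("yeah", "right") under a reading."""
    yeah, right = CONTOURS[reading]
    return direction(*yeah), direction(*right)

def guess_reading(yeah, right):
    """Return the readings whose hypothesized contours match exactly."""
    return [name for name, contour in CONTOURS.items() if contour == (yeah, right)]

print(describe("back-channel"))
print(guess_reading(("low", "high"), ("low", "high")))
```

Written this way, it is easy to see that (2) and (3) share the rising "right" and differ only on "yeah".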

BTW, I happened to run across this paper by Joseph Tepperman et al. from USC: “YEAH RIGHT”: SARCASM RECOGNITION FOR SPOKEN DIALOGUE SYSTEMS. I haven’t read it, but it seems somewhat relevant to my point: “This paper presents some experiments toward sarcasm recognition using prosodic, spectral, and contextual cues.”

Wednesday, November 14, 2007

The Perils of Orthography or Words Without Vowels

I just signed up for the new online linguistics magazine Cambridge Extra as advertised on The Linguist List. The magazine is apparently going to run little Q&A competitions every issue: “In each issue you will have the chance to win different prizes from Cambridge University Press.” The inaugural question is this:

What is the longest word in the English Language without a vowel in its spelling?

Now, the key here is “in its spelling”. When I first read the question, I missed that part and thought real hard about this. Hmmmm, I thought, is there a word in English that has no vowel when pronounced? It’s true that there are expressions we utter that are vowelless, like “Hmmmm” above, or answering a question like my mom with “MmmHmm”. But it’s real easy to get tricked by orthography when analyzing a “word” like nth, as in “the nth iteration”. While spelled with no vowel, it is pronounced with an initial vowel, something like /ɛ/, an open-mid front vowel. I Googled the question and discovered quite a range of attempts at answering it; almost all of them mistook orthography for phonology.

This is like frikkin crack to a linguistics blogger! I found this juicy but representative answer on Yahoo! Answers, posted by “Mrs. C”:

Is there any word without a Vowel?
Best Answer - Chosen by Voters
sky, rhythm

PS: To the people screaming 'Y is a vowel' ... er, no it's not! A E I O U are the only 5 vowels. Y SOUNDS like a vowel in certain words, but it doesn't 'become' a vowel just because it sounds like one! Even my 8 year old students can tell you this!

Although this answer is consistent with the intent of the spelling constraint, I love this part: “Y SOUNDS like a vowel in certain words, but it doesn't 'become' a vowel just because it sounds like one!” Heehee. With an exclamation point for emphasis too!!! Has there ever been a more convoluted blurring of the difference between letters and phonemes?

To put it simply, Yes! If something sounds like a vowel, it does indeed become one! Regardless of what orthographic representation it may take. Although phonology and writing systems were never my interests in linguistics, I am quite certain that orthographies are never more than convenient hacks engineered to approximate the phonology and phonotactics of a language. They are always imperfect.
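The spelling-only reading of the question, by contrast, is trivial to mechanize, which is part of why it says so little about phonology. A minimal sketch, assuming (with Mrs. C) that only the letters A E I O U count as vowel letters; the tiny word list and the function names are invented for illustration and are not an answer to the competition.

```python
# Mrs. C's rule (an assumption taken from the quoted answer): only these count.
VOWEL_LETTERS = set("aeiou")

def has_no_vowel_letter(word):
    """True if the spelling contains none of a, e, i, o, u."""
    return not (set(word.lower()) & VOWEL_LETTERS)

def longest_without_vowel_letters(words):
    """Return the longest purely alphabetic word with no vowel letter, or None."""
    candidates = [w for w in words if w.isalpha() and has_no_vowel_letter(w)]
    return max(candidates, key=len, default=None)

# Toy word list, not a real dictionary.
sample = ["sky", "rhythm", "nth", "vowel", "hmm"]
print(longest_without_vowel_letters(sample))
```

Note that nth passes the spelling test even though it is pronounced with an initial vowel: the check looks only at letters and never at sounds.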

Saturday, November 10, 2007

The Perils of Prescriptivism and PGSLTS!

If ever there was evidence that prescriptivist maxims are unnatural and ultimately subservient to psycholinguistic priming, this sentence is it. It comes from an email sent in to Andrew Sullivan which he posted online here:

The people on whose doors I knocked on universally described the candidate as thrilling...

The linguistically delicious part is the unnecessary repetition of on. The author appears to be trying to form an NP with a relative clause that would perhaps be better rendered as “The people whose doors I knocked on”. However, still suffering from post grade school linguistics traumatic stress syndrome (PGSLTS), the author is consciously trying to avoid ending sentences with prepositions (but seems to re-analyze the rule to apply to phrases as well); desperate for grammaticality, the author at first valiantly tries to avoid ending the phrase with on by dislocating it to the front of the RC, but then arrives at the verb “knocked” which is probably naturally primed to be followed by a preposition (at least in this context, perhaps treating the verb as a particle-verb, knocked-on), and so tacks on another on, ya know, just to be sure.

Alas! The power of priming wins the day.

Thursday, November 8, 2007

Nounhood

Jessica Hagy is by far one of the most witty and creative bloggers. Her Indexed site never ceases to impress me with its range of clever reasoning. And now, she tackles grade school linguistics.

Dream Job ... or ... WTF!

This job announcement was posted to The Linguist List just yesterday:

Performance Space 122 in association with Movement Research and Instituto Cervantes seeks an English speaking cognitive linguist for 4 weeks of exploratory research with Spanish choreographer Juan Dominguez.

The individual chosen will provide Juan with the linguistic knowledge and will guide him during one on one research sessions (Nov 26-Dec 14, Mon-Fri, for 3 hours/day) and during a larger workshop with ten participants (Dec 17-21).

During the individual research with Juan, the main focus will be studying how language is built, how we use it, and how we understand reality through language. In the four previous workshops, Juan has tried experiments that influence the way of perceiving time and space through the verbs of movement. This will be a continuation of that research and experimentation.

During the larger workshop the linguist will spend the first two days giving the participants an introduction about verbs with special focus on the verbs of movement. The linguist will be present during the workshop (5 hours/day) as a reference for further questions and of course to give his or her point of view about what the participants will work on. (my emphasis)

PS122 is a legitimate place which promotes itself as a "multi-disciplinary arts center dedicated to finding, developing and presenting new artistic creations from a diversity of cultures and points of view."

They're giving themselves 4 weeks with one linguist to figure out

1) how language is built
2) how we use it
3) how we understand reality through language

On top of that, they hope to "influence the way of perceiving time and space through the verbs of movement."

Good Luck.

Sunday, November 4, 2007

Buffalo Buzz

As a follow-up to my earlier, unusually non-linguistics posts on Buffalo’s economy, which I discussed here (this is also featured on Mankiw’s post here) and here, I’d like to note that this week Buffalo’s famed weekly magazine Artvoice included an extended response to Glaeser’s critique of Buffalo’s renewal woes, What It Will Take by Bruce Fisher.

Bruce Fisher is Deputy Erie County Executive; he presumably knows the details of Buffalo’s economic situation better than Glaeser. I skimmed the article (rather quickly) at Spot Coffee this morning while doing laundry and I was impressed that Fisher does what Glaeser does not: provide pragmatic suggestions to fix Buffalo’s problems. But he also seems to tread awfully close to the deep end of silly Canada-envy and claim that Buffalo should follow Ontario’s lead. It makes sense to look to models of urban renewal like Toronto and Ottawa for ideas, but there is a peculiarly USA-American tendency (amongst liberals particularly) to fawn over Canada as if it’s some sort of Utopia. I’ll happily stipulate that I like Canada, love Toronto, and am impressed by many aspects of Canadian society. But I’m not predisposed to gushing.

Anyhoo, Fisher basically agrees with Glaeser that “if federal funds come the way they’ve always come, nothing here will change.” He then goes on to disagree with the assertion that Buffalo is a lost cause (that’s my phraseology). Fisher’s basic claim is this: “Quality attracts and retains density.” So, he reasons (contra Glaeser), we should invest in Buffalo the place. He wants to invest (public money, of course) in changing what he refers to as “land-use policy”, especially the policy of suburbs, and so he’s in favor of regionalism. I’ll leave it to you to read the entire article to appreciate Fisher’s complete argument.

I’m no macro-economist (though it has become increasingly my hobby over the last few years), so I’m not in a position to decide if Glaeser’s or Fisher’s prescriptions for Buffalo’s future are wisest. As an unapologetic urbanite who has lived within the city borders of Buffalo for 8 of the last 10 years, I have no problem with disparaging the evils of suburbia, but I also see that preservation does not seem to be doing much good. If the taxpayers of New York state and Kansas and Arizona and Washington (etc.) are going to invest hundreds of millions of dollars into Buffalo over the next ten years, I’m trending towards Glaeser’s position that it should be spent on the people (to me, that means primary education and law enforcement: a well educated, safe populace is more powerful than any other force on Earth).
