Friday, January 31, 2014

The SOTU and Reading Level

Evan Fleischer wrote a cheeky little bit about the reading level of the SOTU over at Esquire: Is the State of the Union Getting Dumber?

It was triggered by this graph in The Guardian:

Evan emailed me and several other linguists to get some reactions. He quotes me, Ben Zimmer, and Angus B. Grieve-Smith. We generally agreed that the trend shown in the graph probably had more to do with changes in who the speech is for than with any change in intelligence level.

It's a fun little read.
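The post doesn't say which formula the Guardian used. Readability graphs like this one are commonly built on a score such as the Flesch-Kincaid grade level, `0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59`. The sketch below assumes that formula, plus a crude regex sentence splitter and a rough vowel-group syllable heuristic (both my own assumptions, not the Guardian's code). Note that the score depends only on sentence length and word length.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup): count runs of
    vowels (y included), drop one for a silent final 'e', never return < 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a passage."""
    # Crude sentence split on runs of . ! ? (abbreviations like "Mr." will miscount).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("need at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(flesch_kincaid_grade("The cat sat on the mat."))
```

With six one-syllable words in one sentence, the example comes out to about -1.45; longer sentences and longer words push the grade up, whatever the content.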

Tuesday, January 28, 2014

Anticipating the SOTU

In anticipation of President Obama's 2014 State of the Union speech tonight, and the inevitable bullshit word frequency analysis to follow, I am re-posting my reaction to the 2010 SOTU, in the hope that maybe, just maybe, some political pundit might be slightly less stupid than they were last year... sigh... here's to hope.

BTW, Liberman has been on top of the SOTU story for a while now. Here's his latest.

(cropped image from Huffington Post)

It has long been a grand temptation to use simple word frequency* counts to judge a person's mental state. Like Freudian Slips, there is an assumption that this will give us a glimpse into what a person "really" believes and feels, deep inside. This trend came and went within linguistics when digital corpora were first being compiled and analyzed several decades ago. Linguists quickly realized that this was, in fact, a bogus methodology when they discovered that many (most) claims or hypotheses based solely on a person's simple word frequency data were easily refuted upon deeper inspection. Nonetheless, the message of the weakness of this technique never quite reached the outside world, and word counts continue to be cited, even by reputable people, as a window into the mind of an individual. Geoff Nunberg recently railed against the practice here: The I's Don't Have It.

The latest victim of this scam is one of the blogging world's most respected statisticians, Nate Silver, who performed a word frequency experiment on a variety of U.S. presidential State of the Union speeches going back to 1962 HERE. I have a lot of respect for Silver, but I believe he's off the mark on this one. Silver leads into his analysis by describing his own pleasant surprise that the speech demonstrated "an awareness of the difficult situation in which the President now finds himself." Then he justifies his linguistic analysis by stating that "subjective evaluations of Presidential speeches are notoriously useless. So let's instead attempt something a bit more rigorous, which is a word frequency analysis..." He explains his methodology this way:

To investigate, we'll compare the President's speech to the State of the Union addresses delivered by each president since John F. Kennedy in 1962 in advance of their respective midterm elections. We'll also look at the address that Obama delivered -- not technically a State of the Union -- to the Congress in February, 2009. I've highlighted a total of about 70 buzzwords from these speeches, which are broken down into six categories. The numbers you see below reflect the number of times that each President used [the] term in his State of the Union address.
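Silver's quoted method amounts to a per-speech tally of buzzwords grouped into categories. Here is a minimal sketch of that kind of tally, assuming a simple lowercase regex tokenizer and single-word terms only. The `BUZZWORDS` categories and terms are hypothetical placeholders, not Silver's actual list of about 70 terms in six categories. The per-1,000-words rate is an addition of this sketch (Silver's description mentions only raw counts); it is a standard way to compare speeches of different lengths.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical categories and single-word terms, for illustration only.
BUZZWORDS: Dict[str, Set[str]] = {
    "economy": {"jobs", "economy", "deficit"},
    "security": {"terror", "troops", "war"},
}

def tokenize(text: str) -> List[str]:
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def buzzword_counts(text: str) -> Dict[str, Tuple[int, float]]:
    """Map each category to (raw count, occurrences per 1,000 tokens)."""
    tokens = tokenize(text)
    freq = Counter(tokens)  # missing terms count as 0
    total = len(tokens)
    result = {}
    for category, terms in BUZZWORDS.items():
        raw = sum(freq[term] for term in terms)
        rate = 1000 * raw / total if total else 0.0
        result[category] = (raw, rate)
    return result

print(buzzword_counts("Jobs, jobs, jobs! The economy needs jobs."))
```

In the example, the seven-token sentence yields a raw "economy" count of 5 (about 714.3 per 1,000 tokens) and nothing for "security". Note that the tally only records that the words appear, not how they are used.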

The comparisons and analysis he reports are bogus and at least as "subjective" as his original intuition. Here's why:

Sunday, January 12, 2014

causation in verbal semantics

Causation is a major area of study within linguistic semantics. There is a thorough wiki page on the Causative that provides a good overview. Also, unsurprisingly, Beth Levin has written a nice discussion of the issues in these LSA 09 notes: Lexical Semantics of Verbs III: Causal Approaches to Lexical Semantic Representation.

To list the troubles with defining causation would fill a dissertation, so I won't bother here. Often, semanticists are interested in argument realization (see Levin's notes above). But there are deeper issues with causality that often go unaddressed. The deepest of all: what the hell is causality?

To this point, I ran across an old draft of a grad school buddy's qualifying paper on causation. It's just a draft, and it's old, but it had a nice section that tried to outline the constitutive criteria for causation*. I have since lost touch with this guy (I'll call him "BB"), but I thought this list of criteria was good food for thought for anyone interested in causation. I post these as discussion points only. And if BB sees this, give me a buzz :-)

First, here's a taste of the range of causative types taken from the wiki page on Causation (don't be fooled by these English examples; the issues permeate all languages. Causation is tough):

  • The vase broke — autonomous events (non-causative).
  • The vase broke from a ball’s rolling into it — resulting-event causation.
  • A ball’s rolling into it broke the vase — causing-event causation.
  • A ball broke the vase — instrument causation.
  • I broke the vase in rolling a ball into it — author causation (unintended).
  • I broke the vase by rolling a ball into it — agent causation (intended).
  • My arm broke when I fell — undergoer situation (non-causative).
  • I walked to the store — self-agentive causation.
  • I sent him to the store — caused agency (inductive causation).

BB's Nine Criteria for the treatment of causation (c. 2002)
  1. Change of state. The caused event must denote a change of state.
  2. Causers must be events. The causer A cannot simply be an individual but must be an event.
  3. Argument sharing. The causing event must contain the causee in its representation.
  4. Impingement. There must be a clear indication of impingement between the causer and the causee such that the causer impinges on the causee.
  5. Occurrence condition. The caused event must occur.
  6. Co-occurrence condition. The occurrence of the caused event must be conditional on the occurrence of the causing event; that is, the caused event can only take place if the causing event takes place.
  7. Non-co-occurrence condition. The non-occurrence of the caused event must be conditional on the non-occurrence of the causing event; that is, the caused event does not take place if the causing event does not take place.
  8. Directness of causation. It must be apparent when indirect causation is allowable for causality in lexical items.
  9. Spatiotemporal equivalence. The causing event and the caused event must have an equivalent time and place.

BTW, I recall objecting to #5 "the caused event must occur" because of negative causative verbs like prevent (feel free to read my previous post on these kinds of verbs). I don't know how or if he addressed that in his final version.
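To make criteria 5 and 7 concrete, here is a toy sketch that treats each scenario as a pair of yes/no facts (did the causing event happen? did the caused event happen?) and checks the occurrence and non-co-occurrence conditions over an actual scenario plus imagined alternatives. The `World` class, the function names, and the possible-scenarios framing are my own illustrative assumptions, not BB's formalism, and "Her warning prevented the accident" is a hypothetical example sentence.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class World:
    """A hypothetical scenario: did each event take place?"""
    causing: bool  # e.g. rolling a ball into the vase
    caused: bool   # e.g. the vase breaking

def occurrence(actual: World) -> bool:
    """Criterion 5: the caused event must occur in the actual scenario."""
    return actual.caused

def non_co_occurrence(worlds: Sequence[World]) -> bool:
    """Criterion 7: wherever the causing event is absent,
    the caused event is absent too."""
    return all(not w.caused for w in worlds if not w.causing)

def passes_5_and_7(actual: World, alternatives: Sequence[World]) -> bool:
    """Check criteria 5 and 7 over the actual scenario plus alternatives."""
    return occurrence(actual) and non_co_occurrence([actual, *alternatives])

# "I broke the vase by rolling a ball into it":
# in the alternative, no rolling and no breaking.
broke = passes_5_and_7(World(causing=True, caused=True),
                       [World(causing=False, caused=False)])

# "Her warning prevented the accident": the warning happens, the accident
# does not, and without the warning the accident happens.
prevented = passes_5_and_7(World(causing=True, caused=False),
                           [World(causing=False, caused=True)])

print(broke, prevented)  # True False
```

The prevent case fails criterion 5 by construction, since the prevented event never occurs; in this toy model it fails criterion 7 as well, because the accident happens exactly when the warning is absent.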

* There's so much literature on causation that it would take years to review it all to see whether anyone else has compiled such a list at this level of detail (many authors mention criteria, but not quite as exhaustively). I wouldn't be surprised if there is a better version out there, and I'm happy to post it if someone points it out to me.
