Wednesday, February 27, 2008
Who would have guessed that when you remove
Friends, meet Jon Arbuckle. Let’s laugh and learn with him on a journey deep into the tortured mind of an isolated young everyman as he fights a losing battle against loneliness and methamphetamine addiction in a quiet American suburb.
Monday, February 25, 2008
I never took grammatical gender seriously when I studied German. I just made everything feminine ‘cause, ya know, that was the easy one. The rest of my German was so bad, I figured it didn’t really matter anyway, right? (I frikkin LOVED studying Mandarin Chinese because, ya know, who needs morphology?)
…native French speakers don't agree on the genders of French nouns. They really don't agree. Fifty-six native French speakers, asked to assign the gender of 93 masculine words, uniformly agreed on only 17 of them (about 18%). Asked to assign the gender of 50 feminine words, they uniformly agreed on only 1 of them (2%). Some of the words had been anecdotally identified as tricky cases, but others were plain old common nouns.
… second language speakers of French, take heart! Make your grammatical gender agreement mistakes with confidence. There's a chance that your native-speaker interlocutor will agree with your version!
Danke, Heidi! Viel Danke!
Köpcke, Klaus-Michael and David A. Zubin. 2003. “Metonymic pathways to neuter-gender human nominals in German”. In Klaus-Uwe Panther and Linda L. Thornburg (eds.), Metonymy and Pragmatic Inferencing, 149–166. Amsterdam: John Benjamins.
Friday, February 15, 2008
It has the advantages of being fast and easy to use, covering corpora in multiple languages (plus allowing you to add new corpora), and providing user-friendly output.
One disadvantage is the brevity of the sketches it provides. For example, I performed a sketch of the verb "prevent" in the BNC and it returned a list of subjects and objects that occur with the verb. Sweet! This is really important stuff if you're interested in FrameNet-type semantic description (see my related post here). Unfortunately, it maxed out at 100 examples (a small sample of the 10,000+ hits).
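(If you're feeling DIY about it, the counting core of a word sketch isn't much code. Here's a minimal sketch of the idea in Python, assuming the spaCy library and its small English model are installed; the "corpus" is just a toy list of sentences, and real word-sketch tools layer grammatical relations and salience statistics on top of this.)

```python
# Toy "word sketch": count subjects and objects of a target verb using
# a dependency parse. Assumes spaCy is installed along with its small
# English model (python -m spacy download en_core_web_sm).
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def word_sketch(sentences, verb_lemma="prevent"):
    subjects, objects = Counter(), Counter()
    for doc in nlp.pipe(sentences):
        for tok in doc:
            if tok.pos_ == "VERB" and tok.lemma_ == verb_lemma:
                for child in tok.children:
                    if child.dep_ in ("nsubj", "nsubjpass"):
                        subjects[child.lemma_] += 1
                    elif child.dep_ == "dobj":
                        objects[child.lemma_] += 1
    return subjects, objects

subjects, objects = word_sketch([
    "The new law prevents fraud.",
    "A vaccine prevented the outbreak.",
    "Nothing prevents us from trying.",
])
print(subjects.most_common())  # e.g. [('law', 1), ('vaccine', 1), ('nothing', 1)]
print(objects.most_common())   # e.g. [('fraud', 1), ('outbreak', 1), ('we', 1)]
```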
Nonetheless, this utility goes a long way toward providing the sort of user-friendly (yet still sophisticated) online corpus query tools that I think the average non-computationally minded linguist would benefit greatly from.
I've used Mark Davies' BNC interface a lot too and that's also an excellent, entirely online search tool. Davies provides a nice interface to a variety of corpora here.
Thursday, February 14, 2008
From his site,
Being from a small city in
And from Ethnologue:
Tigrigna -- A language of
Population -- 3,224,875 in
Alternate names -- Tigrinya, Tigray
Classification -- Afro-Asiatic, Semitic, South, Ethiopian, North
Language use -- National language. 146,933 second-language speakers.
Language development -- Literacy rate in first language: 1% to 10%. Literacy rate in second language: 26.5%. Ethiopic script. Radio programs. Grammar. Bible: 1956.
Comments -- Speakers are called 'Tigrai'.
Monday, February 11, 2008
One of the most challenging tasks a linguist can engage in is annotating natural language text for semantics. It is simultaneously interesting, tedious, and tricky, which makes it altogether maddening. We perform this task for a variety of reasons: sometimes to create training data for learning algorithms (which was a big topic of discussion at last year's NAACL HLT), sometimes to explicate the semantics of events, as the FrameNet project does. Part of my dissertation is very FrameNet-like, so I do a lot of annotating (I will save my bile-filled hateful remarks about the general crappiness of annotator apps for another post).

Generally speaking, the annotator's task is to read naturally occurring sentences, then identify and tag the semantic roles of the participants involved in the particular event represented by the sentence. It would be easy if all of English were composed of sentences like "Bobby kicked the ball"; that would be sweet. "Bobby" is an AGENT, "the ball" is a PATIENT. Done. Let's move on. But that's not how real language works, is it?
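(To make the easy case concrete, here's roughly what a single span-based annotation record might look like. The field names below are my own invention for illustration, not FrameNet's actual format.)

```python
# A toy, span-based annotation for the easy sentence. Spans are
# character offsets; the field names are invented, not FrameNet's.
annotation = {
    "sentence": "Bobby kicked the ball",
    "target": {"lemma": "kick", "span": (6, 12)},
    "roles": [
        {"role": "AGENT",   "text": "Bobby",    "span": (0, 5)},
        {"role": "PATIENT", "text": "the ball", "span": (13, 21)},
    ],
}
```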
In any case, I have been annotating sentences involving the verb "exclude" recently and I find it's a particularly challenging set. The BNC “exclude” sentence below was difficult to annotate because the exclude event is not clear about its participants:
The new Minister for Health, Dr Noel Browne, a dedicated reformer of the health services and much concerned in-particular with the eradication of tuberculosis in
At first, I thought “Dr Noel Browne” was the agent doing the excluding, but then I realized it was the bill which excluded. But which bill? I concluded that “the earlier bill” is NOT participating in the exclude event because, logically, it must be the version of the bill that came AFTER the earlier one which did the excluding. So the sentence presupposes a later bill. Should I annotate the good Dr. as the agent, or leave this participant unmarked? (FrameNet's annotator app has the ability to mark an unexpressed element, and I believe this is exactly the kind of case it's for, but I don't use their app.) Also, it’s not clear whether the “to” means “in order to” as a purpose statement. Is the bill explicitly, directly excluding, or was that simply the intent of the changes? If it’s indirect, that makes Dr. Noel a better candidate for the agent of exclusion.
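(For the record, FrameNet's answer to exactly this problem is "null instantiation": a frame element that the frame requires but the sentence never expresses gets marked as absent rather than linked to a text span. In the toy record format from above, again with invented field names, it might look like this.)

```python
# The tricky "exclude" case: the later bill is presupposed but never
# expressed, so its role gets no text span. FrameNet calls this null
# instantiation (DNI = definite null instantiation); the record
# layout and role name here are invented for illustration.
annotation = {
    "sentence_id": "bnc-exclude-example",   # hypothetical ID
    "target": {"lemma": "exclude"},
    "roles": [
        # Required by the frame, absent from the text:
        {"role": "CAUSE", "instantiation": "DNI", "span": None},
    ],
    "notes": "Dr Noel Browne may be only the indirect agent.",
}
```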
Friday, February 8, 2008
The use of Chinese characters also serves to compact sentences. Since you don't have to actually spell out entire words, as in English, but can represent them with an ideogram, you can say a lot more in a much smaller space.
I will provisionally accept that kanji and kana make typing out written Japanese on a cell phone more efficient than typing out English (in the sense of requiring fewer key strokes; I'd have to test to see if this is really true), but I reject the logical fallacy that this mechanical efficiency leads to greater meaning.
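(The back-of-the-envelope version of that test is easy enough. Here's a toy sketch under admittedly crude assumptions: standard romaji input at one keystroke per letter, no predictive text, no conversion keystrokes, and a single hand-picked sentence pair.)

```python
# Toy keystroke-vs-display comparison for one English/Japanese pair.
# Crude assumptions: romaji input at one keystroke per letter, no
# predictive text or kana-to-kanji conversion keystrokes counted.
english = "thank you very much"
romaji_input = "doumoarigatougozaimasu"   # keystrokes to type the kana
kana_display = "どうもありがとうございます"    # what actually shows on screen

print(len(english))       # 19 keystrokes, 19 display characters
print(len(romaji_input))  # 22 keystrokes ...
print(len(kana_display))  # ... but only 13 display characters
```

Even in this cherry-picked case, the Japanese is more compact on screen but costs more keystrokes, which is exactly why "compact" and "efficient to type" need to be kept apart; and neither one buys you "more meaning."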
This strikes me as a variation of a phenomenon Ben Zimmer over at Language Log has written about regarding the all too often misrepresented meaning of the Chinese word for ‘crisis’ wēijī . Underlying both of these is the naïve belief that logograms are inherently more meaningful than alphabetic words. This belief, I reject.
I could be wrong about this, but my hunch is that the human language system takes all written representations of language and converts them into an internal mental representation it’s happy with. There may be differences in the way the brain accesses the meaning of kanji versus the way it accesses the meaning of alphabetic words (in terms of recognition), but I don’t see any reason to believe that the internal semantic representation of kanji is somehow different from the representation of alphabetically written words. If I’m wrong and there is a difference, this would be an interesting piece of data for the Sapir-Whorf folks.
FYI: The Sapir-Whorf hypothesis (aka linguistic relativity) has re-emerged in recent years. Some of the most interesting empirical work is being done by
Saturday, February 2, 2008
We’re not that far from the Universal Translator, right?
Skype has their version too:
Universal Chat Language Translator and Speaker for Skype
It goes without saying that the boys and girls at Carnegie Mellon have already developed their version and gotten it to market: Franklin 12-Language Speaking Global Translator.