Saturday, April 26, 2008

Smitten with Kunis

This is another, still rare, non-linguistics post about movies (I suppose I could try to draw some connection to the Netflix Prize or Recommender Systems, but, yawn, this is what it is, a movie post).

I watched Forgetting Sarah Marshall yesterday. I feel the need to defend that choice, but I’ll do that later. After seeing it, I find I’m smitten with Mila Kunis, and not just because her name is Mila Kunis. I knew of Kunis through That ’70s Show (although, like most people, I stopped watching midway through the third season, and that was a long time ago), but more through her voicing of Meg on Family Guy.

In Forgetting Sarah Marshall she is given the right blend of sweetness and tenacity to play to her talents (her screaming match with an ex-boyfriend was literally laugh-out-loud funny), plus she has an awesome tan. Her tan is so awesome, it’s like a separate character. They could have just put Kunis and her tan next to the ocean and I probably would have watched for the same 112-minute run time. It’s an impressive feat to get a Ukrainian THAT tan and not kill her. I don’t know what combination of chemicals and baby oil they used, but it worked. Zonker Harris would be proud.

And this is the essential hook, isn’t it? In order for a romantic comedy to work, the viewer has to become smitten with one of the leads (or both, if that’s your thang baby, make Paglia proud … on a random related note, is Torchwood the most bi-curious TV show in history?). In any case, I walked away from this movie smitten with Mila Kunis.

While watching this movie, I couldn’t help but reflect on the lack of women in Hollywood who have the two most important characteristics of a romantic comedy lead: adorability and comedic talent. Meg Ryan had lots of one and little of the other; frikkin Sandra Bullock had neither yet still managed a decade-long career.

Kunis has both. She’s cute as all hell and she can bring the funny (and did I mention the awesome tan?). The only other actress today with both of these crucial qualities (sans tan) is Ellen Page (my first impression of her is here) but I fear Page may be limited to the wise-cracking smart-ass. I haven’t seen her step out of that role yet (even her small roles in the X-Men movies had this tinge to them).

Unfortunately, since the corporate takeover of Hollywood in the 1980s, the romantic comedy has been staffed by pretty dolls with little talent (both male and female). But this is why most romantic comedies fail. They have dull leads. The corporate suits create a table of demographics, then plot a script accordingly, then plug in the two actors du jour and voilà!

Now the romantic comedy may finally be coming out of its stupor. Forgetting Sarah Marshall is the latest installment of Apatow Inc’s refashioning of the genre, and god bless ‘em because most romantic comedies suck.

Box Office Mojo has a list of the 300 top grossing romantic comedies since 1978, and it’s depressing. The highest grossing romantic comedy of all time is, by itself, reason to contemplate suicide. Even as you scan the large list of movies, it’s a wasteland of forgettability. But that’s the downside. The upside is that the romantic comedy genre has produced a handful of unforgettable films like His Girl Friday, Harold and Maude, and Annie Hall. There is nothing wrong with the genre itself. Hell, most epic poems suck ass, but that’s no reason to throw out The Odyssey.

More to the point, there are good romantic comedies (and John Cusack has been in most of them; if you haven’t seen Grosse Pointe Blank or High Fidelity, you’re missing out). I've highly recommended Juno as a great version of the genre (regardless of what my colleague may think, thppt!), but I can't equate Forgetting Sarah Marshall with Juno, smitten or not. But it is a good romantic comedy, just worth the matinée price I paid.

And that brings me to my reasons for choosing this particular film. I have no shame in going to see a romantic comedy, because I want to see another Annie Hall. I want the genre to succeed. I think Apatow Inc. stresses writing and comedy talent more than most producer-driven entourages, so they’re producing films that, at the very least, are funny and entertaining. Plus, I was bored and M. Faust gave it a good review, even though he doesn’t mention Mila Kunis (Bastard! Did you not see her awesome tan?).

Wednesday, April 23, 2008

"LingPipe, I hate you"

Actually, no, I do not hate LingPipe. But someone does. It is the entertaining aspect of Sitemeter that led me to this discovery.

Occasionally I check my Sitemeter page view details because it's comforting to see that people actually do read my blog (even if y'all don't comment, thpppt!). But far more entertainment value is gained from the information about how someone came to my site. I can see what search words brought someone here. I've been collecting some of the more amusing ones and I've been meaning to post about it, but today I discovered someone had gotten to my blog by searching Google for, and I quote, "lingpipe ihate you".

I don't know what deviltry the evil duo at LingPipe is up to, but they appear to have made an enemy.

Monday, April 21, 2008

On Jobs and NLP Degrees...

Thomas posted an interesting quandary recently. I'll summarize it this way: How does a person choose which M.S. program in NLP to attend? As far as Thomas and I are aware, there are no rankings for computational linguistics/NLP programs; so, is word of mouth all anyone has to go on? Does anyone out there know of any resources for helping someone like Thomas?

Does the NLP community out there care to contribute words of wisdom to the next generation of CL/NLP newbies?

You may wish to read my own discussion of what I perceive to be the difference between CL and NLP here.

Here was my advice to Thomas. You're free to attack it viciously.

I think the crucial question is about your goals: do you want to be an academic working on high level problems like parsing and discourse (in which case you're looking at getting a PhD), or do you want to get a job in industry (a PhD is good in industry, but there are plenty of NLP jobs for Master's level, even some for Bachelor's)?

If industry is your answer, the school you choose won't really matter that much; it's the skills you develop. I'd strongly advise you to develop competency with machine learning, if you haven't already. You don't have to be great at it, just competent. That's a highly marketable skill set now, and will be for the foreseeable future. General competence with statistics and corpus linguistics is highly valued.

So, I'd ask each program where stats and ML fit into their programs (or how much flexibility they give you for taking electives).

And, just for the record, SUNY Buffalo has an M.S. in CL too. Not too late to apply. You can kinda surf Lake Erie (gotta be better than Georgia surfing).

(pssst, context for the surfing reference can be found on Thomas’ profile).

I scanned the last 10 or so NLP related non-academic job postings on The Linguist List and found a fair bit of consistency in the skills they were asking for. Above all else, they all wanted good programming skills. If you search Monster.com for "computational linguistics" I think you'll see an even greater emphasis on programming skills.

Here's a representative sample of the "requirements" from those Linguist List job postings. Taken all together, they may look intimidating, but this is a mash-up of ten+ postings. It's just meant to sketch what industry is looking for.
  • Experience in one or more of the following: MS SQL Database Server; Internet Information Services/Apache Tomcat; Windows operating systems; .NET; Java.
  • Strong programming skills in at least two of the following programming languages: Python, C++, Java and Perl
  • Multimodal statistical algorithms for language processing and modeling in both speech and handwriting applications
  • Develop tools for efficiently processing corpora of speech and/or sketch/handwriting data;
  • Work with a team of researchers and developers to successfully integrate research components and validate functionality;
  • Experience desired with statistical language modeling for either speech or handwriting applications (e.g., familiarity with CMU-Cambridge LM toolkit, SRILM toolkit, ATT FST toolkit, MALLET, Libbow, etc.);
  • Strong algorithmic skills and analytical background;
  • Demonstrated success in working in a fast-paced environment;
  • Ability to work effectively and successfully either independently and/or in a collaborative team environment.
  • Experience in the creation and exploitation of domain and task ontologies in text analytics
  • Strong background in statistical modelling required.
  • Knowledge of machine translation or natural language processing techniques
  • Ability to perform linguistic data analysis.
  • Proficiency in one or more scripting languages (Perl, Python, Ruby) or programming languages, particularly C++, is a plus.
  • MS or PhD in Computational Linguistics or related field.
  • Work experience in production-quality NLP systems.
  • Familiarity with Unix/Linux operating system environment is a plus.
  • Experience in machine learning, information retrieval, or data mining are all pluses.
  • Experience in the building of domain-specific ontologies is useful
  • Experience in statistical analysis and machine learning
  • Development, analysis, and support of grammar engine rules for English
  • Experience in corpus or text analysis, conversation analysis, or computational linguistics
  • Experienced architect/developer to design scalable enterprise application friendly implementations of spell checking, sentiment, named entity extraction

Monday, April 14, 2008

Bacon Strength

Having only just recently taken the Netflix plunge, I had been ignoring the flurry of interest amongst computational linguists about Recommender Systems. I am now fully aware of the profound need and utility of improving said systems. Somehow, Netflix got from the set [Blue Velvet, Chinatown, Midnight Cowboy] to the recommendation The Wild Bunch. There must be a sub-culture growing around the absurdity and humor derivable from such recommendations. Imagine you decided to follow such recommendations religiously. Honestly, how long would it take you to get to Glitter? Scary thought, huh? Now you realize how crucial Recommender Systems are to the survival of humankind.

It seems to me that an automated version of Six Degrees of Kevin Bacon ought to work AT LEAST this well, right? You simply recommend any movie that shares a cast member with a rated movie. The closer two movies are in a Kevin Bacon network, the more strongly you recommend it. Let's call this Bacon Strength. Hmmmmm, wait a second, I might be on to something ... this could be bigger than Google ... why am I telling YOU people about this ... the idea is mine, do you hear! MINE!!!!
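For the curious, here is a rough sketch of how Bacon Strength might be computed. Everything in it is my own assumption for illustration: the function name bacon_strength, the placeholder movie and actor names, and the choice to score a movie as 1 divided by its distance from the nearest rated movie. Two movies count as neighbors if they share at least one cast member, and distance is the number of such hops.

```python
from collections import deque


def bacon_strength(casts, rated):
    """Score unrated movies by cast-sharing distance from the rated ones.

    casts: dict mapping movie title -> set of cast member names
    rated: set of movie titles the viewer has already rated
    Scoring as 1 / distance is an illustrative assumption.
    """
    # Index movies by actor so shared-cast neighbors are easy to find.
    by_actor = {}
    for movie, cast in casts.items():
        for actor in cast:
            by_actor.setdefault(actor, set()).add(movie)

    # Multi-source breadth-first search: rated movies start at distance 0.
    distance = {movie: 0 for movie in rated if movie in casts}
    queue = deque(distance)
    while queue:
        movie = queue.popleft()
        for actor in casts[movie]:
            for neighbor in by_actor[actor]:
                if neighbor not in distance:
                    distance[neighbor] = distance[movie] + 1
                    queue.append(neighbor)

    # Closer movies score higher; unreachable movies get no score at all.
    return {m: 1.0 / d for m, d in distance.items() if d > 0}


# Placeholder data (hypothetical titles and names, not real cast lists).
casts = {
    "Movie A": {"Actor 1", "Actor 2"},
    "Movie B": {"Actor 2", "Actor 3"},
    "Movie C": {"Actor 3", "Actor 4"},
    "Movie D": {"Actor 5"},
}
scores = bacon_strength(casts, rated={"Movie A"})
for movie, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(movie, score)
```

With this toy data, Movie B shares a cast member with the rated Movie A and gets the strongest recommendation, Movie C is one hop further out and gets a weaker one, and Movie D shares nobody with anything and is never recommended.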

Plus, I'm completely amazed that at least four Chuck Norris movies are available for immediate online viewing, but only the first season of the new Dr. Who. wtf?
