Wednesday, March 16, 2011

open science

Recently in North Carolina, moximer & David Dobbs and others discussed the value of opening up science research (such that all research is freely available for searching and interpretation, even draft versions and failed experiments, at least under the strong proposal). It's an interesting discussion (audio is a bit crappy, but whaddayagonnado?):

What's Keeping Us from Open Science? Is It the Powers That Be, Or Is It... Us? from Smartley-Dunn on Vimeo.

With that in mind, I thought it might be nice to list some open access journals and archives offering free access to scientific research:
  • PLoS is a nonprofit organization of scientists and physicians committed to making the world's scientific and medical literature a freely available public resource.
  • The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
  • CiteSeer: The NEC Scientific Literature Digital Library incorporating autonomous citation indexing, awareness and tracking, citation context, related document retrieval.
  • e-Print archive: Open access to 664,014 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.
  • Directory of open access journals: This service covers free, full text, quality controlled scientific and scholarly journals. We aim to cover all subjects and languages. There are now 6271 journals in the directory. Currently 2722 journals are searchable at article level.
  • Free Full Text: a search engine returning full text scientific articles with no access fees.

Saturday, March 12, 2011

Korean in Killeen

Having spent nearly 4 months of the last year and a half working at Fort Hood, in Killeen, Texas, I finally decided to leave the safe confines of the hotel-centric chain restaurants and Target/Wal-Mart shopping centers and take a drive to historic downtown Killeen. I found pretty much what I expected to find: empty one-story storefronts, dusty unused parking spaces, and lots and lots of Hangul ... (screeching sound) ... huh?

Yep, turns out historic downtown Killeen, heartland of America, is being somewhat revitalized by Korean immigration. My favorite grocery store by far is the Korean O-Mart (not the one pictured above, btw), where I can find genuinely fresh vegetables and dumplings (as well as shiitake mushrooms, plenty of seaweed for soup, and a wide array of spicy sauces that I have been eagerly experimenting with).

It was a nice lesson in American multilingualism.

Monday, March 7, 2011

turning gaga into water = 200 terabytes

How much storage would it take to store the first 5 years of a child's linguistic environment? Apparently, 200 terabytes. From Fast Company:

...cognitive scientist Deb Roy Wednesday shared a remarkable experiment that hearkens back to an earlier era of science using brand-new technology. From the day he and his wife brought their son home five years ago, the family's every movement and word was captured and tracked with a series of fisheye lenses in every room in their house. The purpose was to understand how we learn language, in context, through the words we hear. A combination of new software and human transcription called Blitzscribe allowed them to parse 200 terabytes of data to capture the emergence and refinement of specific words in Roy’s son’s vocabulary.

The data visualization techniques he uses are pretty cutting edge ... and awesome! I love the fact that he is trying to use visualization techniques to help us understand something beyond raw statistics (which is where most graphs and pie charts die miserable deaths). Statistics are like molecules. Visualize them one by one and it's difficult for the average person to conceptualize the big picture of how they work together to create a grander whole. Roy appears to be trying to get beyond the yawn-inducing graphs that plague modern science. I mean, he uses freaky-deaky time-worms! How cool is that!

Roy talks about feedback loops as well:

..."Caregiver speech dipped to a minimum and slowly ascended back out in complexity.” In other words, when mom and dad and nanny first hear a child speaking a word, they unconsciously stress it by repeating it back to him all by itself or in very short sentences. Then as he gets the word, the sentences lengthen again. The infant shapes the caregivers’ behavior, the better to learn.

He gave a TED talk recently, but the video is not yet available.

Thursday, March 3, 2011

Hosni prefers "Hosny" in transliterated attire

Rachel Maddow et al. discovered a delicious gem fit for the annals of transliteration. Namely, how to write a specific Arabic name in the Roman alphabet (what we English speakers like to call "regular spelling"). She (and her staff) reported that Hosni Mubarak attended a head-of-state meeting in Albania a couple years ago wearing the world's most narcissistic pinstriped suit*, where the pin stripes were actually composed of lines of his name written in Roman alphabetic transliteration (this man really knows how to live the life of a tyrant, am I right?):

It is a troublesome fact of human language that writing the damned thing down is never easy. It's difficult enough to construct a writing system that is consistent for a single language, more difficult still to take a linguistic term (like a person's name) and write it down in a script which was not designed for that particular language. So when English language writers (like journalists) have to write down Arabic names in "regular spelling" they inevitably face difficult choices about which letters to use to represent particular sounds. Vowels are particularly difficult creatures to pin down with alphabetic rope (e.g., the whole "and sometimes y" fiasco).

The act of writing a linguistic term in a foreign script is called transliteration, and it's troublesome enough to have spawned a cottage industry sub-field within computational linguistics. For example, if you wanted to Google information about the currently exiled president of Egypt, you would be wise to Google the term "Hosni Mubarak." That is by far the most common spelling of the man's name on the internet (by a better than 20-1 margin, at least according to Google hit counts). Even if you choose the "Hosny" variant, you're basically just redirected to the "Hosni" results anyway. Yet the tyrant himself, ever the maverick, prefers the road less traveled.
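
For the curious, here is a toy sketch of how spelling variants like "Hosni" and "Hosny" could be folded onto a single lookup key. To be clear, this is not how Google (or anyone else) actually does it; the folding rules and the `variant_key` name are my own illustrative assumptions, and real transliteration systems are far more involved.

```python
import re

# Hypothetical folding rules (illustrative assumptions only): collapse
# letters that English transliterations of Arabic names commonly swap.
FOLDS = [
    (r"y\b", "i"),      # word-final y -> i   (Hosny -> Hosni)
    (r"ou", "u"),       # ou -> u             (Moubarak -> Mubarak)
    (r"(.)\1", r"\1"),  # doubled letter -> single letter
]

def variant_key(name: str) -> str:
    """Return a crude normalized key so spelling variants compare equal."""
    key = name.lower()
    for pattern, replacement in FOLDS:
        key = re.sub(pattern, replacement, key)
    return key

# Both spellings end up with the same key.
print(variant_key("Hosny Mubarak") == variant_key("Hosni Mubarak"))
```

Rules like these are just as arbitrary as the spellings they paper over, which is rather the point.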

Sadly, there's not much more to say about this than to emphasize the simple fact that transliteration is largely arbitrary and disputes about guidelines are largely trivial. Just flip a coin and move on ... (I just seriously pissed off the world's four transliteration experts).

...and in closing I'd like to repeat my assertion that Hosni/y Mubarak looks suspiciously like The Face of Bo**:
*FYI, I have no independent verification of the truth of this story. If Maddow's staff got punk'd, their bad.
**Damn you Captain Jack!!
