Category Archives: web 2.0

Culturomics: Not Quite Yet

Well, it’s time to stick my oar in on the Google Ngrams discussion. While a number of computational linguistics scholars have pointed out the pitfalls of Google’s latest toy, I think I have a unique perspective to offer on the issue. I understand what the Ngrams creators were trying to do, because I’m trying to do exactly the same thing: get some things cooking. My research on contemporary literary reception is not exhaustive or dependent on highly complex statistical models. That’s because literary reception is a huge, multiply mediated field ranging from café conversations to book reviews, and my access to data is limited. But where I have adopted a “core sample” model, choosing a few accessible data sources to make some robust but limited generalizations about readers and reading culture, Google has gone for the moon shot. By creating an opaque front-end to their 5-million-book archive, they offer the illusion of a truly global Ngram search, and they emphasize the scale of their ambition by claiming their tool isn’t merely a corpus search mechanism but the portal to a new science of “culturomics.”

As my colleague Matthew Jockers noted in his own oar-insertion post, “To call these charts representations of ‘culture’ is, I think, a dangerous move.” He goes on to suggest it “may be,” but I have to go a bit farther and say “definitely not.” Here’s the problem: we can’t get reasonable, arguable claims about things like culture or literary history unless the limitations of the corpus are acknowledged and dealt with from the outset. Typically, projects like this limit themselves either by going too small or too big, and Google has gone way big. Let me explain what I mean.

Too Small:

The opposite example would be a research project on a small, meticulously tended patch of texts. Classic humanities research, really, but of limited usefulness for making grounded claims about larger literary-historical or cultural issues (at least until enough such small projects emerge with commensurable results that we can begin to construct some causal chains). Traditional humanities as a whole is full of projects that are “too small” for making broad cultural claims because they are limited to a small data footprint. The walled garden of closely tended results is fascinating and lovely to explore, but it’s difficult or impossible to compare the work to anything outside.

Too Big:

Google, by contrast, flies off the macro end of the scale by trying to do too much and claim too much. The corpus is amazing, but nevertheless limited and contingent in many ways. As others have pointed out, the OCR is problematic; the metadata is sloppy; the text distribution almost certainly has a number of biases (how could it not? What is the gender, historical and language distribution of the world’s universal library supposed to be anyway?). By choosing to obscure these limitations instead of illuminating them, Google turns “culturomics” into a toy, not a tool.

Fortunately, the data is all there, and these problems can be fixed. Google loves a good algorithm and will presumably figure out solutions to the various technical problems. With luck (and the persistence of its academic research partners) the Ngrams team will also come to acknowledge and reveal the limitations on its data. Once that happens, we can really get cooking and make a clear case for when this vast corpus really does reveal broad cultural trends.

For now, Ngrams is a blunt object but it still has some value as a tool. I’ll post some examples next time.

Gross National Happiness

Researchers have begun using Facebook as a social dataset for some very interesting research, including the recently released Gross National Happiness Index. The metric tracks aggregate “happiness” (based on the use of words like “happy,” “joy,” “awesome,” etc.) on a daily basis.

This is exciting news for me for two reasons. First, it means there are other people out there using these commercial websites to produce real research. It’s validating to see others agree that online mega-sites are turning into social resources in their own right, spaces diverse and vast enough to support (more or less) general population research. Second, it’s led me to LIWC, an intriguing piece of software for measuring different kinds of aggregate themes in texts–positive and negative emotions, for example. There are a number of similar efforts out there, but this one does seem to be fairly comprehensive, and it’s been put to impressive use on the Facebook project. I’m thinking about how to analyze professional and consumer book reviews in more sophisticated ways and this route has some strong appeal.
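
For the curious, the word-counting idea behind an index like this can be sketched in a few lines of Python. This is only a rough illustration, not the actual Facebook or LIWC method: the three-word lexicon, the `daily_happiness` function and the sample posts are all my own made-up assumptions, and a real lexicon would be far larger and would also track negative words.

```python
import re
from collections import defaultdict

# Hypothetical positive-word list, for illustration only.
POSITIVE_WORDS = {"happy", "joy", "awesome"}


def daily_happiness(posts):
    """Return {day: share of words that appear in POSITIVE_WORDS}.

    `posts` is an iterable of (day, text) pairs.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for day, text in posts:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        total[day] += len(words)
        positive[day] += sum(1 for word in words if word in POSITIVE_WORDS)
    # Skip days with no words at all to avoid dividing by zero.
    return {day: positive[day] / total[day] for day in total if total[day]}


# Made-up sample posts, not real data.
sample_posts = [
    ("2009-07-01", "So happy today, what an awesome day"),
    ("2009-07-01", "Stuck in traffic again"),
    ("2009-07-02", "Nothing much going on"),
]

scores = daily_happiness(sample_posts)
for day in sorted(scores):
    print(day, round(scores[day], 3))
```

Dividing by the total word count per day matters here: without it, a day with more posts would look "happier" simply because more words were written.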

A Very Finite Summer

Since I’m working on the changing nature of reading and on contemporary American literature, it seemed almost obligatory for me to check out Infinite Summer, a massive blog-based reading group organized around David Foster Wallace’s massive Infinite Jest. The reading group’s pace is quite reasonable by grad student standards–75 pages a week–but in the true spirit of studentdom I started weeks late and have been struggling to catch up.

That means I haven’t yet really delved into the culture of the online exchange, but I am curious to see how things are going over there. From my brief perusal of the site so far, it seems the basic structure is for a few authors to post on their reading experiences, and the rest of the community is left to hang out in the comments. This works well for your average blog, but it seems a little limiting for a book discussion group, which would really work better with a forum architecture. Maybe there is one and I haven’t found it yet?

The site’s structure does seem to emulate the deceptive orderliness of Infinite Jest, with its footnotes and acronyms.* There are guides and summaries and a schedule, but I find the site disorienting as a whole, as a place to talk about the book, much as Infinite Jest ends up being disorienting. Readers quickly realize that the acronyms are explained inconsistently, at random, in medias res; that they’re thrown in and out of numerous plot-lines like hapless tennis balls; that the end notes and gestures toward structure are deeply satirical and philosophically agnostic about the whole idea of knowledge. Hence, on the site: the conversation goes on through a Twitter tag, comments, Tumblr, Facebook…and I just found the forum. They do have one after all.

I guess this isn’t a bad way to honor Wallace’s passing, but is it a good way to talk about his book? Obviously I’m thinking of a different kind of conversation, one where people lean forward around a table and interrupt each other, whereas Infinite Summer is a beast that can only exist online: an imaginary space full of people zooming in and out, talking about the book or not, employing various means of intellectual transportation.

I love the idea of this online reading group, so my question isn’t meant to be hostile, merely inquisitive. I’ll report back when I’ve learned more (and, say, actually read more than a handful of posts from the various zones of Infinite Summer).

* Acronyms, while cryptic, always imply a bedrock of rational thought, convention and informational structure, however ludicrous that implication might be.

Book Seer

I can’t decide whether to be excited or annoyed that somebody else has come up with the same idea I’ve been playing around with for several months now in my dissertation research. Well, the beauty of the web is that they can slap a quick implementation up overnight, whereas it’s going to be months if not years before I really get my work out into the open. Where my six professional readers can really delve into it.

So while we’re waiting for that glorious day, we can play around with Book Seer, a recommendation site that asks you for a book and then scrapes Amazon and LibraryThing to suggest further reading for you. Neat!

More Culture Maps

The images linked below are two more examples of the material I’m generating for my dissertation. The first is a visualization of the authors and literary references (in proper noun form) made by New York Times reviewers of Pynchon’s books. The second image is the same, only drawn from Amazon customer reviews of Pynchon’s books. Comparing the two, you can see how different sorts of cultural reference (and different levels of density of reference) exist in the sets of text.

Both images were created using the wonderful web gizmo Wordle, which allows users to upload their own data and create custom visualizations.

Culture Map: NYT Reviews

Culture Map: Amazon Reviews
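
For anyone wondering how the underlying counts might be produced, here is a minimal Python sketch of tallying author references in two sets of reviews. It is an assumption-laden illustration rather than my actual pipeline: the name list, the `count_references` function and the one-line sample "reviews" are invented for the example, and matching against a fixed list of names is a simplification of real proper-noun extraction.

```python
import re
from collections import Counter


def count_references(reviews, names):
    """Count whole-word, case-sensitive mentions of each name across reviews."""
    counts = Counter()
    for text in reviews:
        for name in names:
            pattern = rf"\b{re.escape(name)}\b"
            counts[name] += len(re.findall(pattern, text))
    return counts


# Hypothetical name list and made-up review snippets, for illustration only.
names = ["Joyce", "Melville", "Nabokov"]
professional_reviews = [
    "Like Joyce and Melville before him, the author sprawls.",
    "Echoes of Joyce run throughout.",
]
customer_reviews = [
    "Harder going than Nabokov, but worth it.",
]

professional_counts = count_references(professional_reviews, names)
customer_counts = count_references(customer_reviews, names)

for name in names:
    print(name, professional_counts[name], customer_counts[name])
```

Counts like these can then be handed to a visualization tool as word weights, and comparing the two columns gives a crude sense of how the density of reference differs between the two sets of text.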

The New Open Culture

My good friend Dan Colman has recently moved his great site Open Culture to its new Internet home, the one it should have had all along. I wrote a few blog posts for Dan back in the day (far fewer than I’d actually said I would, alas), and I love the site.

If you’ve never seen it, be sure to check it out, especially his incredible, expanding archive of free high-quality podcasts, lectures and more–including a great list of free audio books.

Culture Map #1

I’m trying to work out different ways of mapping out the networks of books, ideas and writers that build up around different novels over time–a concept I’m calling ideational networks. The web is fostering a lot of these networks (think Web 2.0) and at the same time preserving them, allowing me to map some of the connections.

One of the things I’ve been looking at is the ecology of book recommendations and reviews on sites like Amazon and LibraryThing. Below is a map of the book recommendations branching out from LibraryThing, which we can assume is driven largely by the book choices that users of the site have made over time.

As you can see from the image below, the network is fairly diffuse, but with some interesting connection points. Nabokov’s work, particularly Pnin, seems like a major intersection between different cultural sub-networks. I’ll have more to say about this and other maps as I continue working, but for now I thought this might be a cool image to share. If anyone’s interested I’ll share some of the technical details in a future post.

Culture Map 1
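
As a small taste of the technical side, a recommendation network like this can be represented as a graph and its best-connected books found by counting neighbors. The sketch below is a simplified illustration under my own assumptions: the function names are hypothetical, the edges are placeholder data rather than real LibraryThing recommendations, and counting neighbors is only a first approximation of what makes a book an "intersection" between sub-networks.

```python
from collections import defaultdict


def build_network(recommendations):
    """Build an undirected graph from (book, recommended_book) pairs."""
    neighbors = defaultdict(set)
    for source, target in recommendations:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors


def hubs(neighbors, top=3):
    """Return the `top` books with the most connections (ties broken by title)."""
    ranked = sorted(neighbors, key=lambda book: (-len(neighbors[book]), book))
    return ranked[:top]


# Placeholder edges, not real recommendation data.
edges = [
    ("Book A", "Book B"),
    ("Book A", "Book C"),
    ("Book C", "Book D"),
    ("Book C", "Book E"),
    ("Book E", "Book F"),
]

network = build_network(edges)
top_hubs = hubs(network, top=2)
print(top_hubs)
```

A measure such as betweenness centrality would capture the "bridge between sub-networks" idea more faithfully, but a plain neighbor count is easy to compute and easy to read off a map.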

Finding the Poetry in the Desert of the Real

You’ve got to love that Slavoj Žižek. I developed a fondness for his inspired/crazed Lacanian readings of popular culture when I put together a course on the Matrix trilogy a couple of summers ago. So I think the author of Welcome to the Desert of the Real might have some interesting things to say about the clip below. Fortunately it’s my blog so I’m going to say some interesting things instead. But go ahead and watch it first.

What I love about this is the way the creator finds poetry in the many wasted moments of our blasted media landscape. I mean no insult to Charlie Rose, but I love the way the quirks, gaps and nuances that usually speed by too quickly for thought are captured here like fireflies in a jar. The shaggy, lurching bizarreness that makes us human lurks behind even the most poised and professional mask, and I think this clip helps bring it out.

Thanks to my friend Dan at Open Culture for posting this!

More on Bourdieu + Lab Notes

I’m going to drop the Dissertation Update titles in favor of the “dissertation” tag below. The blog gets to be even more monotonous than usual when all the titles start off the same.

Today, for the first time, I thought to look up when Bourdieu died and what sorts of things he was up to in his later life. There’s a deeply cynical side to academic research, one where the news of Bourdieu’s death in 2002 provides a sense of frank relief. After all, what if he were still out there, thinking about all the new media things I’m planning to write about? It’s much easier to work with a fixed body of work, no matter how great (or just controversial) that achievement is. I found a wonderful little obituary for Bourdieu in The Nation, written by Katha Pollitt.

Finally, I’ll add a link to Work Product, a “research diary or lab notebook” put together by Matthew Wilkens, a postdoc at the Humanities Research Center at Rice University. Wilkens is doing some very interesting stuff and his blog is a more sophisticated (and consistent) example of what I’m hoping to accomplish here. He’s evaluating part-of-speech taggers right now, which is a major service to us all. Way to go, Matthew!

Digital Fiction

I just came across a BoingBoing post pointing to some new digital fiction put together by Penguin. I’m excited about this for two reasons. First of all, each of the pieces (there are six in all) experiments with a different digital form. Second, a major publishing house is demonstrating interest in digital literature–great news for someone who’s hoping to write, and write about, some digital lit himself one day.