Category Archives: digital humanities

Papers, papers

In all the excitement of the holidays, MLA and then a trip to Egypt (!), I didn’t have a chance to post about an exciting update from the publication front. Since then I also had some good conference news, so here’s the skinny.

I’m really delighted to be participating in an awesome book project co-edited by Lee Konstantinou and Sam Cohen considering the impact of David Foster Wallace. The collection is under contract with Iowa and it got a great writeup in The Chronicle of Higher Education. Very exciting! I’m working on revisions to my chapter right now. My piece will explore how different groups of readers are defining Wallace’s legacy through book reviews and literary consumption.

I had a paper accepted to Digital Humanities 2011! This may not seem like a big deal until you start reading the comments on Twitter from people who didn’t get in: the acceptance rate was only 31% for panel proposals. I’ve really enjoyed my previous two DH conferences, and I’m looking forward to presenting with fellow LitLabbers Zephyr Frank and Rhiannon Lewis, with Franco Moretti as moderator. The panel is titled “Networks, Literature, Culture” and it’s going to be fantastic. I’ll save you a seat.

Culturomics: Not Quite Yet

Well, it’s time to stick my oar in on the Google Ngrams discussion. While a number of computational linguistics scholars have pointed out the pitfalls of Google’s latest toy, I think I have a unique perspective to offer on the issue. I understand what the Ngrams creators were trying to do, because I’m trying to do exactly the same thing: get some things cooking. My research on contemporary literary reception is not exhaustive or dependent on highly complex statistical models. That’s because literary reception is a huge, multiply mediated field ranging from café conversations to book reviews, and my access to data is limited. But where I have adopted a “core sample” model, choosing a few accessible data sources to make some robust but limited generalizations about readers and reading culture, Google has gone for the moon shot. By creating an opaque front-end to their 5 million book archive, they offer the illusion of a truly global Ngram search, and they emphasize the scale of their ambition by claiming their tool isn’t merely a corpus search mechanism but the portal to a new science of “culturomics.”

As my colleague Matthew Jockers noted in his own oar-insertion post, “To call these charts representations of ‘culture’ is, I think, a dangerous move.” He goes on to suggest it “may be,” but I have to go a bit farther and say “definitely not.” Here’s the problem: we can’t get reasonable, arguable claims about things like culture or literary history unless the limitations of the corpus are acknowledged and dealt with from the outset. Typically, projects like this limit themselves either by going too small or too big, and Google has gone way big. Let me explain what I mean.

Too Small:

The opposite example would be a research project on a small, meticulously tended patch of texts. Classic humanities research, really, but of limited usefulness for making grounded claims about larger literary-historical or cultural issues (at least until enough such small projects emerge with commensurable results that we can begin to construct some causal chains). Traditional humanities as a whole is full of projects that are “too small” for making broad cultural claims because they are limited to a small data footprint. The walled garden of closely tended results is fascinating and lovely to explore, but it’s difficult or impossible to compare the work to anything outside.

Too Big:

Google, by contrast, flies off the macro end of the scale by trying to do too much and claim too much. The corpus is amazing, but nevertheless limited and contingent in many ways. As others have pointed out, the OCR is problematic; the metadata is sloppy; the text distribution almost certainly has a number of biases (how could it not? What is the gender, historical and language distribution of the world’s universal library supposed to be anyway?). By choosing to obscure these limitations instead of illuminating them, Google turns “culturomics” into a toy, not a tool.

Fortunately, the data is all there, and these problems can be fixed. Google loves a good algorithm and will presumably figure out solutions to the various technical problems. With luck (and the persistence of its academic research partners) the Ngrams team will also come to acknowledge and reveal the limitations on its data. Once that happens, we can really get cooking and make a clear case for when this vast corpus really does reveal broad cultural trends.

For now, Ngrams is a blunt object but it still has some value as a tool. I’ll post some examples next time.

Map Marathon

I received an email about a wonderful new exhibit/collaboration “Map Marathon” organized by the Serpentine Gallery in London and those intrepid thinkers at Edge. The whole online gallery is fascinating, but what really caught my fancy was this image, apparently submitted by Bruce Sterling. It’s a map of writers who are associated with Sterling, and therefore it has a lot in common with my research.

After some investigation it looks like the map was generated with Gnod, or Gnooks to be exact: “a self-adapting community system based on the gnod engine.” I’m intrigued–it seems like the site’s connections are based on user input to its adaptive learning system. I’d love to compare these networks to my own data.

London Dispatch

I’ve once again fallen way behind in my blogging, but fortunately I have much to report. I’m writing from Digital Humanities 2010, where I’ll be presenting my latest research on Saturday. The conference is in London and it’s been exciting and a little befuddling to wrestle jet-lag amidst an exciting array of panels and posters.

The paper I’m giving is on Toni Morrison, the subject of the recently completed Chapter 2. It’s in its fourth iteration now, after a trial run among the friendly brains at Stanford and great panels at ASU’s Southwest English Grad Students conference and ACLA. At each point I’ve been refining my methodologies and slides (lesson one: visualization is endlessly finicky).

As before, this is a case study where Morrison’s work is really a jumping-off point for an exploration of her reading publics and the nature of literary fame. When I presented at DH2009, I was still working out how to approach these questions and adopted a kind of shotgun strategy, using every data set and methodology I could think of to see what worked. That paper, on Thomas Pynchon, had a lot going on: networks of Amazon recommendations; Wordle images based on word counts of book reviews; bar graphs of library copies; graphs of MLA citations and comparisons of MLA, Amazon and newsgroup publications by year.

Most of these ideas were interesting, but only some of them ‘stuck’ for me. The cyclical nature of academic and other kinds of publication, for example, was revealing to see but a point that probably only needs to be proven once. This year I’ve decided to focus on the richest results from the past and push the envelope. My paper will look at the social lives of Morrison’s novels, and the ‘social’ networks they inhabit online. I’ve worked hard in the past year to create collocation-based networks and to use network analysis to identify the most significant nodes and clusters in Morrison’s ideational networks online. These are the most interesting, and the messiest, of my datasets, and network analysis has revealed some surprising patterns that I’ll be sharing on Saturday.
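For readers curious what "identifying the most significant nodes" can mean in practice, here is a minimal sketch. The post does not say which measure was actually used, so this assumes the simplest one, weighted degree (the sum of a node's link weights). The function name `weighted_degree` and the toy edge counts are invented for illustration and are not real data from the project.

```python
from collections import defaultdict

def weighted_degree(edges):
    """Rank nodes by the total weight of the links they take part in.

    `edges` maps an (a, b) pair of node names to a co-occurrence count.
    Weighted degree is an assumed stand-in for "significance" here.
    """
    totals = defaultdict(int)
    for (a, b), weight in edges.items():
        totals[a] += weight
        totals[b] += weight
    # Highest total first; ties broken alphabetically for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Toy, made-up counts purely to show the shape of the input.
toy_edges = {
    ("Beloved", "Faulkner"): 3,
    ("Faulkner", "Woolf"): 1,
    ("Beloved", "Song of Solomon"): 2,
}
ranking = weighted_degree(toy_edges)
```

On messier data a measure like betweenness or a clustering algorithm would probably say more, but a degree ranking is a cheap first look at which names anchor a network.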

So that’s the major news. I have a couple of other projects cooking that I’m going to write up when I have some solid bulletins to report.

A Big Year

2010! Where is my jetpack?

It’s been a busy year so far, and I’m hoping to keep up with this new, futuristic energy. After a bit of a slow autumn (we use the term metaphorically here in Phoenix) and the usual distraction of the holidays, I finally got to check a few major items off my list this week. Yesterday I completed a funding application for the Stanford Humanities Center–they offer a few dissertation fellowships each year. Today I finally–FINALLY–finished revising a paper submission based on my Pynchon chapter and sent it back for round two.

Now it’s time to buckle down and return to data analysis. I’ve assembled a great pile of book reviews and recommendations in a MySQL database, and I have a few discrete challenges ahead of me:

First, I need to come up with an effective way to identify and then tag proper nouns in book reviews. This is easy to do badly and then clean up by hand, which is what I did for the last chapter. But there are a lot of Morrison reviews out there, so now I really need a computer for this. As a first pass/proof of concept I’m hand-editing a little “dictionary” of all the proper noun literary references made in professional reviews of Morrison’s work. Then I’ll write some kind of program to search for and tag those references in the reviews.
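A rough sketch of what that tagging program might look like, written in Python for illustration rather than in whatever language the project ends up using. The dictionary entries, the names `REFERENCE_DICTIONARY`, `build_pattern` and `tag_references`, and the sample sentence are all hypothetical; the only idea taken from the plan above is "look up a hand-edited list of references in the review text."

```python
import re

# Hypothetical hand-edited dictionary: surface form -> canonical reference.
REFERENCE_DICTIONARY = {
    "William Faulkner": "William Faulkner",
    "Faulkner": "William Faulkner",
    "Virginia Woolf": "Virginia Woolf",
    "Woolf": "Virginia Woolf",
    "Beloved": "Beloved",
}

def build_pattern(dictionary):
    """Compile one regex matching any dictionary entry as a whole word."""
    # Longest forms first so "William Faulkner" is tried before "Faulkner".
    forms = sorted(dictionary, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")

def tag_references(text, dictionary):
    """Return (canonical name, start offset) for each reference found."""
    pattern = build_pattern(dictionary)
    return [(dictionary[m.group(0)], m.start()) for m in pattern.finditer(text)]

# Invented sample sentence, not a quotation from any real review.
review = "Like Faulkner before her, Morrison returns in Beloved to a haunted South."
tags = tag_references(review, REFERENCE_DICTIONARY)
```

Exact string matching like this will miss misspellings and will over-match ordinary words that happen to be titles ("beloved" at the start of a sentence, say), so some hand cleanup would likely still be needed.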

Once I get that figured out, the second trial process is going to be creating network graphs of these literary references based on collocations. I think I’ll probably start by defining links as “in the same paragraph,” but this might change depending on how useful the graphs end up being.
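The "in the same paragraph" rule can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: paragraphs are assumed to be separated by blank lines, references are found by plain substring matching, and the function name `cooccurrence_edges` and the sample text are made up.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(review_text, known_references):
    """Count how many paragraphs each pair of references shares.

    Assumes paragraphs are separated by blank lines and that a plain
    substring test is good enough to spot a reference.
    """
    edges = Counter()
    for paragraph in review_text.split("\n\n"):
        # Sort so each pair is always stored in the same order.
        present = sorted(ref for ref in known_references if ref in paragraph)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

# Invented two-paragraph sample, not taken from a real review.
sample = (
    "Morrison is often read beside Faulkner and Woolf.\n\n"
    "Faulkner casts a long shadow over Beloved."
)
edges = cooccurrence_edges(sample, ["Faulkner", "Woolf", "Beloved"])
```

Each key in `edges` is a link and each count is a candidate link weight. Swapping the paragraph split for a sentence split or a fixed word window would be one way to test how much the link definition changes the graph.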

If I can get all this working in the next week or two, hopefully I will get some kind of epiphany for how to automate the process elegantly for a much larger, and badly proof-read, set of consumer reviews of Morrison. It’s 2010…where is my artificial intelligence research assistant?

I have arrived

It’s been quite a while since I updated this blog, so here’s a rapid review.

I’ve completed a draft for my dissertation chapter on Thomas Pynchon.

I’ve got a messy first half of an introductory chapter too, but I’m trying hard not to think about just how much revision that’s going to need.

All of this has snapped into close focus with the end of the academic year and my presence this week at the University of Maryland for Digital Humanities Conference 2009. After months of solitude interrupted mainly (if regularly) by the dogs, I find myself surrounded by people thinking about the same questions I’ve been wrestling with. Cool!

I’ll be presenting on Thursday and panel-hopping for the rest of the time. I’m also looking forward to meeting and re-meeting luminaries of my Twitter and podcast world.

More Culture Maps

The images linked below are two more examples of the material I’m generating for my dissertation. The first is a visualization of the authors and literary references (in proper noun form) made by New York Times reviewers of Pynchon’s books. The second image is the same, only drawn from Amazon customer reviews of Pynchon’s books. Comparing the two, you can see how different sorts of cultural reference (and different levels of density of reference) exist in the sets of text.

Both images were created using the wonderful web gizmo Wordle, which allows users to upload their own data and create custom visualizations.

Culture Map: NYT Reviews

Culture Map: Amazon Reviews

Reading : Material Culture :: Chicken : Egg

A few weeks ago Matthew Wilkens posed a question reaching to the heart of my interdisciplinary project:

A question I’m sure you’ve already gotten many times and likely will many more in the future: To what extent is this kind of work meaningfully understood under the rubric “literary criticism” at all, as opposed to literary-themed sociology and/or the business of literature? … [I]t seems to me that the line between the English department and the sociology department or the business school probably falls somewhere around whether you want to explain the features of particular texts by reference to social/cultural/economic factors, or explain socioeconomic effects by way of book-related networks. So … which is it?

As I replied then, the answer is a bit of both, but I think I ought to expand on that a little more. I am particularly interested in literature as a social phenomenon, and not just an individual experience. Reading can have extremely powerful transformative effects on the individual, of course, and those changes can impact whole categories of interaction and cultural thought. I believe that the authors who have been most successful both commercially and critically are particularly gifted at recasting the operations of our reading minds. Not only does reading Pynchon or Morrison enlighten, entertain and at times frustrate, it also changes how we think about fundamental planks in the social structures holding us together, like ideas of race or communication.

That said, I hasten to add that I don’t think of this project as an economic story or a business school case study. I don’t think these authors set out to get rich and decided that writing novels was the way to do it. Nor do I believe that they are motivated by a quest for recognition or a conscious desire to change how people think, though I do think those motivations are intrinsic to almost all of us to some degree.

Instead, I think of this as a literary approach to the question of reading. If the humanities must show their worth, there is no better way to do it than to reveal the structures of connection and thought that define us as cultural beings, to show how those structures are changing, and to consider the many and expanding ways in which we read and write the cultural landscape. Contemporary literature is an exciting, complicated field to work on, and it takes an interdisciplinary approach to map out the connections between different kinds of cultural authority, changing modes of readership/criticism/authorship and the abiding power of literature to convey human experience at a deeper level than any other medium.

In short, I don’t think there’s a one-directional causal force at work here. These ideational networks of texts, ideas and people are messy, provisional things that generally influence us in subtle, if pervasive, ways. I’ll be doing some close reading, and also trying to think about how others do their close reading, and how we read and evaluate culture collectively.

Talking Pynchon at the Digital Humanities Conference

I’m excited to report that my paper on Pynchon was accepted for the annual Digital Humanities Conference in June. It’s provisionally titled “Cultural Capital in the Digital Era: Mapping the Success of Thomas Pynchon” and will be a first run at the Pynchon chapter of my dissertation.

I’m trying to pull together research for the paper now and am hoping to focus on creating some “cultural network” maps of books that have been brought into association in various ways. For instance, professional book critics invariably describe new books in comparison to established ones so readers can get a sort of triangulated idea of what the new thing is like. Sites like Amazon and LibraryThing are much more explicit in the connections they draw, though of course the mathematical models they employ seem even murkier than the brain’s associative engines. So my first objective is to pull together some maps of the books that cluster around Pynchon in these respectively critical, commercial and webby venues.

I’ll post more about these ideas (and hopefully some web-based models for people to play with) once I know more. I’ve spent the past week reigniting the long-dormant Perl modules in my head. Next step: visualizing the data.