Tumblr Sphinx

By Ted Underwood (July 1, 2012)

Manga Style Space (Lev Manovich and Jeremy Douglass, 2010)
Should readers with a non-professional interest in history and literature care about the “digital humanities”? The phrase itself isn’t very helpful: it’s an umbrella term covering any intersection between digital technology and humanistic learning. Digital humanists might use the Web to cooperatively transcribe a writer’s journals, for instance, or use software to visualize the original layout of an archaeological site. But what difference will this make for non-specialists? When Stanley Fish blogged about this phenomenon for The New York Times, he presented digital approaches to the humanities as the latest in a long series of generational struggles within literary criticism. Historicism begat structuralism, which begat post-structuralism, which begat … historicism again, which begat digital humanities. If this is just another generational struggle between academics, non-professional readers might be well advised to tune it out.

But Fish is wrong in two important ways. First, it’s not just (or even mostly) literary critics who are applying digital technology to the humanities: historians, musicologists, librarians, and computer scientists are all actively involved. Second, this is not a project driven mainly by theoretical conflict between scholars; it has been catalyzed by events outside the academy. Scholars don’t enjoy acknowledging this: we like to point out that the fields of “humanities computing” and “media studies” have a long academic history, stretching back to the middle of the twentieth century. That’s true, but it is also true that those fields have been lifted to new prominence by the Internet. When computers and culture were two separate things, digital approaches to culture received (at best) a raised eyebrow from mainstream scholarship. Now, “quantification” and “communication” are difficult to separate. As a mode of communication that is also a mode of computing, the Web has forced media theorists and scholars of computing into an uneasy alliance, and connected both of those fields to an ongoing transformation of popular culture.

So “digital humanities” isn’t a name for a new theoretical school or academic subfield, but for a complex collision between the humanities and the Internet. The effects of that collision are wide-ranging. It expands the subject matter of the humanities to include things like video games; it changes the way scholars communicate with each other and with the wider world; and it gives computing a larger role in disciplines previously resistant to quantification.

The last of these consequences—quantification—is the part of the story that tends to get the most attention. Since the Romantic era, literature and the arts have been defined as a humane refuge from the dismal numbers that transformed other aspects of social life. So journalists writing about the digital humanities are strongly tempted to frame them in terms of a historic showdown between “culture” and “data.”

The showdown is mostly imaginary. For one thing, quantification is not really new in the humanities. Historians have counted populations and trade routes; literary critics have counted editions and readers. Computers just make it possible to quantify on a larger scale: for instance, you can now count every word in a collection of several million books. But even this is not new. Counting words in a large collection of documents is a research technique that has been used widely (but mostly unknowingly) by amateurs since the 1990s. Typically, they call it “Google.”

Modern search engines have a lot of moving parts, but they are fundamentally based on an index that counts occurrences of every likely search term in every public Web page. The engines use those numbers (along with other information) to assess the relevance of a page to a particular query. Counting words may sound like a crude way to understand the content of a document—and it is. But the technique works well enough as a first approximation that we have been using it for years to find our way around the Web.
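To make the idea concrete, here is a minimal sketch in Python of an index that counts words and uses the counts to rank pages. It is an illustration only, not a description of how any real search engine is built: the three "pages," the function names, and the scoring rule (simply adding up counts) are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(pages):
    """Map each term to {page_id: number of occurrences on that page}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][page_id] = count
    return index

def search(index, query):
    """Rank pages by the summed counts of the query terms (a crude score)."""
    scores = Counter()
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    return [page_id for page_id, _ in scores.most_common()]

# Hypothetical three-page "Web", invented for illustration only.
pages = {
    "a": "Roman roads and Roman trade routes",
    "b": "Trade in the modern world",
    "c": "A recipe for pizza",
}
index = build_index(pages)
print(search(index, "roman trade"))
```

Page "a" ranks first for the query "roman trade" because it contains the query words three times, against once for page "b". Real engines combine counts like these with much other information, but the first approximation is no more mysterious than this.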

For several decades scholars have been using search engines in a similar way to find primary sources, usually without any consciousness that they were relying on a quantitative research strategy. But as search techniques grow more sophisticated, the computational underpinnings of research are going to become harder to overlook. For instance, instead of choosing search terms to find a particular document, it is possible to map a whole collection of documents by instructing a computer to find clusters of terms that tend to appear in the same contexts. The technique is called “topic modeling.” Applied to nonfiction, it produces “topics” that look a lot like Library of Congress subject headings: war, finance, agriculture, and so on. But applied to a collection of poems or novels, it can produce subtler groupings of writers who rely on similar vocabulary. When I modeled 1,853 works of eighteenth- and nineteenth-century literature, for instance, Mary Shelley, William Godwin, and Mary Wollstonecraft turned out to be united by a single topic. Among the roughly 600 writers in the collection I was using, they were the three who relied most heavily on words like “mind, feelings, heart, felt, soul, world, affection, nature.”
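The last step of that process, ranking writers by how heavily they rely on a cluster of words, can be sketched in a few lines of Python. This is not topic modeling itself, which infers the clusters from the collection; the sketch assumes the cluster is already known and uses the eight words quoted above. The three sample texts and the writer labels are invented for illustration.

```python
import re

# The words quoted above; a real topic model would infer such a list.
TOPIC_WORDS = {"mind", "feelings", "heart", "felt", "soul",
               "world", "affection", "nature"}

def topic_share(text, topic_words=TOPIC_WORDS):
    """Return the fraction of a text's words that belong to the topic."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_words) / len(words)

def rank_writers(corpus):
    """Sort writers from heaviest to lightest use of the topic words."""
    return sorted(corpus, key=lambda writer: topic_share(corpus[writer]),
                  reverse=True)

# Hypothetical one-sentence "works", invented for illustration only.
corpus = {
    "writer_a": "my heart and soul felt the affection of nature",
    "writer_b": "the ship carried grain and wine to the harbour",
    "writer_c": "his mind turned to the price of grain",
}
print(rank_writers(corpus))
```

Dividing by the length of each text matters: without it, the longest book would win every comparison simply by containing more words.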

One of 150 topics, visualized by plotting the volumes where it is most frequent. (Many volumes where it was uncommon are left unplotted.) The color of a circle represents genre; the X’s at the top represent volumes written by Mary Shelley or her parents.

Now, it’s not surprising that these three writers should be connected. We already knew, after all, that Mary Wollstonecraft married William Godwin and was Mary Shelley’s mother; we also knew that these three had similar political leanings. So it makes sense that they shared certain quirks of vocabulary. But we see here that the mere counting of words turns out to be surprisingly effective at revealing connections between texts. As we expand from collections of 600 writers to collections of 60,000, it seems very likely that digital techniques will reveal some connections we don’t already know about. Counting words may not prove anything about those connections, but that’s fine: a genuinely new lead is valuable enough.

This exploratory kind of research probably isn’t what the word “quantification” brings to mind. Most of us have a firmly fixed idea that numbers produce certainty, so when readers hear about computational approaches to the humanities, they usually assume that the point of computation must be to prove some narrow thesis with Spock-like accuracy. But numbers can also help us navigate a domain too large to survey at a single glance. This is how we all currently use quantification in search engines (although the numbers themselves are hidden from view), and it seems likely that such exploration will continue to be the main purpose of numbers in the humanities.

But there are, of course, many things to explore besides text. Scholars are also mapping patterns in music and visual images. One particularly active area of research involves visualizing geography. The ORBIS project at Stanford, for instance, is a kind of Mapquest service for the Roman world. It can show you how to get from Rome to Constantinople, or anywhere else in the Empire, by foot, by ox cart, by horseback, or by sea—while also telling you how long it would have taken, and how expensive it would have been to move different kinds of cargo by those means.
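A route-finder of this general kind can be sketched as a shortest-path search over a network of places. The Python below uses Dijkstra's algorithm, a standard technique; whether ORBIS itself works this way is an assumption, and the travel times in the toy network are invented for illustration and are not ORBIS data.

```python
import heapq

def fastest_route(graph, start, goal, modes):
    """Dijkstra's algorithm: return (total_days, path) using only the
    allowed travel modes, or None if the goal cannot be reached."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        days, place, path = heapq.heappop(queue)
        if place == goal:
            return days, path
        if place in visited:
            continue
        visited.add(place)
        for neighbor, days_by_mode in graph.get(place, {}).items():
            # Keep the quickest allowed way of covering this leg.
            options = [d for m, d in days_by_mode.items() if m in modes]
            if options and neighbor not in visited:
                heapq.heappush(
                    queue, (days + min(options), neighbor, path + [neighbor]))
    return None

# Toy one-way network. All travel times (in days) are INVENTED.
graph = {
    "Rome": {"Brundisium": {"foot": 18, "horse": 9},
             "Ostia": {"foot": 1, "horse": 0.5}},
    "Ostia": {"Constantinople": {"sea": 20}},
    "Brundisium": {"Constantinople": {"foot": 40, "horse": 22, "sea": 12}},
}
print(fastest_route(graph, "Rome", "Constantinople", {"foot"}))
print(fastest_route(graph, "Rome", "Constantinople", {"foot", "sea"}))
```

Changing the set of allowed modes changes the answer: in this invented network the traveler on foot must go overland through Brundisium, while one who can also sail leaves from Ostia. Swapping days for money as the quantity being added up would turn the same search into a cheapest-route finder.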

Exploratory tools like this make it possible for non-specialists to play an active role in research. If you go to the library and look up a book about travel in the Roman world, you will find in that book exactly what its author already knew. But there is potentially more information in ORBIS than any of its creators consciously grasped; the information is waiting to be revealed by a user who asks the right set of questions.

The same thing could be said about tools like the Google ngram viewer, which allow users to graph the frequency of phrases over time. To be sure, raw frequencies of words and phrases are hard to interpret: the fact that “pizza” became more common than “hamburger” around 1983, in Google Books, certainly does not imply that pizza became more common than hamburger in real life. But tempting leads still emerge from this sort of archive. Robin Sloan’s idea of graphing the meme “Once upon a time” is a good example: a graph like this in effect poses a good question about the history of storytelling, one to which we’d all like to hear an answer.
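The arithmetic behind a graph like this is simple, and a short Python sketch shows where the first interpretive choice comes in. The counts below are invented, the words are placeholders, and the function names are hypothetical; this is not the ngram viewer's own code.

```python
def relative_frequency(phrase_counts, total_counts):
    """Divide each year's count for a phrase by that year's total words."""
    return {year: phrase_counts.get(year, 0) / total_counts[year]
            for year in sorted(total_counts)}

def first_year_above(freq_a, freq_b):
    """Return the first year in which series A exceeds series B, or None."""
    for year in sorted(freq_a):
        if freq_a[year] > freq_b.get(year, 0.0):
            return year
    return None

# INVENTED counts: words printed per year, and occurrences of two words.
totals = {1980: 1000, 1985: 2000, 1990: 4000}
word_a = {1980: 2, 1985: 8, 1990: 20}
word_b = {1980: 4, 1985: 6, 1990: 8}

freq_a = relative_frequency(word_a, totals)
freq_b = relative_frequency(word_b, totals)
print(first_year_above(freq_a, freq_b))
```

Note that the raw count of the second word rises every year while its relative frequency falls, because the invented archive itself is growing. Even this first normalizing step is a decision about what the numbers mean, which is one reason a curve calls for interpretation rather than settling anything on its own.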

There are certainly pitfalls for the unwary in this sort of exploration (try graphing “best” and “beft” before 1820). But the bottom line is that the Google ngram viewer empowers nonspecialists to ask new questions about the history of language and society. More provocative examples (along with some random ones) are archived at Zach Seward’s ngram tumblr.

It remains crucial that the people who design new exploratory tools should themselves have an understanding of the humanities. Numerical expertise by itself is never sufficient. For instance, a group of mathematicians and computer scientists at Dartmouth recently compared word usage across time and discovered that certain aspects of change became more rapid in the twentieth century; they inferred from this that twentieth-century authors might have been less strongly influenced by their predecessors than earlier generations. Their conclusion is deeply unpersuasive because it overlooks a basic fact about language that you can demonstrate by opening your mouth and saying “ah.” To put it bluntly: reading is not the primary way that children learn language—there is this thing we do with our mouths. So, while it’s interesting that the pace of language change has varied from one period to another, we can’t be sure this raw result tells us anything about “literary influence.”

But the problem with the Dartmouth group’s argument isn’t that scientists tried to quantify something ineffable. We probably can quantify specific aspects of language change. Nor is the problem that they chose a narrow, boring topic: it’s a genuinely interesting question whether radio, for instance, stabilized or accelerated language change in the twentieth century. The problem is that the authors of this article were working in an unfamiliar field and thus failed to realize that their model of it was incomplete. That’s a risk in any field of scholarship: if you didn’t understand the atmosphere, for instance, you might easily look at recent rising temperatures and conclude that the sun must be getting hotter. A mistake of that kind wouldn’t prove that numbers are essentially reductive and therefore powerless to help us understand the mysteries of climate. It would prove only that numbers are never a substitute for detailed knowledge of a specific field or discipline. To test a hypothesis, you have to be able to envision possible alternatives to it. And once you can envision the possible alternatives, you may realize that certain phrases (like “literary influence”) cover too many different things for a quantitative model to be useful.

So, although it makes a nice, scary hook for an article, the chance that traditional kinds of humanistic expertise will be displaced by some new soulless science of culture seems to me remote. It won’t happen because numbers don’t actually have power to displace expert knowledge in any field—in the sciences or in the humanities. However, I admit that I occasionally despair of communicating this to my fellow humanists. Just as we have it firmly in our head that the only point of numbers is to produce certainty, we have it firmly in our head that quantification is an unstoppable devouring force like Ridley Scott’s alien. Once you let it into the humanities—game over. So your only chance of resistance is to kill it in the airlock, before it gets a foothold.

It’s a compelling story, because humanists feel embattled for many other reasons: state legislatures, Boards of Regents, students whose deepest passions are no longer connected to traditional literary forms. On many campuses, these struggles get expressed as a rivalry with science and engineering (although scientists and engineers themselves often have a baffled respect for humanists). Also, frankly, when I say that digital technology empowers non-specialists, I am not saying something that necessarily brings joy to every academic heart. Senior professors don’t always relish the idea that their work might become part of a collective project, where librarians and computer scientists have an equally important role to play.

So I suspect that nothing I say will dissuade academic humanists from telling stories that represent quantification as an all-powerful devourer or superficially plausible tempter. After all, those narratives accurately express how many of us feel. But for the record: we don’t have to systematically distinguish good digital scholarship from bad quantitative scholarship, or build firewalls that define in advance the limits of quantification in our field. As always, we can look at humanistic arguments on a case-by-case basis and criticize the ones that aren’t persuasive. We can still recognize a bad argument when we see one, even if it involves numbers.

I don’t mean to deny that digital technology can produce unsettling changes. It used to be the case that the humanities differed from the sciences in requiring no specialized equipment. An amateur equipped with pen and paper and the novels of Virginia Woolf had essentially the same resources as a professional working on the same topic. But it is unlikely that an amateur (or even a lone academic) will be able to build a project like ORBIS or the Google ngram viewer. Digital projects don’t necessarily have to be funded by Google, but they do typically require interdisciplinary teams and institutional support. In that sense, the humanities may be moving a bit closer to a model of research that resembles the sciences. This shift can be exciting, because it’s fun to build things and to collaborate across disciplinary lines. But it may also create new tensions and new kinds of inequality, since the people who design a tool are shaping the questions other researchers can pose with it. Many critiques of digital humanities, expressed abstractly as objections to numbers or to “big data,” are really expressions of concern about this social dimension of technology.

I don’t want to dismiss those real and justifiable concerns. On the other hand, the problem of unequal access to technology has been with us for decades. Scholars already use collections of digital sources created by private enterprise; these generally aren’t available to the public, and many of them are only licensed for use at wealthy institutions. In other words, we’ve outsourced technological inequality. Building tools for our own use will mean that we have to acknowledge the problem, and grapple with it directly. That won’t banish inequality: it will still be worth asking who designs these collections and who decides how they can be explored. Questions about gender and race, for instance, are going to be important in this context.

But if scholars and librarians are doing the work themselves, those questions can be answered deliberatively, and the tools that we produce can be made freely available—at least, as far as absurd copyright laws permit. Moreover, the underlying source code can itself be shared, so that other researchers can extend or revise tools if they disagree with the original builders’ decisions. An open-source approach has already become standard among the humanists who do this sort of work.

So to answer the question I started with: yes, everyone should care about this phenomenon called “digital humanities.” It’s not a school or movement within the academy but an encounter, both provocative and productive, between academic disciplines and the Internet. One of the consequences of this meeting of modes will be that history, music, literature, and many other subjects get opened up to new kinds of interactive exploration. Technology has certainly raised questions of power that we need to attend to: in particular, it is vital that humanists themselves should help shape these new exploratory tools. But many humanists are ready to do that, and there is plenty of reason to be optimistic about the results. I, at least, am hopeful that digital technology will make the humanities livelier, more participatory, and more transparent.

Ted Underwood teaches English literature at the University of Illinois, Urbana-Champaign, and also builds tools for digital exploration of eighteenth- and nineteenth-century literary history. He blogs about digital humanities at The Stone and the Shell.