In the wake of President Barack Obama’s electoral rout over former Massachusetts Governor Mitt Romney last fall, the concept of data mining has become conversationally ubiquitous. The 2012 election result, however, is just one example of what might be characterized more broadly as the digital turn in American culture. This turn has included streaming video and smartphones, but it has also involved the collection of sophisticated information and the leveraging of that information in service of various agendas.
In higher education, a related movement known as digital humanities (DH) has garnered increased attention in recent years. Multiple definitions and characterizations of DH exist, but the most prevalent is the application of computational methods to humanities research.
Such a definition demands specification. What kinds of computational methods? What kinds of applications? What insights are you offering? Why should anyone outside your discipline care about what you’re doing?
A group of digital humanities scholars gathered in Lincoln, Neb., this February to discuss some of these important issues. “Hacking at Books,” a Nebraska Forum on Digital Humanities, was the 2013 edition of an annual, thematic exploration of DH issues hosted by the University of Nebraska-Lincoln’s Center for Digital Research in the Humanities. Previously called the Nebraska Digital Workshop, the event has been held since 2006.
The 2013 Nebraska Forum on Digital Humanities featured two keynote speakers: Ted Underwood, associate professor of English at the University of Illinois at Urbana-Champaign, and Tanya Clement, assistant professor in the School of Information at the University of Texas at Austin. Additional panels provided an opportunity for scholars to share ideas, ask questions and raise concerns. More than 50 scholars, teachers, graduate students and undergraduate students from the University of Nebraska and various universities across the country were in attendance. The fields represented included literature, linguistics, history, library science and computer science.
Clement’s keynote, “Sound Seeings, or High Performance Sound Technologies for Access and Scholarship,” set the tone for “Hacking at Books.” Clement’s project, called HiPSTAS (High Performance Sound Technologies for Access and Scholarship), uses a software package originally developed to analyze birdcalls to visualize the sound dynamics of audio files, including their rhythmic patterns. Clement’s research team has used this software to analyze poetry read aloud, as well as moments of language shift in recordings of an Ojibwe speaker.
In a sense, Clement’s work is as much a question as it is an answer. What happens when we teach a computer program to understand the subtleties of sound? What otherwise obscured information can this kind of analysis bring to the fore? The next phase of Clement’s work will involve even further collaboration. Her work on deep listening could help answer countless questions that researchers in other disciplines have begun to formulate. One of the great strengths of digital humanities has been its capacity to bring together scholars with questions and scholars offering new ways to address those questions.
Underwood’s talk, “How Well Do We Understand Literary History?,” emphasized the rewards of using data mining to uncover new insights about literature. Bayesian topic modeling can predict voter behavior, but it can also answer questions about what factors empirically characterize poetic language and how norms for poetic diction have changed over time.
Underwood’s most recent analysis has examined breakdowns between first- and third-person narration in 19th-century fiction. Much has already been written about the rise of the novel and its various conventions, but Underwood’s approach represents an empirical intervention in such questions. With a reasonably stable data set (e.g., all works classified as fiction in the HathiTrust text corpus) and a sufficiently nuanced statistical method, computation offers a new way of looking at a question deeply embedded in the study of literary history.
Both speakers have pursued coherence in something profoundly complicated. Their work is also invested in complicating the simplistic. In the study of poetry, a bias prioritizing language on the page often prevents scholars from paying sufficient attention to the complexities of poetic performance. The essence of a piece of literature, in other words, changes depending on how it is delivered. Clement’s work can demonstrate the power of acoustic variance, as well as the importance of continuity from one recording to another.
Underwood’s work, meanwhile, complicates seemingly obvious literary definitions. For example, in attempting to train a computer to recognize first-person narration, he came to question the very essence of what the first-person narrative voice really is. Any first-year English major, of course, can tell you that a first-person narrator says “I” and a third-person narrator says “he” or “she,” but a large-scale project like Underwood’s can call attention to the sheer number of ostensibly third-person narratives that feature an occasional first-person pronoun. Further, how might we classify a text like Wilkie Collins’s “The Moonstone,” a fictional work that presents itself as a collection of documents, some of which are first person? What should we do with Joseph Conrad’s “Lord Jim,” which has entire chapters of one character’s first-person account set off with quotation marks as if it were dialogue?
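The difficulty described above can be made concrete with a minimal sketch. It is not the method Underwood used; it is only the naive pronoun-counting rule a first-year English major might propose, written in Python under stated assumptions. The function names, the pronoun list, the quotation-mark pattern and the 2 percent threshold are all hypothetical choices made for illustration.

```python
import re

# Assumed list of first-person pronouns (illustrative, not exhaustive).
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}

# Assumption: anything between a pair of double quotes (straight or curly)
# is dialogue rather than narration.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def first_person_rate(text, skip_dialogue=True):
    """Return the share of words that are first-person pronouns."""
    if skip_dialogue:
        # Drop quoted spans so that a character's speech is not counted.
        text = QUOTED.sub(" ", text)
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FIRST_PERSON)
    return hits / len(words)


def guess_narration(text, threshold=0.02):
    """Label a passage 'first' or 'third' using a hypothetical cutoff."""
    return "first" if first_person_rate(text) >= threshold else "third"


sample = 'She looked up. "I will go," she said.'
print(guess_narration(sample))                        # dialogue ignored
print(first_person_rate(sample, skip_dialogue=False))  # dialogue counted
```

A rule like this handles the easy cases and fails on exactly the hard ones named above. Stripping quoted text would discard the chapters of “Lord Jim” that are set off with quotation marks, and no single threshold settles how to label a book like “The Moonstone” that mixes first-person documents with other material.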
Overall, “Hacking at Books” reenacted much of what makes the present moment so exciting for digital humanities as a discipline. Of course, the scholarly community today called digital humanities has been an abiding presence in higher education for half a century. What has changed in recent years is very much a question of scale. Computation-based scholarship has become more common, in part because the baseline capabilities of digital tools have increased dramatically.
With the expansion of capability has come a flurry of conversation about what counts as digital humanities, what its core values are and how young scholars should go about developing their credentials. Computational approaches to humanities questions are not new, but digital humanities remains a discipline coming to terms with itself.
One of the greatest strengths of the digital turn in humanities, however, has been a sense of pluralism. This sentiment was echoed repeatedly at the “Hacking at Books” forum.
Martin Mueller, professor emeritus of English and classics at Northwestern University, emphasized that digital tools can help undergraduates see that the world of texts goes far beyond what they might encounter in a standard college-level anthology.
William G. Thomas, John and Catherine Angle Professor in the Humanities and Chair of History at the University of Nebraska-Lincoln, emphasized the way digital tools can help historians represent change over time.
Patrick Juola, an associate professor of computer science at Duquesne University, has used computational methods to advance scholarship on author attribution, or “the science of inferring characteristics of the author from the characteristics of documents.”
Amanda Gailey, assistant professor of English at the University of Nebraska-Lincoln, has used digital text editing as a way to help scholars from any number of fields better appreciate how the history of a text (and its variance over time) affects its interpretation.
“Digitization lays bare a kind of textual corruption that has always been with us,” she said. “But people are very nervous to discover it.”
DH is characterized by variety, of a kind that naturally encourages collaboration. Yet the need for a shared basis of understanding, a shared sense of self, persists.
Recommendations for achieving such coherence have their own predictably pluralistic feel. Juola, fielding a question about important skills for would-be digital humanists, emphasized the importance of core mathematical training. Clement emphasized that computationally informed analysis of texts should return “to the page” as its primary site of interpretation. Mueller talked about a move toward “intermediate big data,” a dataset bigger than one person could interpret without a computational approach but perhaps made up of hundreds of texts rather than millions.
Whether it’s something as fundamental as a common definition of the word data, or a sense of the mutual benefits of discovery and analysis as two coexisting modes of humanities scholarship, some notion of disciplinary coherence is essential. That shared sense of identity may ultimately come back to something Clement articulated as the forum came to a close. As the humanities experiences a “drive toward data,” there must remain, at the core of digital humanities scholarship, a human factor, a place for individual interpretation.