If you asked a scholar what digital humanities is, they’d likely reply with a cautious “it depends”. Dredge up any article on digital humanities from the Internet, and you’ll invariably get a Jekyll and Hyde definition. Yes, the author will take pains to explain what digital humanities is, then cheerfully tell you what it’s not. Go on. Stop here for a moment and do a brief search on Google. You’ll see why a lot of people still prefer to say “it depends”.
But that’s neither here nor there. Digital humanities is a smorgasbord of two things: digital and humanities. A more technical definition would be this: digital humanities is an area of research, teaching, and creation concerned with the intersection of computing and the disciplines of the humanities [1]. Don’t let the big words scare you. If you’re a computer geek, you’ll instantly recognise a mashup when you see one. Not mashed potatoes. Mashup.
A long, long time ago, in a galaxy far, far away, web applications used to exist in their own universes. I used one application for emails, another for storing passwords, yet another for shelving mundane notes. It was a simpler time, when web developers dug their own wells, leaped gleefully into them, then refused to come out. In short, web applications didn’t bother talking to each other. No one thought about making online apps like Google Calendar, Evernote, or Yahoo Mail do more than what they were originally built for. Then, Yahoo Pipes came along. If you’ve never heard of Yahoo Pipes before, it’s okay. But Yahoo Pipes was a real thing. It was clunky. It was ugly. But it freed users, or geek-empowered ones, from the shackles of app mediocrity. With Pipes, we were able to stitch connections between different web applications. Geeks used Pipes to merge, filter and manage website feeds from many sources in one place. Today, of course, APIs are free and abundant. I think the now-defunct Pipes taught us huge things about digital convergence.
So mashups aren’t new. This is important because digital humanities is a mashup between what computers do and, well, what our teachers called moral education, social studies or history during our Primary and Secondary school years. It was Roberto Busa, a Jesuit priest, who first laid the foundations of digital humanities. He employed computing power to analyse the works of Saint Thomas Aquinas. By drawing out and grouping the different inflected forms of each word, Busa created the Index Thomisticus, a definitive index of huge and important compendiums of religious history. Roberto Busa did not invent a revolutionary iPhone app. What he did was more far-reaching. He made the work of other researchers easier. By creating new ways of looking at ancient texts, he empowered other researchers to extract unique insights not only about Thomas Aquinas, but also about the environment he lived in, the academic life he participated in, and the sources that were available to him at that time.
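To make the idea of grouping inflected forms a little more concrete, here is a minimal sketch in Python. It is not Busa’s actual method, only an illustration of the principle. The lemma table and the two sample phrases are made up for this example (they are not quotations from Aquinas), and the names LEMMAS and build_index are my own.

```python
from collections import defaultdict

# Hypothetical, hand-made lemma table: each inflected Latin form points to
# its dictionary headword. A real project would need a far larger table.
LEMMAS = {
    "est": "sum", "sunt": "sum", "esse": "sum",
    "deus": "deus", "dei": "deus", "deum": "deus",
}


def build_index(lines):
    """Group every word occurrence under its headword.

    Returns a dict mapping headword -> list of (line number, form as found).
    """
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in line.lower().split():
            word = word.strip(".,;:")            # drop trailing punctuation
            lemma = LEMMAS.get(word, word)       # unknown words index as themselves
            index[lemma].append((line_no, word))
    return dict(index)


if __name__ == "__main__":
    for lemma, hits in build_index(["Deus est.", "Esse dei"]).items():
        print(lemma, hits)
```

The point is that a researcher looking up one headword finds every form of it, together with where each form occurs.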
Digital humanities projects almost always begin with a data source, involve some kind of textual analysis, and then deliver an outcome. Because computing is such a large part of digital humanities, it’s no surprise that the digital humanities workflow closely resembles the grind that database administrators perform day in, day out. The data management workflow of Extract, Transform and Load (ETL) underpins a great many data-related activities in the computing world.
In the first phase of the ETL process, a source is identified and data is extracted from it. Because data sources not only appear in a wide variety of formats but are also placed under different security protocols, part of the extraction phase is discovering and inventing methods to pull relevant data out. Common data sources include flat files such as CSV, SQL databases, RSS feeds and XML documents. With standard programming interfaces like the Document Object Model (DOM) widely supported, even static HTML pages can become viable data sources.
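Here is a rough sketch of what extraction from three of these formats might look like, using only Python’s standard library. The sample CSV, RSS and HTML strings are hypothetical stand-ins for real sources, and the function names are my own. Note that html.parser walks the page as a stream of tag events; it is a lighter-weight relative of full DOM access, not a DOM implementation.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data standing in for three different sources.
CSV_TEXT = "id,title\n1,Summa Theologiae\n"
RSS_TEXT = """<rss version="2.0"><channel>
<item><title>Summa contra Gentiles</title></item>
</channel></rss>"""
HTML_TEXT = "<html><body><h2>De Ente et Essentia</h2><p>notes</p></body></html>"


def titles_from_csv(text):
    """Read the 'title' column of a CSV document."""
    return [row["title"] for row in csv.DictReader(io.StringIO(text))]


def titles_from_rss(text):
    """Read the <title> of every <item> in an RSS feed."""
    root = ET.fromstring(text)
    return [item.findtext("title") for item in root.iter("item")]


class HeadingParser(HTMLParser):
    """Collect the text found inside <h2> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.titles.append(data.strip())


def titles_from_html(text):
    """Read the headings of a static HTML page."""
    parser = HeadingParser()
    parser.feed(text)
    return parser.titles


def extract_all():
    """Pull titles out of all three sources into one plain list."""
    return (titles_from_csv(CSV_TEXT)
            + titles_from_rss(RSS_TEXT)
            + titles_from_html(HTML_TEXT))


if __name__ == "__main__":
    print(extract_all())
```

Each source needs its own little extractor, but they all hand back the same kind of thing, which is what makes the next phase possible.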
In the second part of the ETL process, data is transformed, or to put it simply, cleaned up and organised in a meaningful, consistent way. In a digital mapping project, for example, I might be pulling information from two different websites. Website A uses the traditional longitude-latitude way of plotting coordinates, while Website B uses a projected coordinate system such as Universal Transverse Mercator. Before I begin any kind of analysis, I’d have to make sure the information from Websites A and B is expressed in only one kind of coordinates. Standardising information from multiple sources is an important and time-honoured transformation task.
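As a sketch of that standardisation step, the Python below converts Universal Transverse Mercator coordinates into decimal latitude and longitude, so that records from both websites end up in one form. It uses the usual series approximation for the inverse Transverse Mercator projection on the WGS84 ellipsoid, and I’m assuming that is the datum both sites use. The record layout (the "crs", "easting", "northing" and "zone" fields) is hypothetical. In a real project I’d lean on a dedicated projection library rather than hand-rolled maths.

```python
import math

# WGS84 ellipsoid and UTM constants.
A = 6378137.0            # semi-major axis in metres
E2 = 0.00669437999014    # first eccentricity squared
EP2 = E2 / (1 - E2)      # second eccentricity squared
K0 = 0.9996              # UTM scale factor on the central meridian


def utm_to_lat_lon(easting, northing, zone, northern=True):
    """Convert a UTM coordinate to (latitude, longitude) in decimal degrees."""
    x = easting - 500000.0                                  # remove false easting
    y = northing if northern else northing - 10000000.0     # remove false northing

    # Footpoint latitude, recovered from the meridian arc length y / K0.
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    mu = (y / K0) / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = EP2 * cos1**2
    t1 = tan1**2
    n1 = A / math.sqrt(1 - E2 * sin1**2)                    # prime vertical radius
    r1 = A * (1 - E2) / (1 - E2 * sin1**2) ** 1.5           # meridional radius
    d = x / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720)
    lon = (d
           - (1 + 2 * t1 + c1) * d**3 / 6
           + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2) * d**5 / 120) / cos1

    central_meridian = (zone - 1) * 6 - 180 + 3             # degrees east
    return math.degrees(lat), math.degrees(lon) + central_meridian


def standardise(record):
    """Return a record that always carries 'lat' and 'lon' in decimal degrees."""
    if record["crs"] == "utm":
        lat, lon = utm_to_lat_lon(record["easting"], record["northing"], record["zone"])
    else:
        lat, lon = record["lat"], record["lon"]
    return {"name": record["name"], "lat": lat, "lon": lon}
```

Once every record has passed through standardise, the analysis code no longer needs to know which website a point came from.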
In the last part of the ETL process, data is loaded into its final destination, which might be as simple as an Excel spreadsheet or a static HTML document, or as elaborate as a full-fledged data warehouse. Because these destinations are increasingly likely to be found outside of the company’s firewall as external data [2], they may eventually become sources of data themselves, and the whole ETL process is repeated for someone else.
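A small sketch of the load phase, again with Python’s standard library only. The records, the "places" table and the function names are all hypothetical. I’ve used CSV text as a stand-in for the spreadsheet and an in-memory SQLite database as a stand-in for something warehouse-like.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned-up records, as they might look after transformation.
RECORDS = [
    {"name": "Site A", "lat": 1.3, "lon": 103.8},
    {"name": "Site B", "lat": 45.0, "lon": 3.0},
]


def load_to_csv(records):
    """Write the records as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "lat", "lon"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def load_to_sqlite(records):
    """Insert the records into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
    conn.executemany(
        "INSERT INTO places (name, lat, lon) VALUES (:name, :lat, :lon)",
        records,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(load_to_csv(RECORDS))
    conn = load_to_sqlite(RECORDS)
    print(conn.execute("SELECT name FROM places WHERE lat > 40").fetchall())
```

Either destination can be read back by someone else’s extraction code, which is how one project’s final output becomes another project’s source.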
While the ETL process is fundamental to data management, it’s only adequate if our data management activities end with parking the information somewhere safe and secure. Meanwhile, the modern era has moved on. It’s become more complex, demanding quicker insights. The ETL process requires an additional layer to make it truly useful and, more importantly, relevant for the digital age. For ETL, the additional layer is business analytics, which aims to create knowledge from otherwise mute data. Such knowledge is critical in making decisions. An example of how static data gets turned into knowledge is shown in this simple diagram [3]:
The challenges that the ETL process faces are perhaps similar to the ones faced by traditional methods of humanities-based learning. While both ETL and the humanities are foundational, and thus cannot be ignored, they need another technological layer to inspire a broader understanding of topics that are flocking to new virtual platforms. It’s said that human beings are, by nature, social animals. If that’s true, then mashups like digital humanities are the natural response to a very human desire for collaboration.