3A: Big Data and Digital Humanities

Zooniverse: People powered research

In-Class Project

Last week we tore furiously through the front end of web development: HTML, CSS and JavaScript.  But we are not here just to learn coding; we are doing Digital Humanities. Today let’s look at how we can put those skills into practice and use humanities data to tell a story.

One of the longest-running types of applications is an interactive timeline.  We will do a class project to convert a flat timeline into an interactive one using an easy-to-use application.  There are other tools that require more coding and show you more of how these timelines work; take a look at the SIMILE timeline tool for an example.

Today, we are going to use the beautiful TimelineJS framework to give the college archive’s Timeline of Carleton History a web 2.0 overhaul.  This timeline was created in 1991 for the college’s 125th anniversary celebration, and while the content is still great, the presentation could use an update as we near the 150th anniversary in 2016.  Archivist Tom Lamb has given us permission to use the timeline as our dataset, and we are going to build a new, dynamic JSified instance of it as our first group project.

The timeline is broken into five date ranges between 1866 and 2002.  I have set up the first one as an example, and each group’s task is to replicate this work for its own date range by doing the following:

  1. Go to the TimelineJS page and follow the four-step instructions to Make a New Timeline
    • In Step 2, whoever downloads the Google template should share it with the other group members so that all may edit
    • In Step 4, copy the embed code and paste it into a new jsBin.  This is where you will work on your own date ranges for now, and we will combine them all together next week.
  2. Once you are set up, delete the template data and move over your group’s data from the Timeline of Carleton History. The dates and captions should come over with an easy copy/paste, but then you’ll probably need to finesse the rest of the data a bit.
    • You might need to change the number format of the Date columns to a different date display or even Number > Plain Text to get them to display and order properly
    • All entries should have a brief headline that summarizes the text on that date’s card, which you’ll need to write
    • Where there are images on your page, click them to bring up the full resolution version in the fancybox viewer, then use your DevTools knowledge to find the image URL to paste in the appropriate Media column in the sheet
    • Where there are no images, see if you can insert a Google Map if appropriate.  Or search the Carleton Archives Digital Collections to find other appropriate photos or scanned documents
      • NB: All Media should have a Media Credit, which will usually be “Archives, Carleton College Gould Library”
    • Finally, explore what happens to the timeline when you use tags to categorize events.  I used buildings and people as two basic categories on my example
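
If you want to sanity-check your rows before pasting them into the sheet, the clean-up checklist above can be sketched in JavaScript.  This is only a minimal sketch and not part of TimelineJS: the field names (startDate, headline, media, mediaCredit, tag) and the sample rows are hypothetical stand-ins for the spreadsheet columns, and it assumes every entry is dated by a four-digit year.

```javascript
// Hypothetical rows standing in for spreadsheet entries. The field names are
// illustrative assumptions, not the exact TimelineJS column headers.
const rows = [
  { startDate: "1900", headline: "Example later event", media: "", mediaCredit: "", tag: "buildings" },
  { startDate: "1890", headline: "", media: "https://example.org/photo.jpg", mediaCredit: "", tag: "people" },
];

// Return a list of human-readable problems found in one row.
function checkRow(row) {
  const problems = [];
  if (!row.headline.trim()) problems.push("missing headline");
  if (row.media.trim() && !row.mediaCredit.trim()) problems.push("media has no credit");
  // Assumption: dates are plain four-digit years stored as text.
  if (!/^\d{4}$/.test(row.startDate.trim())) problems.push("date is not a four-digit year");
  return problems;
}

// Sort a copy of the rows numerically by year, leaving the original untouched.
function sortByYear(list) {
  return [...list].sort((a, b) => Number(a.startDate) - Number(b.startDate));
}

const sorted = sortByYear(rows);
const report = sorted.map((row) => ({ year: row.startDate, problems: checkRow(row) }));
console.log(report);
```

Paste it into the JavaScript panel of a jsBin and swap in a few of your own rows; any entry with a non-empty problems list probably needs another look in the sheet.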

If you’re in doubt or stuck, post a comment, ask a question on your blog or (as a last resort) send an email and we’ll try to help each other out.


Big Data

Big Data generally refers to extremely large datasets that require demanding computational analysis to reveal patterns and trends, such as the map below generated from the data in millions of Twitter posts. We are producing reams of this data in the 21st century, but how do we analyze it from a humanities perspective?  How do we perform these sorts of analyses if we are interested in periods before regular digital record keeping?

World travel and communications recorded on Twitter

Enter digitization and citizen science initiatives.  One of the major trends in Digital Humanities work is the digitization of old records or print books that are then made searchable and available online for analysis.  Google Books is the best-known project of this type, and we also read Tim Hitchcock’s article about his pioneering historical projects in this arena, e.g. the Old Bailey Online and London Lives.  These projects took years to build and required the dedicated paid labor of a team of scholars and professionals.  But there’s another model out there that relies on the unpaid labor of thousands of non-expert volunteers who collectively are able to do this work faster and more accurately than our current computers: crowdsourcing.

Zooniverse is a crowdsourcing initiative that bills itself as “the world’s largest and most popular platform for people-powered research.”  This platform takes advantage of the fact that people can distinguish detailed differences between images that regularly trip up computers, and empowers non-experts to contribute to serious research by reducing complex problems to relatively straightforward decisions:

  • is this galaxy a spiral or an ellipse?
  • is this a lion or a zebra?
  • is this the Greek letter tau or epsilon?
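
One reason simple decisions like these are useful is that answers from many volunteers can be combined.  The sketch below tallies some hypothetical volunteer answers for a single image and picks the most common one.  It illustrates plain majority voting only, which is my assumption for the sake of example; it is not a description of how Zooniverse actually aggregates classifications.

```javascript
// Hypothetical answers from five volunteers looking at the same image.
const answers = ["spiral", "spiral", "ellipse", "spiral", "ellipse"];

// Count each answer and return the most common one with its share of the votes.
// On a tie, the answer that appeared first in the list wins.
function majorityVote(votes) {
  const counts = new Map();
  for (const vote of votes) {
    counts.set(vote, (counts.get(vote) || 0) + 1);
  }
  let winner = null;
  let best = 0;
  for (const [label, count] of counts) {
    if (count > best) {
      winner = label;
      best = count;
    }
  }
  return { winner, share: best / votes.length };
}

const result = majorityVote(answers);
console.log(result.winner, result.share);
```

A low share is a hint that the image is hard to classify and might deserve more eyes or an expert’s attention.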

The project that Evan and his team just launched, Measuring the Anzacs, seeks to study demographic and health trends in the early 20th century by transcribing 4.5 million pages of service records from the Australian and New Zealand Army Corps during WWI.  This data would take countless years to process with a small team of researchers, but as Evan told us, they hope to speed up this process tremendously by taking advantage of the fact that there are lots of people who have access to a computer, speak English and can read handwriting.

Tim Hitchcock ended his piece with a conundrum:

How to turn big data in to good history?  How do we preserve the democratic and accessible character of the web, while using the tools of a technocratic science model in which popular engagement is generally an afterthought rather than the point.

The Zooniverse model has taken a major step towards resolving this tension and turning formerly restricted research practices into consciously public digital humanities work.

 


Assignment

Explore the Measuring the Anzacs project and work your way through at least one document, marking and transcribing the text.

When you’re done, post a brief comment below giving some feedback on the process.  Were the instructions easy to follow?  Was the text easy to transcribe?  Did you feel like you were making a real contribution to the project?  What did you get out of the project, from a humanities perspective?  Did you come away with a greater understanding of either the research process or the lived experience of the individual people whose records you were working with?