Our group (Will Richards, Tom Choi, and David Coleman) would like to create an interactive map of the world’s recorded shipwrecks from AD 1-1500. We hope to include information such as the date each ship was wrecked, the date it was discovered, its location, and its contents. How many of these fields we can include will depend on how complete our data are, which we will find out soon. Our data come from the Digital Atlas of Roman and Medieval Civilizations Scholarly Data Series, in the form of the Summary Geodatabase of Shipwrecks AD 1-1500, current as of 2008. The database contains roughly 1,000 observations, most of which have values for the variables listed above. Some variables appear to be more reliably reported than others, but after some data restriction and cleaning, every remaining observation should be fit for analysis.
That analysis will examine how cargo changed over time, which geographic regions have an above-average number of shipwrecks, and which historical periods have more recorded shipwrecks. A crucial limitation of historical records like these is that they are neither a random sample nor the whole population of shipwrecks. We are working only with shipwrecks that were documented and located well enough to be registered in this database, so the data very likely contain clear biases. In particular, there will almost certainly be a bias toward ships from civilizations with better record-keeping, such as the Roman Empire. This bias does not invalidate the inferences we draw from our data; it restricts their scope. Any claims we make, including whatever trends we find, will apply to ships similar to those in our dataset.
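As a rough illustration of what “above average” could mean in practice, here is a minimal Python sketch that counts wrecks per century and per geographic cell, then flags the cells whose counts exceed the mean. The 5-degree grid used by `grid_cell` is a hypothetical stand-in for regions (we have not defined our real regions yet), and all function names are our own. The counts describe recorded wrecks only, so the record-keeping bias described above carries straight through to any result.

```python
import math
from collections import Counter

def wrecks_per_century(years):
    """Count wrecks per century, keyed by the century's first year
    (AD 1-100 -> 1, AD 101-200 -> 101, ...). Unknown years (None) are skipped."""
    counts = Counter(((year - 1) // 100) * 100 + 1 for year in years if year is not None)
    return dict(sorted(counts.items()))

def grid_cell(lat, lon, size=5.0):
    """Hypothetical stand-in for a region: the south-west corner of the
    size-by-size degree grid cell that contains the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

def above_average(counts):
    """Return the keys whose count is strictly greater than the mean count."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(key for key, n in counts.items() if n > mean)

def above_average_regions(points, size=5.0):
    """Given (lat, lon) pairs, return the grid cells with an above-average
    number of wrecks. Only cells containing at least one wreck enter the mean."""
    return above_average(Counter(grid_cell(lat, lon, size) for lat, lon in points))
```

Because empty cells (open ocean, inland areas) are left out of the mean, “above average” here means above average among places where wrecks were recorded at all; a different baseline would change which regions are flagged.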
Before doing anything else, we need to clean the data so that we can categorize each wreck by its cargo. Many cells in the cargo column contain single, frequently repeated words (such as amphoras, silver, swords, or ceramic), while others contain long prose descriptions of the cargo. We will settle on a number (to be determined) of discrete cargo categories and map each text value to one of these enumerated categories. This will let us store our data in a relational database.
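A minimal sketch of the categorization step, assuming a simple keyword match. The category names and keyword lists in `CARGO_CATEGORIES` are hypothetical placeholders, since the actual categories are still to be decided. A description that mentions several kinds of cargo receives several categories, and anything unmatched is labelled `unknown` so it can be reviewed by hand.

```python
import re

# Hypothetical category scheme: the real categories are still to be determined.
# Each category lists lowercase keywords that signal it in the cargo text.
CARGO_CATEGORIES = {
    "amphorae": ["amphora", "amphoras", "amphorae"],
    "metal": ["silver", "gold", "bronze", "copper", "lead", "iron", "tin"],
    "weapons": ["sword", "swords", "weapon", "weapons"],
    "ceramics": ["ceramic", "ceramics", "pottery", "tableware"],
    "stone": ["marble", "stone", "columns", "sarcophagus"],
}

def categorize_cargo(text):
    """Return the sorted list of categories whose keywords appear in `text`.

    Single-word cells ("silver") and long prose descriptions are handled the
    same way: the text is lowercased and split into words, and each category
    whose keyword list shares a word with the text is returned. Empty or
    unmatched cells yield ["unknown"] for manual review.
    """
    if not text or not text.strip():
        return ["unknown"]
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = sorted(
        category
        for category, keywords in CARGO_CATEGORIES.items()
        if words.intersection(keywords)
    )
    return matches or ["unknown"]
```

For example, `categorize_cargo("silver; swords")` returns `["metal", "weapons"]`. Keyword matching is crude: it misses misspellings and synonyms that are not in the lists, so the `unknown` bucket is worth checking by hand.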
Once our data are cleaned and categorized, we can decide which aspects of the data users should be able to filter or sort by. We can then begin integrating our web map with our MySQL database. Once the web map is complete, we can build the rest of the website, such as a ‘featured wrecks’ page and an ‘about us’ page.
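One possible shape for the database and a map filter, sketched in Python with the standard-library `sqlite3` module standing in for MySQL so it runs without a server. The table and column names are hypothetical, and the schema assumes a single estimated year per wreck; if the data give date ranges, separate start and end columns would be needed. Because one wreck can carry several kinds of cargo, categories live in a junction table rather than a single column. The query returns GeoJSON, a format most web-mapping libraries can load. With a MySQL driver the SQL would be largely the same, though placeholders are typically written `%s` rather than `?`.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE wreck (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    latitude     REAL NOT NULL,
    longitude    REAL NOT NULL,
    year_wrecked INTEGER,  -- single estimated year (AD); NULL if unknown
    year_found   INTEGER
);
CREATE TABLE cargo_category (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Junction table: one wreck can carry several kinds of cargo.
CREATE TABLE wreck_cargo (
    wreck_id    INTEGER REFERENCES wreck(id),
    category_id INTEGER REFERENCES cargo_category(id),
    PRIMARY KEY (wreck_id, category_id)
);
"""

def wrecks_as_geojson(conn, category, start_year, end_year):
    """Return a GeoJSON FeatureCollection of wrecks tagged with `category`
    whose wreck year lies in [start_year, end_year], ordered by year."""
    rows = conn.execute(
        """
        SELECT w.id, w.name, w.latitude, w.longitude, w.year_wrecked
        FROM wreck AS w
        JOIN wreck_cargo AS wc ON wc.wreck_id = w.id
        JOIN cargo_category AS c ON c.id = wc.category_id
        WHERE c.name = ? AND w.year_wrecked BETWEEN ? AND ?
        ORDER BY w.year_wrecked
        """,
        (category, start_year, end_year),
    ).fetchall()
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": wid, "name": name, "year_wrecked": year},
        }
        for wid, name, lat, lon, year in rows
    ]
    return {"type": "FeatureCollection", "features": features}
```

After creating the tables with `conn.executescript(SCHEMA)` and loading the cleaned data, a call such as `wrecks_as_geojson(conn, "amphorae", 1, 500)` would return every wreck tagged with amphora cargo whose wreck year falls in AD 1-500, ready to hand to the map.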
Our planned timeline of deliverables is:
- By the end of Week 6: Have our data cleaned and uploaded to a MySQL database
- By the end of Week 7: Have the data connected to the map, with a working interface, even if it is not yet polished
- By the end of Week 8: Have performed analysis on our data and begun to incorporate that analysis into our web app in the form of graphics and statistics
- By the end of Week 9: Have finished both researching and incorporating the featured shipwrecks
- By the end of Week 10: Have the entire project complete and live
This is the link to the project.
This interactive “ikiMap” gives a good sense of the kind of project we are planning. It is an interactive map of the sunken ships of the Great Lakes of North America.
Henceforth, our group tag will be “DTW”.
One Reply to “Post 5 – Group and Project”
Hey Team DTW,
I really like how you are breaking out of the Carleton box with this project and planning to reuse an existing dataset to create something new and engaging. You seem well aware of the potential problems with these data, and I encourage you to read the paper associated with the dataset and make sure you understand its usage rights before proceeding.
As for cleaning the data, there are a couple of tools that might make this go faster than Excel formulae alone:
- Breve is a tool for visualizing gaps and errors in tabular data and quickly cleaning them through a web interface.
- OpenRefine is a great tool for processing and cleaning data, and offers many more features. We’ll discuss it a bit more as a class, but Miriam Posner has a great list of resources for getting started.
As you move forward, we can talk more about mapping methods (some integrate well with MySQL, while ArcGIS Online, for instance, doesn't), but this is a great start.