When processing data, the foremost and maybe even the central choice one needs to make for data entry and management is the choice of utilization of the flat database vs. the relational database. Both of these options have pros and cons associated with them, which in turn grow or lessen with respect to how you intend to use your data.
However, first let’s provide some definitions: Flat databases are built from a single file, a single table. Relational databases are built from multiple files or tables and relate to each other via special key fields – hence the term, “relational.”
Flat Databases:
Pros:
Extremely easy to start using: Anyone with access to a computer is likely to have some variety of spreadsheet programs, be it Excel for Windows users or Numbers for Mac OS. Even if both of these options are unavailable, Google Sheets is available free for everyone with a Google account. All of these options can be almost immediately opened upon starting up one’s computer, allowing data entry to start promptly.
User-Friendly: In all the options listed above, as soon as the program has started a table is easily constructed with data entry being self-explanatory. New headers and columns can be easily added and manipulated in these programs, making the initial setup a breeze. For relatively little work you have a table that is easily readable and cohesive.
Cons:
Bad handling large amounts of data: Most of the problems with flat data emerge when the spreadsheet grows larger. I estimate about 200+ entries would be the cutoff point for when one should begin thinking about switching to another method, (can’t imagine that it would work above 1000 entries) – see why below.
Starts looking sloppy: If you go flat, the more data you enter into the spreadsheet, the more variation and duplicity you are likely to see in this data. Therefore, in your spreadsheet, you are going to see columns of different lengths and widths. Various categories for each data point also complicates spreadsheet filing. Sloppiness may lead to increased error rates.
Rigid: Flat data and spreadsheets often are quite rigid and time-consuming to manipulate; especially the more information entered. If a mistake is found later in the data entry processes or if one wishes to add or remove a category, complications can emerge.
Relational Databases:
Pros:
Over the Internet: Most Relational Databases will normally at minimum have internet compatibility or exist as a web program. It is also possible to establish permissions rights for various parts of the data, such as viewing only or editing only of particular locations.
This option allows for easy collaboration or sharing of data.
Logical categories/subcategories: There is a large amount of compartmentalization with each value existing under a new header, which itself is in under another larger parent category, keeping all the information organized.
Easy manipulation and search-ability: Once all the data has been successfully entered into the relational database, manipulation becomes a breeze. Because almost every value is separately compartmentalized, it is extremely easy to find the value or group of values one is searching for and then to move or reposition them in a new location.
Cons:
Harder to get working and somewhat of a general pain: When the program is first entered, it’s a pain to start designing all the separate headers and categories under which to store the information. I attempted to recreate the system described in Stephen Ramsey’s article. However initially learning to use the program proved difficult not to mention that if one were to do it from scratch, a clear picture of the goals is necessary
Conclusive Thoughts: When choosing a database format, the most important thing to think about is its purpose. If you are collecting large amounts of data under various categories and groupings, hoping to then compile all the information, then a flat spreadsheet is not the way to go. Relational database utilization might provide you the “bang for the buck.” However, if the project is a quick analysis of 200-300 points with under 5 groupings, then flat data will probably save you a lot of time.
Other Issues to keep an eye out for:
Plan – Know what you want. Know where you’re going: It is always good to have a rough outline planned for how much data you are going to collect and your plans for processing the information before you get started. This planning can help you avoid problems down the road, such as selecting the wrong database format.
What are the Categories?: Know the categories and groups under which you sort the information before you begin entering it. This preparation will allow for less confusion and avoid redundancy.
Consistency is Key: Be consistent in how your data is entered and in your project as a whole. The more consistent you can be, the cleaner the final product; the nicer the information you can draw from it.
Really clear explanation of the two databases!
The share-ability was something I hadn’t considered before but it makes a lot of sense. I could see DH projects (even ones that aren’t that big) choosing relational databases for its ease of collaboration and its potential to grow as the data level increases.