Last year I went to the Carleton Hackathon, which is a 36-hour non-stop coding contest. Our group built a mobile app that displays the daily menus for the Carleton dining halls and cafes and sends notifications to a user when the food that user likes is on the menu.
One of the major challenges we faced was to collect data from the Bon Appetite website. It seemed that it stores the weekly menus in an online database, which we were not able to access. We ended up coding our app so that it will make a url request every day at midnight to get the html page that displays the daily menu, and then scrape the useful data from the messy html page.
I wondered at that time how the online database for a website works, and now through exploring the WordPress backend, I think I got a better understanding of it.
In terms of pros and cons for spreadsheets and relational databases, I summarized below some points that came to my mind immediately:
- user friendly
- easy to read and understand
- might store redundant information: e.g. in the book example in class, when using a flat spreadsheet to store the information, we see that many publishers, authors are repeated multiple times when a publisher published many books and an author wrote many books.
- Pain to modify the data: if some information is wrong, and we want to modify it, then we will need to fix every occurrence of that mistake: e.g. if a publisher’s information has a typo in it, and 10 books in the spreadsheet are published by that publisher, we will need to change the 10 cells that have typos. For a relational database, we only need to go to the publisher table and change one line.
- Little support for reproducible data manipulation: with a relational database, we normally write a script to do the data manipulation. If we figured that the input data is outdated, we can easily import the updated data and run the script again with one click (reproducible data manipulation). However, for a flat spreadsheet, we will need to repeat the data manipulation (which might involve many steps) all over again by hand.
- Relational Databases:
- steep learning curve for new users
- hard to read and understand
- Easy to modify stored data
- Support reproducible data manipulation