Last term I took a 300-level statistics class called Data Science, where I learned to use the ggplot2 package in R (a programming language well suited to statistical analysis) to plot a variety of graphs for data visualization. I found this package extremely useful; I am actually using it to plot graphs for my Computer Science Senior Capstone project. Here I want to share the very basics of data visualization using ggplot2 in RStudio, an integrated development environment (IDE) for R.
First of all, you need to download and install both R and RStudio. Once they are installed, click the RStudio icon (which looks like the one below) to launch it.
After launching it, you will get something like this:
Next, select File -> New File -> R Markdown, and give a name to the R Markdown file you are about to create. I named mine DataVisualizationTutorial.
An R Markdown file will then open in the pane where the console was, and you will notice that the file already contains some template content:
In our next step, we will insert a script chunk in the R Markdown file to install and load some R packages, including the ggplot2 package we will be using:
After typing this chunk in the R Markdown file right after the {r setup} chunk, click the green triangle in the top right corner of the chunk you just typed to run it (this is what actually installs the packages).
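As a rough sketch, a package chunk like the one described above might look like the following. Only ggplot2 is needed for the rest of this tutorial; installing it only when it is missing, and the choice of CRAN mirror, are my own assumptions, so adjust them to the packages and setup you need.

```r
# Install ggplot2 only if it is not already available on this machine.
# The CRAN mirror URL is an assumption; any mirror works.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package so that ggplot(), aes(), geom_point(), etc. are available.
library(ggplot2)
```

Installing is a one-time step per machine, while `library(ggplot2)` has to run in every new R session.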
Next, we want to load a dataset in .csv format into RStudio.
This data file serves as the example for this tutorial. When you want to visualize your own data, just load your own .csv file into RStudio instead.
We load the .csv file by adding the following script in our R Markdown file (at the bottom of the image below):
Then we click the green triangle run button for this chunk to run the newly added script and load the data. After doing so, you will see that the Data window in the top right corner of your screen now contains a graphDataFrame variable.
Clicking graphDataFrame in the Data window opens another tab next to our R Markdown file. This is the R data frame that we just created from the .csv file.
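To make the loading step concrete, here is a minimal sketch using `read.csv`. The file name and the values in it are made up for illustration (I write a tiny stand-in file to a temporary folder so the example runs anywhere); only the column names bin_size, rmse, and eval.dev and the variable name graphDataFrame come from this tutorial.

```r
# Write a small, hypothetical .csv file so the example is self-contained.
# With your own data, skip this part and point csv_path at your file.
csv_path <- file.path(tempdir(), "example_results.csv")
writeLines(c(
  "bin_size,rmse,eval.dev",
  "5,0.42,eval",
  "5,0.47,dev",
  "10,0.35,eval",
  "10,0.41,dev",
  "20,0.31,eval",
  "20,0.38,dev"
), csv_path)

# Read the .csv file into a data frame.
graphDataFrame <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types and the first few rows.
str(graphDataFrame)
head(graphDataFrame)
```

Running `View(graphDataFrame)` in RStudio opens the same spreadsheet-style tab that you get by clicking the variable in the Data window.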
Now that the data is loaded as a data frame in R, we can use ggplot commands to plot it. Below is the script to draw a point graph (scatter plot) in which the x axis corresponds to the “bin_size” column of our data frame, the y axis corresponds to the “rmse” column, and the color and shape of the points correspond to the eval.dev variable.
If we click the green run button, the graph will appear in the bottom right corner of the screen:
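A plotting chunk along these lines could be sketched as follows. The small data frame is hypothetical and only there so the example runs on its own; with the real data you would reuse the graphDataFrame loaded from the .csv file. Wrapping eval.dev in `factor()` is my assumption that the column is categorical, which ggplot2 requires for the shape aesthetic.

```r
library(ggplot2)

# Hypothetical stand-in for the data frame loaded from the .csv file.
graphDataFrame <- data.frame(
  bin_size = c(5, 5, 10, 10, 20, 20),
  rmse     = c(0.42, 0.47, 0.35, 0.41, 0.31, 0.38),
  eval.dev = c("eval", "dev", "eval", "dev", "eval", "dev")
)

# Map columns to aesthetics, then add a point layer.
pointPlot <- ggplot(
  graphDataFrame,
  aes(x = bin_size, y = rmse,
      color = factor(eval.dev), shape = factor(eval.dev))
) +
  geom_point(size = 3)

# Printing the plot object draws it.
print(pointPlot)
```

The `aes()` call only declares which column drives which visual property; the `geom_point()` layer is what actually draws the points.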
We can now change the labels of the x and y axes and add a title to our graph by writing the following script:
If we run it again, we get a graph with clearer axis labels and a title!
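One way to do this is with ggplot2's `labs()` function, sketched below. The label and title strings, the legend name, and the sample data are placeholders of my own, not the ones from the screenshots.

```r
library(ggplot2)

# Hypothetical stand-in for the data frame loaded from the .csv file.
graphDataFrame <- data.frame(
  bin_size = c(5, 5, 10, 10, 20, 20),
  rmse     = c(0.42, 0.47, 0.35, 0.41, 0.31, 0.38),
  eval.dev = c("eval", "dev", "eval", "dev", "eval", "dev")
)

labeledPlot <- ggplot(
  graphDataFrame,
  aes(x = bin_size, y = rmse,
      color = factor(eval.dev), shape = factor(eval.dev))
) +
  geom_point(size = 3) +
  # labs() sets axis labels, the title, and legend titles in one call.
  # Giving color and shape the same name keeps them in a single legend.
  labs(
    x = "Bin size",
    y = "RMSE",
    title = "RMSE by bin size",
    color = "Data set",
    shape = "Data set"
  )

print(labeledPlot)
```

Because ggplot2 plots are built by adding layers with `+`, the labels can be appended to an existing plot object without rewriting the rest of the script.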
Next, we can export this visualization using the Export button above the plot:
We then get a .png file of the newly created plot!
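If you would rather export from a script than click the Export button, ggplot2's `ggsave()` function can write the plot to a .png file. This is an alternative to the button, not what the screenshots show; the output file name, size, and resolution below are arbitrary choices, and the data is again a hypothetical stand-in.

```r
library(ggplot2)

# Hypothetical stand-in for the data frame loaded from the .csv file.
graphDataFrame <- data.frame(
  bin_size = c(5, 5, 10, 10, 20, 20),
  rmse     = c(0.42, 0.47, 0.35, 0.41, 0.31, 0.38),
  eval.dev = c("eval", "dev", "eval", "dev", "eval", "dev")
)

plotToSave <- ggplot(
  graphDataFrame,
  aes(x = bin_size, y = rmse,
      color = factor(eval.dev), shape = factor(eval.dev))
) +
  geom_point(size = 3)

# Save to a temporary folder here; use any path you like in practice.
out_path <- file.path(tempdir(), "rmse_plot.png")

# The file extension decides the format; width and height are in inches.
ggsave(filename = out_path, plot = plotToSave,
       width = 6, height = 4, dpi = 300)
```

Saving from code makes the export repeatable: rerunning the chunk after the data changes regenerates the image at the same size and resolution.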
Here are two further resources about using ggplot2 to make data visualizations:
I have heard a lot about R, but I’d never really had a good reason to dive in and really get my hands dirty with it. Even just looking at your pictures makes it look pretty freakin awesome though. Data visualization can make dry topics surprisingly interesting. Neat tutorial!
Shatian,
This is a very clear and easy to follow tutorial and your example shows the power of using a package like ggplot2 as opposed to all the manual button clicking you’d have to do in a GUI-based system like Excel or Google Sheets. A little more discussion of how this tool might be used for digital humanities projects in particular would have been helpful, but this was still an excellent tutorial. Well done.