In the last lesson we learned about network analysis, which focuses on displaying relationship data. This data is usually presented as a kind of mind map made up of what are known as nodes and edges. Nodes are the individual data points, and edges are the connections between them. The picture below demonstrates this:
(http://mathworld.wolfram.com/images/eps-gif/GraphNodesEdges_1000.gif)
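To make the vocabulary concrete, here is a minimal Python sketch (not something Gephi itself requires) that stores a small undirected graph as an adjacency list: each node maps to the set of nodes it shares an edge with. The node names A to D and the helper `build_adjacency` are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency list (node -> set of neighbours) from node pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # An undirected edge connects both endpoints, so record it in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

# Hypothetical network: four nodes joined by four edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph = build_adjacency(edges)
print(sorted(graph["C"]))  # ['A', 'B', 'D']
```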
Network analysis is useful because it reveals the relationships among potentially large numbers of data points. A common example is the network analysis of Facebook friends. Below is an example of a Facebook friend network created in Gephi, taken from this YouTube tutorial.
This image demonstrates that this person has friends from distinct groups. In the video, he explains that the groups were generally divided by the different universities he had attended or by people's geographical locations. The data shows that his Facebook friends fall into large, distinct groups rather than small, scattered ones. It reveals the different communities he has been part of throughout his life and is a much better representation of the people in his friends list than the simple alphabetical list provided by Facebook.
This tutorial, however, will not use Facebook data, because Facebook has recently introduced restrictions that make it much harder to obtain data about friends and their connections. A year ago, we could have used a Facebook application called netvizz, which extracted your Facebook friend data in a format usable in Gephi (GDF).
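For reference, GDF is a plain-text format: a `nodedef>` header followed by one line per node, then an `edgedef>` header followed by one line per edge. The sketch below writes a minimal GDF string under that assumption, using only the basic columns (`name`, `label`, `node1`, `node2`). The helper `to_gdf` and the two sample friends are hypothetical, and the sketch does no quoting, so labels containing commas would need extra handling.

```python
def to_gdf(nodes, edges):
    """Serialize (id, label) nodes and (id, id) edges as a minimal GDF string.

    No quoting is done, so labels must not contain commas.
    """
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += [f"{node_id},{label}" for node_id, label in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines += [f"{a},{b}" for a, b in edges]
    return "\n".join(lines) + "\n"

# Hypothetical two-person friend network.
gdf = to_gdf([("n1", "Alice"), ("n2", "Bob")], [("n1", "n2")])
print(gdf)
```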
Instead, this tutorial will use a data set of the co-appearances of characters in the novel Les Miserables, compiled by D. E. Knuth. The data set can be obtained at the following link.
After obtaining the data set, download Gephi, which is free and can be downloaded here. Once you have downloaded both files, install Gephi and unzip the Les Miserables file. When you open Gephi for the first time, you may receive an error stating that your version of Java is not compatible. This can be fixed by downloading and installing the newest version of Java.
Once you have completed all of this, open Gephi and you should be greeted with a screen similar to this:
Once Gephi is open, click File in the top left-hand corner and then Open. Select the Les Miserables file you extracted earlier, and you should be prompted with a screen with a few options. Select the following and click OK. You will then be greeted with an interesting-looking cluster of nodes and edges (77 nodes and 254 edges, to be exact!).
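If you want to double-check those counts outside Gephi, the sketch below counts `node [` and `edge [` blocks in GML text. It assumes the download is a GML file with one block per node and per edge (as in the commonly distributed `lesmis.gml`; that file name is an assumption), and it uses a simple regular expression rather than a full GML parser. If the file is the same data Gephi loaded, it should report 77 nodes and 254 edges; the inline sample below stands in for the real file.

```python
import re

def count_gml_blocks(text):
    """Count node and edge blocks in GML text with a simple regex (not a full parser)."""
    nodes = len(re.findall(r"\bnode\s*\[", text))
    edges = len(re.findall(r"\bedge\s*\[", text))
    return nodes, edges

# A tiny hypothetical excerpt in the assumed GML layout.
sample = """graph
[
  node
  [
    id 0
    label "Myriel"
  ]
  node
  [
    id 1
    label "Napoleon"
  ]
  edge
  [
    source 1
    target 0
    value 1
  ]
]
"""
print(count_gml_blocks(sample))  # (2, 1)

# With the real file (path assumed):
# with open("lesmis.gml", encoding="utf-8") as f:
#     print(count_gml_blocks(f.read()))
```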
From here you can organize and format the points, starting by choosing a layout. Personally, I prefer ForceAtlas 2, though you can pick whichever one you like best. This layout causes the data to disperse and organize itself into groups of nodes with similar connections, making your network look something like this (you can zoom in with the mouse scroll wheel).
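Force-directed layouts like ForceAtlas 2 treat the network as a physical system: nodes push each other apart while edges pull connected nodes together, and positions are updated repeatedly until things settle. The Python sketch below is a deliberately stripped-down version of that idea, loosely following the ForceAtlas2 force model (repulsion proportional to (degree + 1) of both nodes divided by their distance, attraction proportional to distance). It is not Gephi's implementation: gravity, adaptive speed and the Barnes-Hut approximation are left out, and the step cap, parameter values and toy graph are all assumptions.

```python
import math
import random

def force_layout(adjacency, iterations=200, repulsion=1.0, max_step=0.05, seed=0):
    """Simplified force-directed layout loosely based on the ForceAtlas2 forces.

    Repulsion between every pair: repulsion * (deg_a + 1) * (deg_b + 1) / distance.
    Attraction along every edge: equal to the distance between its endpoints.
    Gravity, adaptive speed and Barnes-Hut are intentionally omitted.
    """
    rng = random.Random(seed)  # seeded so the starting positions are reproducible
    nodes = list(adjacency)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    degree = {n: len(adjacency[n]) for n in nodes}

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, well-connected nodes more strongly.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = repulsion * (degree[a] + 1) * (degree[b] + 1) / dist
                fx, fy = force * dx / dist, force * dy / dist
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy
        # Attraction: each node is pulled toward each neighbour
        # (an undirected edge is visited once from each end).
        for a in nodes:
            for b in adjacency[a]:
                disp[a][0] -= pos[a][0] - pos[b][0]
                disp[a][1] -= pos[a][1] - pos[b][1]
        # Move each node along its net force, capped at max_step per iteration.
        for n in nodes:
            length = math.hypot(*disp[n])
            if length > 0:
                step = min(length, max_step)
                pos[n][0] += disp[n][0] / length * step
                pos[n][1] += disp[n][1] / length * step

    return {n: tuple(p) for n, p in pos.items()}

# Hypothetical toy network: two triangles joined by a single edge (c-d).
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
layout = force_layout(toy)
for node, (x, y) in sorted(layout.items()):
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Capping each move at `max_step` stands in for ForceAtlas2's adaptive speed; without some cap, the large repulsion between nodes that start close together could fling them far apart in a single step.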
Although this looks pretty good so far, numerous improvements can still be made. First of all, all of the nodes are currently the same size. Depending on the attributes available in the data, Gephi can size nodes according to a number of different factors. These options can be accessed through this box on the left-hand side of the screen.
Because the data set is so simple, we can only analyze the nodes with respect to their degree, which is simply their number of connections. We can display this either by coloring the nodes along a spectrum based on their degree, or by changing the size of the nodes, which is the second tab with the three circles. For this tutorial, let's change the size of the nodes. Click the icon with the three circles and enter values for the minimum and maximum node size. Here are the values I chose to enter along with the resulting graph.
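Degree and the minimum/maximum sizing step are easy to reproduce by hand. The sketch below counts each node's connections and then maps degrees linearly onto a size range, assuming Gephi's default size ranking behaves roughly like a linear interpolation between the minimum and maximum values you enter. The sizes 10 and 50 and the toy graph are hypothetical, not the values used for the screenshot.

```python
def degrees(adjacency):
    """Degree = number of edges touching a node (undirected, unweighted)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}

def size_by_degree(adjacency, min_size=10.0, max_size=50.0):
    """Linearly map each node's degree onto [min_size, max_size]."""
    deg = degrees(adjacency)
    lo, hi = min(deg.values()), max(deg.values())
    if hi == lo:  # every node has the same degree: give them all the minimum size
        return {node: min_size for node in deg}
    return {
        node: min_size + (d - lo) / (hi - lo) * (max_size - min_size)
        for node, d in deg.items()
    }

# Hypothetical four-node graph: C is the best-connected node.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(degrees(graph))         # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(size_by_degree(graph))  # {'A': 30.0, 'B': 30.0, 'C': 50.0, 'D': 10.0}
```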
As you can see, the nodes now differ greatly in size based on the number of connections they have to other nodes. However, the graph itself looks a bit squished. This can be solved by choosing the Expansion layout, which moves the nodes further apart. After applying this layout a few times, the graph will look much more spaced out. I also decided to adjust the node sizes a bit and ended up with the following:
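The effect of an expansion step can be illustrated by scaling every node's position away from the center of the layout: all pairwise distances grow by the same factor while the overall shape stays the same, which is why applying it a few times spreads the graph out (three applications at 1.2 multiply distances by about 1.73). This sketch assumes Expansion works essentially this way, scaling about the centroid with a factor of 1.2; the helper `expand` and the sample positions are hypothetical, and Gephi's actual implementation may differ in details.

```python
import math

def expand(positions, scale=1.2):
    """Scale every position about the centroid, multiplying all pairwise distances by `scale`."""
    n = len(positions)
    cx = sum(x for x, _ in positions.values()) / n
    cy = sum(y for _, y in positions.values()) / n
    return {
        node: (cx + (x - cx) * scale, cy + (y - cy) * scale)
        for node, (x, y) in positions.items()
    }

# Hypothetical positions for three nodes.
before = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 1.0)}
after = expand(before)
d_before = math.dist(before["A"], before["B"])
d_after = math.dist(after["A"], after["B"])
print(round(d_after / d_before, 6))  # 1.2
```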
Now that the data points are more spread out and the nodes are sized by their degree, we can more clearly see the different groups present in the novel based on the characters' co-appearances. To see the name associated with each point, click the T symbol in the bottom left-hand corner, as shown below. You can also change the color of the nodes with the color block above the underlined A. I suggest changing the color so that the names of the characters are easier to read.
Color can also be used to show the different groups present in the data set. To do this, click the Statistics tab and click Run next to the Modularity function, which can be found on the right-hand side of the screen.
A pop-up will then appear; simply click OK, then close the report showing the results. Modularity measures how densely the nodes within each group are connected to one another compared with what would be expected if the connections were random, so Gephi can use it to divide the nodes into communities, showing which groups are present. To display these groups, return to the Appearance tab, but this time click the color option.
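To make the modularity score concrete, the sketch below computes Newman's modularity Q for a given split of an undirected, unweighted graph: for each group it compares the fraction of edges that fall inside the group with the fraction you would expect if edges were placed at random while keeping every node's degree. Gephi's Modularity statistic goes further: it searches for a split with high Q (its implementation is based on the Louvain method) and can take edge weights into account. This sketch only scores a split you already have; the graph and group labels are hypothetical.

```python
def modularity(adjacency, communities):
    """Newman's modularity Q for an undirected, unweighted graph.

    Q = sum over groups c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the total number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    inside = {}      # L_c per group
    degree_sum = {}  # d_c per group
    for node, neighbours in adjacency.items():
        c = communities[node]
        degree_sum[c] = degree_sum.get(c, 0) + len(neighbours)
        for other in neighbours:
            if communities[other] == c:
                inside[c] = inside.get(c, 0) + 0.5  # each internal edge is seen from both ends
    return sum(inside.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical graph: two triangles with no edge between them.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in graph}
print(modularity(graph, split))   # 0.5
print(modularity(graph, lumped))  # 0.0
```

Putting everything in one group scores zero, while splitting along the two triangles scores higher; a community-detection method such as Louvain looks for the split that pushes this number as high as it can.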
In the color option, select Modularity Class and click Apply to see your graph divided into groups distinguished by color. Now you can examine the data set and see the different groups for yourself.
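Coloring by modularity class simply gives every node in the same class the same color. Here is a minimal sketch of that mapping, assuming you already have a class number per node (for example, from the Modularity results); the palette and the helper `color_by_class` are hypothetical, and Gephi picks its own colors.

```python
# Hypothetical palette; colours repeat if there are more classes than entries.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#a65628"]

def color_by_class(communities, palette=PALETTE):
    """Give every node the colour of its class; classes get colours in sorted order."""
    classes = sorted(set(communities.values()))
    class_color = {c: palette[i % len(palette)] for i, c in enumerate(classes)}
    return {node: class_color[c] for node, c in communities.items()}

# Hypothetical modularity classes for six nodes.
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
colors = color_by_class(split)
print(colors)
```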
This shows that, judging by co-appearance, not all the characters in the novel interact with each other. Rather, there are different groups of characters that follow different storylines. Still, some characters belong to multiple groups, and these can be seen near the middle of the cluster as larger nodes.
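The characters who belong to several groups can also be found programmatically: a node whose neighbours fall into more than one modularity class acts as a bridge between storylines. The sketch below lists such nodes, assuming a class label per node; the helper `bridging_nodes` and the toy graph (two triangles joined by the edge c-d) are hypothetical.

```python
def bridging_nodes(adjacency, communities):
    """Return nodes connected to more than one class, those spanning the most classes first."""
    spans = {}
    for node, neighbours in adjacency.items():
        # Classes touched by this node: its own plus those of all its neighbours.
        classes = {communities[other] for other in neighbours} | {communities[node]}
        if len(classes) > 1:
            spans[node] = len(classes)
    return sorted(spans, key=lambda node: (-spans[node], node))

graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(bridging_nodes(graph, split))  # ['c', 'd']
```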
Overall, this tutorial aimed to show the potential of network analysis and how it can be used to analyze and visualize data in a way that separates different groups. This data set is relatively simple, and Gephi can handle much more complicated data; this was merely an introduction. I hope you enjoyed the tutorial!
Some of these look really pretty. I think the first hurdle in big data analysis is always making it consumable for your audience. An attractive graph, which Gephi is clearly capable of making, goes a long way toward reaching that goal.
Hieu,
This is a very clear introduction to Gephi using a humanities dataset, and your introductory discussion of network analysis and the Facebook example really got at why this type of analysis and visualization is useful for humanists. Your screenshots and descriptions were very easy to follow. My main suggestion would be to use a different dataset, since this is the default tutorial example for Gephi — what other data might lend itself to the same techniques?