Photogrammetry rocks!

Sorry, the last third sounds like it was recorded with a toaster. There were some hardware difficulties. And software difficulties. But nonetheless, here is a tutorial outlining my process of making a 3D model, including aligning photos by marker and merging chunks by marker.

My Agisoft Tutorial!

 

Data Visualization Using RStudio + ggplot2

                                 

I took a 300-level stats class last term called Data Science, and in that class I learned how to use the ggplot2 package in R (a programming language that is great for statistical analysis) to plot various interesting graphs for data visualization. I found this R package extremely useful (I am actually using it to plot graphs for my Computer Science Senior Capstone project). Here I want to share the very basics of data visualization using ggplot2 in RStudio (an IDE, or integrated development environment, for R).

 

So first of all, you need to download and install R and RStudio before you get started. Once you have installed them, click the RStudio icon (shown below) to launch it.

After launching it, you will get something like this:

Next, select File -> New File -> R Markdown, and name the R Markdown file you are going to create. I named it DataVisualizationTutorial.

Then a new R Markdown file will open in the editor pane above the console, and you will notice that it already contains some template content:

Next, we will add a chunk to the R Markdown file that installs and loads the R packages we need, including the ggplot2 package:

After typing this chunk into the R Markdown file, right after the {r setup} chunk, click the green triangle in the top-right corner of the chunk to run it (this is what actually installs the packages).

Then we want to load a dataset in .csv format into RStudio.

This data file serves as an example in this tutorial. When you want to visualize your own data, just load your own .csv file into RStudio.

We load the .csv file by adding the following script in our R Markdown file (at the bottom of the image below):

Then we click the green triangle run button for this chunk to run the newly added script and load the data. After doing so, the Environment pane in the top-right corner of your screen will list a new graphDataFrame variable under Data.

Clicking graphDataFrame in the Environment pane opens a new tab next to our R Markdown file, showing the R data frame we just created from the .csv file.

Now that the data is loaded as a data frame in R, we can use ggplot commands to plot it. Below is the script for a point (scatter) plot, with the x axis mapped to the “bin_size” column of our data frame, the y axis mapped to rmse, and the color and shape of the points mapped to the eval.dev variable.

If we click the green run button, the graph shows up in the bottom-right corner of the screen:

We can now change the labels for the x and y axes and add a title to our graph with the following script:

Now if we run it again, we get a graph with better x and y labels and a title!

Next, we can export this visualization using the Export button:

Then, we will get a .png file for this newly created plot!

 

Here are two further resources about using ggplot2 to make data visualizations:

Gephi Tutorial

Last lesson we learned about network analysis, which focuses on displaying relationship data. This data is usually presented as a sort of mind map made of what are known as nodes and edges: nodes are the individual data points, and edges are the connections between them. The picture below demonstrates this:

(http://mathworld.wolfram.com/images/eps-gif/GraphNodesEdges_1000.gif)
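If you’re curious what nodes and edges look like in code, here is a minimal Java sketch (my own illustration, not anything from Gephi) that stores an undirected graph as an adjacency list: each node maps to the set of nodes it shares an edge with. The class name SimpleGraph and the sample edges are made up for the example. A node’s degree, which comes up again later, is just the size of its neighbor set.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: an undirected graph stored as an adjacency list.
class SimpleGraph {
    // Each node name maps to the set of nodes it shares an edge with.
    private final Map<String, Set<String>> adjacency = new LinkedHashMap<>();

    // Adds a node with no edges (does nothing if it already exists).
    void addNode(String node) {
        adjacency.computeIfAbsent(node, k -> new LinkedHashSet<>());
    }

    // Adds an undirected edge, creating either endpoint if needed.
    // Adding the same edge twice has no extra effect because neighbors are a set.
    void addEdge(String a, String b) {
        addNode(a);
        addNode(b);
        adjacency.get(a).add(b);
        adjacency.get(b).add(a);
    }

    // Degree = number of distinct neighbors (0 for an unknown node).
    int degree(String node) {
        Set<String> neighbors = adjacency.get(node);
        return neighbors == null ? 0 : neighbors.size();
    }

    Set<String> nodes() {
        return adjacency.keySet();
    }

    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        // Illustrative edges only, not taken from the Les Miserables data set.
        g.addEdge("Valjean", "Javert");
        g.addEdge("Valjean", "Cosette");
        g.addEdge("Cosette", "Marius");
        for (String n : g.nodes()) {
            System.out.println(n + " has degree " + g.degree(n));
        }
    }
}
```

For these made-up edges, Valjean and Cosette have degree 2, while Javert and Marius have degree 1.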

Network analysis is useful because it shows us the relationships among potentially large numbers of points. A commonly used example is the network analysis of Facebook friends. Below is an example of a Facebook friend network created in Gephi, taken from this YouTube tutorial.

This image shows that this person has friends from distinct groups. In the video, he explains that the groups were generally divided by the different universities he had attended or simply by people from different geographical locations. This data shows that his Facebook is populated with large, distinct groups of friends rather than small scattered ones. It shows the different communities he has been a part of throughout his life and is a much better representation of the people in his friends list than the simple alphabetical list provided by Facebook.

This tutorial, however, will not involve Facebook data, as Facebook has recently placed restrictions that make it much harder to obtain data about friends and their connections. If we were to rewind a year, we could use a Facebook application called Netvizz, which would extract our Facebook friend data in a format usable in Gephi (GDF).
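Even without Netvizz, you can write a GDF file yourself from your own relationship data. Below is a minimal Java sketch that builds the text of a GDF file from a map of node ids to labels and a list of edges. It assumes the basic GDF layout as I understand it (a nodedef> header line, one node per line, an edgedef> header line, then one edge per line). The class name GdfWriter is hypothetical, and to stay simple it refuses values containing commas or line breaks rather than quoting them.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the text of a minimal GDF file
// (a node table followed by an edge table).
class GdfWriter {

    // nodes: node id -> display label; edges: two-element arrays of node ids.
    static String toGdf(Map<String, String> nodes, List<String[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("nodedef>name VARCHAR,label VARCHAR\n");
        for (Map.Entry<String, String> node : nodes.entrySet()) {
            sb.append(checked(node.getKey())).append(',')
              .append(checked(node.getValue())).append('\n');
        }
        sb.append("edgedef>node1 VARCHAR,node2 VARCHAR\n");
        for (String[] edge : edges) {
            sb.append(checked(edge[0])).append(',')
              .append(checked(edge[1])).append('\n');
        }
        return sb.toString();
    }

    // This sketch does not quote values, so it rejects ones that would break the format.
    private static String checked(String value) {
        if (value.contains(",") || value.contains("\n")) {
            throw new IllegalArgumentException("Unsupported character in: " + value);
        }
        return value;
    }
}
```

Use a LinkedHashMap for the nodes if you want them written in insertion order, then save the returned string to a file ending in .gdf and open it in Gephi.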

Instead, this tutorial will use a data set of the co-appearances of characters in the novel Les Miserables, compiled by D. E. Knuth. This data set can be obtained at the following link.

After obtaining the data set, download Gephi, which is free and can be downloaded here. Once you have downloaded both files, install Gephi and unzip the Les Miserables file. You may get an error when opening Gephi for the first time stating that your version of Java is not compatible. This can be fixed by downloading the newest version of Java (a quick Google search will find it).

Once you have completed all of this, open up Gephi and you should be greeted with a screen similar to this:

Once Gephi is open, click File in the top left-hand corner, then Open. Select the Les Miserables file you extracted before, and you will be prompted with a screen with a few options. Select the following and click OK. You will then be greeted with an interesting-looking cluster of nodes and edges (77 nodes and 254 edges, to be exact!).

From here you can organize and format the points by first choosing a layout. Personally, I prefer ForceAtlas 2, though you can pick whichever one you like best. This layout causes the data to disperse and roughly organize itself into groups with similar connections, making your network look something like this (you can zoom in with the mouse scroll wheel).
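To get a feel for what a force-directed layout like ForceAtlas 2 is doing, here is a simplified Java sketch: every pair of nodes pushes apart, every edge pulls its two nodes together, and nodes take small steps along the net force. The force formulas follow my reading of the published ForceAtlas2 description (repulsion kr * (deg1 + 1) * (deg2 + 1) / distance, attraction equal to the distance), but this leaves out gravity, adaptive speed, and the other options Gephi exposes, so treat it as an illustration rather than Gephi’s implementation. The class name and the parameters kr and step are my own.

```java
// Hypothetical sketch of force-directed layout iterations, loosely modeled on
// the basic attraction/repulsion forces described for ForceAtlas2.
// It omits gravity, adaptive speed, and any speed-up approximations.
class ForceLayoutSketch {

    // x, y: node positions (updated in place); edges: pairs of node indices;
    // kr: repulsion strength; step: how far nodes move per unit of force.
    static void iterate(double[] x, double[] y, int[][] edges,
                        double kr, double step, int iterations) {
        int n = x.length;
        int[] degree = new int[n];
        for (int[] e : edges) {
            degree[e[0]]++;
            degree[e[1]]++;
        }
        for (int it = 0; it < iterations; it++) {
            double[] fx = new double[n];
            double[] fy = new double[n];
            // Repulsion between every pair: kr * (deg_i + 1) * (deg_j + 1) / distance.
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double dx = x[i] - x[j];
                    double dy = y[i] - y[j];
                    double d = Math.max(Math.hypot(dx, dy), 1e-6); // avoid dividing by zero
                    double f = kr * (degree[i] + 1) * (degree[j] + 1) / d;
                    // (dx / d, dy / d) is the unit vector pointing from j to i.
                    fx[i] += f * dx / d;
                    fy[i] += f * dy / d;
                    fx[j] -= f * dx / d;
                    fy[j] -= f * dy / d;
                }
            }
            // Attraction along each edge, with magnitude equal to the distance,
            // so the force vector is simply the offset between the two nodes.
            for (int[] e : edges) {
                int a = e[0], b = e[1];
                double dx = x[a] - x[b];
                double dy = y[a] - y[b];
                fx[a] -= dx;
                fy[a] -= dy;
                fx[b] += dx;
                fy[b] += dy;
            }
            // Move every node a small step along its net force.
            for (int i = 0; i < n; i++) {
                x[i] += step * fx[i];
                y[i] += step * fy[i];
            }
        }
    }
}
```

With three nodes where only nodes 0 and 1 share an edge, a few hundred small steps pull 0 and 1 close together while the unconnected node drifts away, which is the same clustering effect you see in Gephi on a much larger scale.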

Although this looks pretty good so far, numerous improvements can still be made. First of all, all of the nodes are currently the same size. Depending on the complexity of the data, Gephi can also alter the size of the nodes based on numerous factors. These options can be accessed through this box on the left-hand side of the screen.

Due to the simplicity of the data set, we can only analyze the nodes with respect to their degree, which is simply their number of connections. We can display this either by changing the color of the nodes, placing each one on a spectrum based on its degree, or by changing the size of the nodes, which is the second tab with the three circles. For this tutorial, let’s change the size of the nodes. Click the icon with the three circles and enter values for the minimum and maximum node size. Here are the values I chose, along with the resulting graph.
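Under the hood, sizing by degree boils down to mapping each node’s degree onto the minimum/maximum range you entered. Here is a small Java sketch assuming a simple linear mapping (the smallest degree gets the minimum size, the largest gets the maximum). Gephi also offers other interpolation options, so its exact sizes may differ; NodeSizer and sizesByDegree are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps each node's degree linearly onto [minSize, maxSize],
// similar in spirit to ranking node size by degree in Gephi.
class NodeSizer {

    static Map<String, Double> sizesByDegree(Map<String, Integer> degrees,
                                             double minSize, double maxSize) {
        // Find the smallest and largest degree in the graph.
        int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
        for (int d : degrees.values()) {
            lo = Math.min(lo, d);
            hi = Math.max(hi, d);
        }
        Map<String, Double> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : degrees.entrySet()) {
            // t runs from 0 (lowest degree) to 1 (highest degree).
            // If every node has the same degree, they all get the minimum size.
            double t = (hi == lo) ? 0.0 : (e.getValue() - lo) / (double) (hi - lo);
            sizes.put(e.getKey(), minSize + t * (maxSize - minSize));
        }
        return sizes;
    }
}
```

For degrees 1, 3, and 5 with sizes from 10 to 50, the nodes get sizes 10, 30, and 50.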

As you can see, the nodes now differ greatly in size based on how many connections they have with other nodes. However, the graph itself seems a bit squished. This can be solved by choosing the layout option Expansion, which moves the nodes further apart. After applying this layout a few times, the graph will look much more spaced out. I also decided to adjust the node sizes a bit and ended up with the following:

Now that the data points are more spread out and the nodes are sized by degree, we can more clearly see the different groups present in the novel based on their co-appearances. To see the name associated with each point, click the T symbol in the bottom left-hand corner as shown below. You can also change the color with the color block above the underlined A. I would suggest changing the color so that the names of the characters are easier to see.

You can also use color to show the different groups present in the data set. To do so, open the Statistics tab on the right-hand side of the screen and click Run next to Modularity.

A pop-up will then appear; simply click OK, then close the report showing the results. Modularity scores how much more densely connected the nodes within each group are than you would expect by chance, and Gephi uses it to find which groups (communities) are present, assigning each node a modularity class. To display this data, return to the Appearance tab, but this time click the color option.

Once there, select Modularity Class and click Apply to see your graph divided into groups separated by color. Now you can examine the data set and see the different groups for yourself.
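The score behind those colors can be written down directly. For a graph with m edges split into groups, modularity is Q = sum over groups c of (L_c / m − (d_c / 2m)²), where L_c is the number of edges inside group c and d_c is the total degree of its nodes. The Java sketch below (hypothetical class ModularityScore) only scores a grouping you give it; Gephi’s Modularity statistic does the harder part of searching for a grouping with a high score (as far as I know, using the Louvain method).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: computes the modularity Q of a given grouping of an
// undirected, unweighted graph:
//   Q = sum over groups c of (edges inside c / m) - (total degree of c / 2m)^2
// It scores a partition; it does not search for the best one.
class ModularityScore {

    // edges: pairs of node indices; community[i]: group number of node i.
    static double modularity(int[][] edges, int[] community) {
        int m = edges.length;
        if (m == 0) return 0.0;
        Map<Integer, Integer> internalEdges = new HashMap<>();
        Map<Integer, Integer> totalDegree = new HashMap<>();
        for (int[] e : edges) {
            int ca = community[e[0]];
            int cb = community[e[1]];
            // Each edge adds 1 to the degree of both endpoints.
            totalDegree.merge(ca, 1, Integer::sum);
            totalDegree.merge(cb, 1, Integer::sum);
            if (ca == cb) {
                internalEdges.merge(ca, 1, Integer::sum);
            }
        }
        double q = 0.0;
        for (Map.Entry<Integer, Integer> entry : totalDegree.entrySet()) {
            double inside = internalEdges.getOrDefault(entry.getKey(), 0);
            double degreeShare = entry.getValue() / (2.0 * m);
            q += inside / m - degreeShare * degreeShare;
        }
        return q;
    }
}
```

For two triangles joined by a single edge, putting each triangle in its own group gives Q = 5/14 (about 0.357), while putting every node in one group gives Q = 0, which is why the two-group split is the one worth coloring.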

This shows that, based on co-appearance, not all the characters in the novel interact with each other. Rather, there are different groups of characters that follow different storylines. Still, some characters are part of multiple groups, and these appear as larger nodes near the middle of the cluster.

Overall, this tutorial aimed to show the potential of network analysis and how it can be used to analyze and visualize data in a way that separates different groups. This data set is relatively simple, and Gephi can handle much more complicated networks; this was merely an introduction. I hope you enjoyed the tutorial!

Java Data Analysis Tutorial

So I’m gonna begin by saying this is in no way the best or most efficient method we’ve come across to analyze data, but I figured it would be fun (at least a little bit) and different from the usual methods. Anyway, on to the good part.

Since my group ran into a few issues finding ways to extract the important pieces of information from the mess that was our OCR’d, scanned (decade-old) student directories, I figured I’d try applying what little Computer Science knowledge I’ve picked up during my time here to the issue. This tutorial assumes a certain understanding of the Java programming language (I promise it isn’t that much), which you can pick up in a few different ways.

Codecademy has some pretty good interactive lessons on the subject, and Oracle’s official Java website has its own text tutorials. Alternatively, you can look up tutorials on YouTube or take CS 201.

The Actual Tutorial

Before beginning any coding, it’s key to know three things:

  • What your program should take as input
  • What your program should do
  • What your program should output

Java’s standard library reads plain text most easily, so I would suggest (if possible) converting your data to .txt format; that is probably the simplest way to make it accessible to Java.

The two most important elements in this exercise will be the Scanner class and for loops, since the program will have to iterate through the files provided and compare parts of them to your queries. This is essentially what the computer does when you press Ctrl + F, but writing a program lets us add more conditions to the search (a double find, if you will). While the program could probably be written in one method, I’ll walk through the setup with four methods, since it is easier to understand when broken up into smaller problems.
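Here is what that double find can look like on its own: a short Java sketch (DoubleFind and linesWithBoth are hypothetical names) that keeps only the lines containing both search terms as whole, space-separated words. It works on lines already read into a list, so you can try it without any files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a "double find" that keeps only the lines containing
// both search terms as whole, space-separated words.
class DoubleFind {

    static List<String> linesWithBoth(List<String> lines, String term1, String term2) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            // Split into words so "Goodhue" does not match inside "Goodhues".
            List<String> words = Arrays.asList(line.split(" "));
            if (words.contains(term1) && words.contains(term2)) {
                matches.add(line);
            }
        }
        return matches;
    }
}
```

Because it compares whole words, punctuation stuck to a word (for example a trailing comma) will stop it from matching, which is worth keeping in mind with OCR’d text.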

Loading files once they’re in text format is quite simple. Java’s main method receives whatever you type after the program name on the command line, so you can just pass the file name as a command-line argument. Using the File class, you first create a new File object with the file name, and then a new Scanner with that File. The Scanner then lets you iterate through the file line by line (nextLine()) or word by word (next()).
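A minimal sketch of that loading step, assuming the file name is the first command-line argument (the class name LineLoader is hypothetical). It uses try-with-resources so the Scanner is closed even if something goes wrong, which is a small change from the try-catch style used later in this tutorial.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch: read every line of a text file whose name is given
// on the command line, e.g.  java LineLoader halls.txt
class LineLoader {

    static List<String> readLines(File file) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the Scanner automatically.
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length < 1) {
            System.out.println("Usage: java LineLoader <file>");
            return;
        }
        // args[0] is the first word typed after the program name.
        for (String line : readLines(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Running java LineLoader halls.txt prints the file back one line at a time.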

The for loops are somewhat self-explanatory in their function. With the enhanced for loop, written for (Type item : list), Java creates and advances the iterator for you.
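To show what Java is doing for you, this sketch (hypothetical class ForEachDemo) counts hall names of at least a given length twice: once with the enhanced for loop and once with the explicit Iterator that the enhanced for loop uses behind the scenes for lists. Both versions visit the same items in the same order.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical demo: the enhanced for loop and an explicit Iterator
// visit the same items.
class ForEachDemo {

    static int countLongNames(List<String> halls, int minLength) {
        int count = 0;
        for (String hall : halls) {                  // enhanced for: Java manages the iterator
            if (hall.length() >= minLength) count++;
        }
        return count;
    }

    static int countLongNamesExplicit(List<String> halls, int minLength) {
        int count = 0;
        Iterator<String> it = halls.iterator();      // what the enhanced for does for a List
        while (it.hasNext()) {
            if (it.next().length() >= minLength) count++;
        }
        return count;
    }
}
```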

My Example

In this case, my goal was for the program to take in a text file, count the number of occurrences of each combination of residence hall and class year, and output another text file with the results. You’d begin by writing your standard Java file (I usually import everything just to make sure): a class definition, the shared variables, and then the first method, which loads the input text file. My approach uses a dictionary (a HashMap in Java) whose keys are the residence halls and whose values are the number of students in each dorm, plus a list that keeps track of which residence halls we are searching through. Besides main, the program has three critical methods: one to load the .txt files, one to count the number of “x” in each hall per year, and one to display (print) the results. None of these methods needs a return value, since they use and change the static (class) variables created at the beginning. The final output is either printed in the command prompt window or saved to the computer as a new text file.

Basis for the program.
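The methods below all read and update two shared variables, halls and hallsList, that aren’t declared in the snippets themselves. A sketch of how they might be declared, assuming a HashMap and an ArrayList (the class name HallCounter is hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical skeleton: the shared (static) variables that loadHalls,
// count, and printTextFile read and update.
class HallCounter {
    // Hall name -> number of matching students found so far.
    static Map<String, Integer> halls = new HashMap<>();
    // Hall names in the order they were loaded, so output follows the input file.
    static List<String> hallsList = new ArrayList<>();

    // loadHalls, count, printTextFile, and main go here.
}
```

Declaring them static lets the static methods share them directly. A LinkedHashMap would remember insertion order on its own, so the separate list is optional, but keeping both matches the code below.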

The load method fills the list and the dictionary with the halls to search through, setting every count to 0 as a starting point. The try-catch block makes sure you are loading an actual file: if the file can’t be found, the Scanner constructor throws a FileNotFoundException, and the method reports the error and stops. Because of how this method is coded, your hall text file should be formatted with one hall per line. The while loop keeps reading until it reaches the end of the file, adding each line to both the list of hall names and the dictionary.

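Example residence hall names file.

// Uses java.io.File, java.io.FileNotFoundException and java.util.Scanner.
public static void loadHalls(File fileName)
{
   Scanner hallsInput;
   try
   {
      hallsInput = new Scanner(fileName);
   }
   catch (FileNotFoundException e)
   {
      System.out.println("File loading error");
      return;   // without a Scanner there is nothing to read
   }
   // One hall name per line; every hall starts with a count of 0.
   while (hallsInput.hasNextLine())
   {
      String hallName = hallsInput.nextLine();
      hallsList.add(hallName);
      halls.put(hallName, 0);
   }
   hallsInput.close();
}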

The bulk of the work is done by the count method, which uses the same Scanner approach to go through the lines of the text you’re searching. The file name, the year of the directory, and the class year to search for are passed in through the method call. This time, lines aren’t added to the dictionary; instead, each line is split into words separated by spaces and put into a list, which is easier to search. The for loop then checks whether the line contains both the hall name and the predetermined class year. The format in which these two appear also depends on the directory (Goodhue vs. GHUE, or ’17 vs. Senior), but these are details that can be dealt with in the main method. If both are in the line, the value in the dictionary for that hall (key) is increased by 1.

// Uses java.io.File, java.io.FileNotFoundException, java.util.Arrays,
// java.util.List and java.util.Scanner.
public static void count(File fileName, String year, String classYear)
{
   // year is not used inside this method; the search only needs classYear.
   Scanner input;
   try
   {
      input = new Scanner(fileName);
   }
   catch (FileNotFoundException e)
   {
      System.out.println("File loading error.");
      return;
   }
   while (input.hasNextLine())
   {
      String line = input.nextLine();
      // Split the line into space-separated words so we compare whole words.
      List<String> valuesInLineList = Arrays.asList(line.split(" "));
      for (String hall : hallsList)
      {
         // Count the line only if it mentions both this hall and the class year.
         if (valuesInLineList.contains(hall) && valuesInLineList.contains(classYear))
         {
            halls.put(hall, halls.get(hall) + 1);
         }
      }
   }
   input.close();
}
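One way to deal with the Goodhue vs. GHUE problem is to rewrite known abbreviations into one spelling before the contains check. The sketch below is hypothetical (the class name and the single alias entry are only examples); you could call normalizeLine on each line right after nextLine() in count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: rewrites directory-specific spellings into one canonical
// form, so "GHUE" and "Goodhue" are counted as the same hall.
class DirectoryNormalizer {
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        // Example entry only; add one line per abbreviation in your directories.
        ALIASES.put("GHUE", "Goodhue");
    }

    // Replaces each space-separated word that has a known alias.
    static String normalizeLine(String line) {
        String[] words = line.split(" ");
        for (int i = 0; i < words.length; i++) {
            words[i] = ALIASES.getOrDefault(words[i], words[i]);
        }
        return String.join(" ", words);
    }
}
```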

Finally, the display method gives the end result. This can be done in one of two ways: the easy route is to have the program print the final values in the command prompt window, and the slightly more complex one is to create a text file with those values and save it to the computer. While the former takes less time to code, the latter might save more time in the long run, so I’ll talk about that one. This method takes in the class standing (defined in main) and the year of the directory, which are used only to name the file where the data is stored; these details are mostly my personal preference and could be done without. The method creates a new PrintWriter, which takes two parameters: the name of the file and the character encoding to use (here UTF-8). Next, the halls and their respective values are written to the file one by one, using the same type of for loop as the count method. If the PrintWriter can’t be created, the resulting exception is caught and an error message is printed. For the specifics of the PrintWriter methods, see the PrintWriter Javadoc.

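// Uses java.io.IOException and java.io.PrintWriter.
public static void printTextFile(String standing, String year)
{
   // PrintWriter(String fileName, String charsetName) can throw FileNotFoundException
   // or UnsupportedEncodingException, both subclasses of IOException.
   // try-with-resources closes the writer even if an error occurs.
   try (PrintWriter writer = new PrintWriter(standing + "PerHall" + year, "UTF-8"))
   {
      // One "hall: count" line per hall, in the order the halls were loaded.
      for (String hall : hallsList)
      {
         writer.println(hall + ": " + halls.get(hall));
      }
   }
   catch (IOException e)
   {
      System.out.println("File write failure");
   }
}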
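main also calls a console version, print(standing, year), whose body the tutorial doesn’t show. Here is a guess at what it could look like. So that the sketch stands on its own, it takes the list and map as parameters (inside the program you’d use the static fields instead), and it builds the text in a separate method so it’s easy to check. The class name ResultPrinter and the header line are my own choices.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the console display method that main calls as
// print(standing, year); the tutorial does not show the original body.
class ResultPrinter {

    // Builds the same "hall: count" lines that printTextFile writes to disk,
    // preceded by a header line.
    static String formatResults(String standing, String year,
                                List<String> hallsList, Map<String, Integer> halls) {
        StringBuilder sb = new StringBuilder();
        sb.append(standing).append(" per hall, ").append(year).append(" directory\n");
        for (String hall : hallsList) {
            sb.append(hall).append(": ").append(halls.get(hall)).append('\n');
        }
        return sb.toString();
    }

    static void print(String standing, String year,
                      List<String> hallsList, Map<String, Integer> halls) {
        System.out.print(formatResults(standing, year, hallsList, halls));
    }
}
```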

With all the methods done, the main method brings the pieces together. Since all the methods return void, they all need to be called from main. To keep the program as flexible as possible, the file names and years are passed as command-line arguments (you have to type the program name to run it anyway). Separated by spaces, everything after the program name becomes a String in the args array, with the first string after the program name as element 0. Main therefore expects, in order, the name of the file to be analyzed, the file with hall names, the year of the directory, and the class year to search for (these details can be changed without a major impact on the program).

Error message when providing no parameters in command line.

public static void main(String[] args)
{
   // Expected: <file to analyze> <file with halls> <directory year, e.g. 2014> <class year, e.g. 17>
   if (args.length < 4)
   {
      System.out.println("Input file to analyze, file with halls, year and classyear");
      return;   // stop here instead of crashing on a missing argument
   }
   String file1 = args[0];
   String file2 = args[1];
   String year = args[2];
   String classyear = args[3];

   // Years left until graduation: class year minus the last two digits of the directory year.
   String standing;
   int FSJS = Integer.parseInt(classyear) - Integer.parseInt(year.substring(2, 4));

   if (FSJS == 3) standing = "Freshmen";
   else if (FSJS == 2) standing = "Sophomores";
   else if (FSJS == 1) standing = "Juniors";
   else standing = "Seniors";

   File hallNameFile = new File(file2);
   loadHalls(hallNameFile);
   File text = new File(file1);
   count(text, year, classyear);
   printTextFile(standing, year);
   print(standing, year);   // console display method (body not shown in this tutorial)
}
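The class-standing arithmetic in main is easy to get wrong, so it can help to pull it into its own method and test it. This hypothetical StandingCalculator does exactly what main does: class year minus the last two digits of the directory year, where 3 means Freshmen, 2 Sophomores, 1 Juniors, and anything else Seniors.

```java
// Hypothetical helper pulled out of main: turns the directory year (e.g. "2014")
// and class year (e.g. "17") into a class standing, using the same arithmetic as main.
class StandingCalculator {

    static String standing(String directoryYear, String classYear) {
        int yearsLeft = Integer.parseInt(classYear)
                - Integer.parseInt(directoryYear.substring(2, 4));
        if (yearsLeft == 3) return "Freshmen";
        if (yearsLeft == 2) return "Sophomores";
        if (yearsLeft == 1) return "Juniors";
        return "Seniors";   // any other difference, as in main
    }
}
```

With a 2014 directory, class year 17 gives Freshmen and class year 15 gives Juniors. Any other difference, including a typo, falls through to Seniors, just as it does in main.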

Once main calls each of the methods with the specified parameters, bingo! You have results. Because the code is made as general as possible (within the limits of my knowledge), it can be changed with minimal effort to handle different directory formats.

Printed results.
Results as file, automatically saved to folder.

ArcGIS Online – Organizing and Expressing Data Tutorial

While I was making my Japanese Mascot Map (which you can see here!), a lot of my time was spent experimenting with how to organize my data and which data ArcGIS Online could pull directly from the resulting spreadsheets. I explained some of my process HERE, but I want to explain in greater depth how I organized, input, and expressed my data.

I used Google Sheets to store and organize my data, which worked well. However, ArcGIS Online can only upload spreadsheet files in .csv or .txt format. Google Sheets lets you download individual sheets in .csv format, but that means if you edit your data, then you have to delete the layer made with the unedited file, download the edited file to your computer, and render the edited file to your map as a new layer. This takes a lot of time. I’m hoping that what I learned from all that editing, downloading, and uploading can save you some time on your own project.

~~~

What type of location data do you want to map? This will affect how many spreadsheets you make and how you express location within them. My rule of thumb is to make a new datasheet for a new layer if the locations of places/regions will be rendered to the map by different parameters. In other words, locations denoted by City, State vs. Street Address vs. Latitude/Longitude should each be organized in their own spreadsheets. This is helpful for a few reasons:

  1. Confuses ArcGIS less – When you import a layer, ArcGIS asks which columns it should pull location data from. With multiple datasheets/layers, you can choose which data are rendered in which manner without having to sacrifice accuracy, arbitrarily pick drop points for larger regions, or confuse the program with empty entries.
  2. As data sets get larger, problems are easier to find – A number of my spellings didn’t match those ArcGIS used, and some of my photo links were broken. It was much easier to delete one layer, find the error, fix it, and re-upload a smaller spreadsheet than to do the same for a spreadsheet with 50+ entries.
  3. Easier to stylize different categories of data – Not to mention you only have to re-stylize some of your data if you have to fix a layer, since whenever you (re)upload a layer, the points are reset to red dots by default.
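If you have already put everything in one big sheet, you can split it by how each row gives its location. This hypothetical Java sketch treats a row as having coordinates when both its latitude and longitude cells are filled in; those rows go to one layer and the rest (prefecture/city only) go to another. The Row fields mirror the columns I describe below, but the class and method names are my own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits spreadsheet rows into two map layers depending
// on how each row's location is given.
class LayerSplitter {

    // One spreadsheet row; empty strings mean "not filled in".
    static class Row {
        final String name, prefecture, city, latitude, longitude;

        Row(String name, String prefecture, String city, String latitude, String longitude) {
            this.name = name;
            this.prefecture = prefecture;
            this.city = city;
            this.latitude = latitude;
            this.longitude = longitude;
        }

        boolean hasCoordinates() {
            return !latitude.trim().isEmpty() && !longitude.trim().isEmpty();
        }
    }

    // Returns [rows for the latitude/longitude layer, rows for the prefecture/city layer].
    static List<List<Row>> split(List<Row> rows) {
        List<Row> coordinateLayer = new ArrayList<>();
        List<Row> placeNameLayer = new ArrayList<>();
        for (Row r : rows) {
            if (r.hasCoordinates()) coordinateLayer.add(r);
            else placeNameLayer.add(r);
        }
        List<List<Row>> layers = new ArrayList<>();
        layers.add(coordinateLayer);
        layers.add(placeNameLayer);
        return layers;
    }
}
```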

In my case, I wanted to map mascots from Prefectures, Cities, Buildings, Organizations, and Companies. I used two different methods for designating location so I made two map layers from two datasheets.

Prefectures and Cities I mapped as points denoted by two columns of data: Prefecture, City (i.e., Hokkaido, Hakodate). Because prefectures aren’t associated with any one city, I used the same format for prefecture mascots, with the capital as my city marker (i.e., Hyogo, Kobe). It would be the same as making one column each for State and City if you were mapping in the United States.

NOTE!: If you want to make polygons for prefectures/states instead of points, I would suggest making a separate spreadsheet for them. I did not use polygons in my original map, but if you include a city, then ArcGIS will pin that city instead of denoting a region.

Buildings, Organizations, and Companies I mapped using Latitude and Longitude. These are things with definite locations, usually denoted by street addresses. However, street addresses are often very different across countries, and that’s before differing spelling conventions for foreign languages. Even in familiar areas, points sometimes don’t get dropped in the right place. The easiest way to get an accurate point the first time is to use its latitude and longitude.

An easy way to find them is to use Google Maps. To do so:

1.) Search Location

2.) Right click on the point and choose “What’s here?”

A small bar should show up at the bottom of the screen. See the numbers at the very bottom?

The first number is the Latitude. The second number is the Longitude.
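A quick sanity check can catch the most common slip: pasting the two numbers in the wrong order. This hypothetical Java sketch checks the valid ranges (latitude between −90 and 90, longitude between −180 and 180) and flags a pair that only makes sense when swapped.

```java
// Hypothetical helper: sanity-checks a latitude/longitude pair copied from
// Google Maps before it goes into the spreadsheet.
class CoordinateCheck {

    // Latitude must lie in [-90, 90] and longitude in [-180, 180].
    static boolean isValid(double latitude, double longitude) {
        return latitude >= -90 && latitude <= 90
                && longitude >= -180 && longitude <= 180;
    }

    // A pair pasted in the wrong order often fails the latitude range
    // (e.g. 135.76 is not a possible latitude) but is valid once swapped.
    static boolean looksSwapped(double first, double second) {
        return !isValid(first, second) && isValid(second, first);
    }
}
```

For the Kyoto building coordinates used later in this tutorial (34.987756, 135.759333), isValid returns true, while the reversed pair fails and looksSwapped flags it. Pairs where both numbers fall between −90 and 90 can’t be caught this way.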

In my spreadsheet, I made 4 columns for location data: Prefecture, City, Latitude, Longitude. I included the Prefecture and the City because I wanted to display this information at each point, but when I uploaded the layer the program used the latitude and longitude to drop the pins. A window may pop up asking you to specify which data you’d like to use for locations. In that case, pick your preference.

NOTE!: ArcGIS sometimes gives you the option to limit your expressed dataset to a single country, in my case Japan. If your data set spans multiple countries, include a Country column in each spreadsheet. So, in the first example, the location of a city would be expressed in 3 columns: Japan, Hokkaido, Hakodate. The location of a building would be expressed in 5: Japan, Kyoto (prefecture), Kyoto (city), 34.987756, 135.759333.

You should include a column for any categories you want to distinguish stylistically.

In my spreadsheets I added a Mascot_Type column, which I kept close to my other non-location data: Name and Name of Building/Company/Organization.

From that data, I could set the layer to display points based on what type of mascot each point represents. When you upload a new layer, a menu called “Change Style” will appear on the left. In the drop-down menu under “Choose an attribute to show,” pick the column where you put your categories.

You can then change how each category appears on the map by changing the appearance of the point. Click on one of the sample points in the “Change Style” menu. A window will pop up with point style options for the category you selected. When you are done, press “OK” both in the window and the “Change Style” menu.

If you want an image to pop up when you click on a point, ArcGIS Online can pull images straight from your spreadsheets. In a column titled “Image” or “URL”, add URL links to the images you want to use for each location. Here is the image of Tawawachan I used and the corresponding link highlighted in yellow. Because URLs are long, I recommend making this the last column in your datasheet.

To add these images to your pop-ups, make sure you have pressed “Ok” on any open menus, then click the “…” next to the layer you want to add images to. From that menu, click “Configure Pop-up.”

The menu will open on the left-hand side. Here you can change which data is shown where. To add images, go to “Pop-up Media” and press “Add.” From that drop-down menu, select “Image.”

A window called “Configure Image” will appear. Here you can add titles, captions, and hyperlinks to your images. To add the images from your spreadsheet, go down to “URL” and press the small boxed cross to the right. Scroll down and select the name of the column where you put your image URLs. I called mine “Image.”

Press “Ok” in both the “Configure Image” window and “Configure Pop-up” menu. Once you do, an image should appear in your pop-ups when you click on a point. If you don’t see it immediately, scroll down or enlarge the pop-up window as they are quite small. If you still don’t see an image, the URL in your spreadsheet may be broken.

NOTE!: Make sure to double-check that the links aren’t broken while you’re still working in your spreadsheet. When you render your data onto the map, the program won’t tell you if it can’t find images. It’s better to check before you render your data than after you’ve spent time stylizing your points, because you will have to re-upload the sheet, which resets the points to the default red circles.
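If you have a lot of image links, a small program can check them for you before uploading. This is a sketch assuming Java 11 or later (for java.net.http.HttpClient); the class name ImageLinkCheck is hypothetical. It sends a HEAD request to each URL and treats any 2xx status as a working link.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (Java 11+): checks whether an image URL from the
// spreadsheet actually answers, before the sheet is uploaded.
class ImageLinkCheck {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // True only for a well-formed http(s) URL whose server answers a HEAD
    // request with a 2xx status code.
    static boolean isReachable(String url) {
        URI uri;
        try {
            uri = URI.create(url.trim());
        } catch (IllegalArgumentException e) {
            return false;                       // not a valid URL at all
        }
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false;                       // e.g. a pasted file path or ftp link
        }
        try {
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .method("HEAD", HttpRequest.BodyPublishers.noBody())
                    .timeout(Duration.ofSeconds(10))
                    .build();
            HttpResponse<Void> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (IllegalArgumentException | IOException e) {
            return false;                       // bad host, timeout, refused connection
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass one or more URLs on the command line.
        for (String url : args) {
            System.out.println((isReachable(url) ? "OK      " : "BROKEN  ") + url);
        }
    }
}
```

Some image hosts reject HEAD requests, so if a link is reported broken, open it in a browser before deleting it.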