Team ResLife Project

Hello!

Finals week is upon us, and with it the completion of our project! To recap, we are Team ResLife. Our goal was to present the spread of students around campus based on their class year. We did this using both CityEngine and ArcGIS maps. In CityEngine we created a 3D model of Carleton’s campus and colored the faces of the residence halls based on the percentage of each class year that lived there. In the ArcGIS maps we did the same thing with colored circles. To see all of this for yourself, visit our website here.

Timeline

Forgot to link to our group timeline from way back when! Here it is. This exercise was pretty fun, although we mistakenly covered the same time period as another group, which left some holes in the final project. Working through the intricacies of the media data was a bit confusing, since many of the columns had similar descriptions.
Next time I’d hope to find some more pictures relating to the events we were talking about, which would make the timeline a bit more colorful and less simple.

Java Data Analysis Tutorial

So I’m gonna begin by saying this is in no way the best or most efficient method we’ve come across to analyze data, but I figured it would be fun (at least a little bit) and different from the usual methods. Anyway, on to the good part.

Since my group ran into a few issues finding ways to extract the important pieces of information from the mess that was our OCR’d, scanned (decade-old) student directories, I figured I’d try applying what little Computer Science knowledge I’ve picked up during my time here to the problem. This tutorial assumes a basic understanding of the Java programming language (I promise it isn’t that much), which you can pick up in a few different ways.

Codecademy has some pretty good interactive lessons on the subject, and Oracle’s official Java website has its own text tutorials. Alternatively, you can look them up on YouTube or take CS 201.

The Actual Tutorial

Before writing any code, it’s important to know three things:

  • What your program should take as input
  • What your program should do
  • What your program should output

Reading plain text is the simplest way to get data into a small Java program like this one, so I would suggest (if possible) placing your data in .txt format.

The two most important elements in this exercise will be the Scanner class and for loops, since the program has to iterate through the files provided and compare parts of them to your queries. This is essentially what the computer does when you press Ctrl+F, but writing a program lets us add more conditions to the search (a double find, if you will). While the program could probably be written in one method, I’ll walk through the setup with four methods, since it is easier to understand when broken up into smaller problems.

Loading files once they’re in text format is quite simple. Because a Java main method receives its command-line arguments as an array of Strings, you can just type the file name as an argument when you run the program. Using the File class, you first create a new File object from the file name, and then a new Scanner from that File. The Scanner lets you iterate through the file line by line or word by word (and, with an empty delimiter, even character by character).
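Here is a small sketch of that loading step. The class and method names (LoadDemo, readLines, readWords) are made up for illustration, not part of the program below; the sketch just shows a file name coming in through args[0], a File and a Scanner built from it, and the file being read either line by line or word by word.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class: reads a text file named on the command line.
class LoadDemo
{
   // Reads every line of the file into a list, in order.
   static List<String> readLines(File file) throws FileNotFoundException
   {
      List<String> lines = new ArrayList<>();
      // try-with-resources closes the Scanner when we're done
      try (Scanner scanner = new Scanner(file))
      {
         while (scanner.hasNextLine())
         {
            lines.add(scanner.nextLine());
         }
      }
      return lines;
   }

   // Reads the file word by word instead; Scanner splits on whitespace by default.
   static List<String> readWords(File file) throws FileNotFoundException
   {
      List<String> words = new ArrayList<>();
      try (Scanner scanner = new Scanner(file))
      {
         while (scanner.hasNext())
         {
            words.add(scanner.next());
         }
      }
      return words;
   }

   // Run as: java LoadDemo halls.txt  (args[0] is then "halls.txt")
   public static void main(String[] args) throws FileNotFoundException
   {
      if (args.length == 0)
      {
         System.out.println("Usage: java LoadDemo <file>");
         return;
      }
      for (String line : readLines(new File(args[0])))
      {
         System.out.println(line);
      }
   }
}
```

For a file containing the two lines "Burton Hall" and "Goodhue", readLines gives two entries while readWords gives three, which is why the choice between them matters once hall names contain spaces.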

The for loops are fairly self-explanatory. With the enhanced for loop, written for (Type item : list), Java handles the iteration for you automatically.
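As a quick illustration (ForEachDemo, countMatches and countMatchesWithIterator are hypothetical names, not part of the program below), here is an enhanced for loop next to the Iterator-based loop Java effectively writes for you when you loop over a List:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class comparing two ways to loop over a List.
class ForEachDemo
{
   // Enhanced for loop: Java fetches each element for us.
   static int countMatches(List<String> items, String target)
   {
      int matches = 0;
      for (String item : items)
      {
         if (item.equals(target)) matches++;
      }
      return matches;
   }

   // Roughly what the loop above does behind the scenes for a List.
   static int countMatchesWithIterator(List<String> items, String target)
   {
      int matches = 0;
      Iterator<String> it = items.iterator();
      while (it.hasNext())
      {
         if (it.next().equals(target)) matches++;
      }
      return matches;
   }
}
```

Called with the list "Goodhue", "GHUE", "Goodhue" and the target "Goodhue", both versions return 2.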

My Example

In this case, my expectation was for the program to take in a text file, count the number of occurrences of each combination of residence hall and class year, and ideally output another text file with the results. You’d begin with a standard Java file: the imports (I usually import everything just to make sure), a class definition, a main method, and the first method, which loads the input text file. My approach uses a dictionary (a Map in Java) whose keys are the residence halls and whose values are the number of matching students in each, plus a list that keeps track of which residence halls we are searching through. The program has three critical methods: one to load the .txt files, one to count the number of students of a given class year in each hall, and a display (print) method. None of these methods needs a return value, since they all use and change the shared static variables declared at the top of the class. The final output is either printed in the command prompt window or saved to the computer as a new text file.

Basis for the program.
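A rough skeleton of those shared variables might look like the sketch below. hallsList and halls are the names the methods below use; the List and Map types, the class name ResLifeCounter and the two helper methods are assumptions for illustration, since the declarations themselves aren’t shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; the real program's class isn't shown.
class ResLifeCounter
{
   // Hall names in the order they were read from the halls file.
   static List<String> hallsList = new ArrayList<>();
   // The "dictionary": hall name -> number of matching students counted so far.
   static Map<String, Integer> halls = new HashMap<>();

   // Hypothetical helper: what loadHalls does for each line of the halls file.
   static void addHall(String hallName)
   {
      hallsList.add(hallName);
      halls.put(hallName, 0);
   }

   // Hypothetical helper: what count does on each match (adds 1 to that hall).
   static void increment(String hallName)
   {
      halls.merge(hallName, 1, Integer::sum);
   }
}
```

Keeping a separate list mainly preserves the order the halls were read in, since a HashMap doesn’t keep any particular order; a LinkedHashMap on its own would also do that job.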

The load function creates the dictionary from the provided list of halls and sets all of the values to 0 as starting points. The try-catch block makes sure you are loading an actual file: if the file doesn’t exist, the Scanner constructor throws a FileNotFoundException, which the catch block reports. Because of how this method is coded, your hall text file should list one hall per line. The while loop keeps reading until it reaches the end of the file, adding each line to both the list of hall names and the dictionary.

Example residence hall names file.
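public static void loadHalls(File fileName)
{
   // hallsList and halls are the static fields declared at the top of the class
   Scanner hallsInput;
   try
   {
      hallsInput = new Scanner(fileName);
   }
   catch (FileNotFoundException e)
   {
      System.out.println("File loading error");
      return;  // without this, the loop below would crash on a null Scanner
   }
   // one hall name per line: record it and start its count at 0
   while (hallsInput.hasNextLine())
   {
      String hallName = hallsInput.nextLine().trim();
      if (hallName.isEmpty()) continue;  // skip blank lines
      hallsList.add(hallName);
      halls.put(hallName, 0);
   }
   hallsInput.close();
}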

The bulk of the work is done by the count method, which uses a Scanner the same way to read through the lines of the text you’re searching. The file name, the year of the directory and the class year to search for are passed in through the method call. This time, lines aren’t added to the dictionary; instead, each line is split into words separated by spaces and put into a list, which is easier to search. The following for loop checks whether the line contains both a hall name and the chosen class year. The format in which these two appear also depends on the directory (Goodhue vs. GHUE, or ’17 vs. Senior), but these are details that can be dealt with in the main method. If both are in the line, the value in the dictionary for that hall (key) is increased by 1.

public static void count(File fileName, String year, String classYear)
{
   // year isn't used here; it only matters for naming the output file later
   // uses java.util.Arrays and java.util.List (covered by import java.util.*)
   Scanner input;
   try
   {
      input = new Scanner(fileName);
   }
   catch (FileNotFoundException e)
   {
      System.out.println("File loading error.");
      return;  // stop instead of crashing on a null Scanner
   }
   while (input.hasNextLine())
   {
      // split the line into words and wrap them in a List so contains() works
      List<String> valuesInLine = Arrays.asList(input.nextLine().split("\\s+"));
      for (String hall : hallsList)
      {
         // a match needs both the hall name and the class year on the same line
         if (valuesInLine.contains(hall) && valuesInLine.contains(classYear))
         {
            halls.put(hall, halls.get(hall) + 1);
         }
      }
   }
   input.close();
}
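To test the matching logic on single lines without a whole directory file, it can help to pull the per-line check out into its own method. This is a sketch, not the method above: LineMatcher, lineMatches and the aliases map are hypothetical, and the alias map is just one possible way of handling the Goodhue vs. GHUE issue mentioned earlier.

```java
import java.util.Map;

// Hypothetical helper class for checking one directory line at a time.
class LineMatcher
{
   // True if the line contains the hall and the class year as whole words.
   // aliases maps a directory spelling (e.g. an abbreviation) to the hall
   // name used in the halls file; this table is an assumption, not real data.
   static boolean lineMatches(String line, String hall, String classYear, Map<String, String> aliases)
   {
      boolean hallFound = false;
      boolean yearFound = false;
      // split on any run of whitespace, which OCR'd text tends to have
      for (String word : line.trim().split("\\s+"))
      {
         // swap a known directory spelling for the name in the halls file
         String normalized = aliases.getOrDefault(word, word);
         if (normalized.equals(hall)) hallFound = true;
         if (word.equals(classYear)) yearFound = true;
      }
      return hallFound && yearFound;
   }
}
```

Because the check works on whole words, a line with "Goodhues" won’t count toward Goodhue, and a hall name containing a space can never match a single word; the count method above behaves the same way.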

Finally, the display method gives the end result. This can be done in one of two ways: the easy route is to have the program print the final values in the command prompt window, and the slightly more complex one is to create a text file with those values and save it to the computer. While the former takes less time to code, the latter might save more time in the long run, so I’ll talk about that one. This method takes in the class standing (defined in main) and the year of the directory, used simply to name the file where the data is stored. These details are mostly my personal preference and could be done without. The method creates a new PrintWriter, which takes two parameters: the name of the file and the character encoding to use (here UTF-8). Next, the halls and their respective values are written to the file one by one, using the same type of for loop as the count method, and an exception is caught and reported if the PrintWriter can’t be created. For the specifics of the PrintWriter methods, see the PrintWriter Javadoc.

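public static void printTextFile(String standing, String year)
{
   // e.g. "FreshmenPerHall2014.txt"; the .txt extension makes it open as a text file
   // try-with-resources closes the writer even if something goes wrong mid-way
   try (PrintWriter writer = new PrintWriter(standing + "PerHall" + year + ".txt", "UTF-8"))
   {
      for (String hall : hallsList)
      {
         writer.println(hall + ": " + halls.get(hall));
      }
   }
   catch (IOException e)
   {
      // thrown if the file can't be created or the encoding isn't supported
      System.out.println("File write failure");
   }
}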

With all the methods written, the main method brings the pieces together. Since all the methods return void, they all need to be called from main. To keep the program as flexible as possible, the file names and years are passed in as command-line arguments, since you have to type the program name to run it with java anyway. Anything following the program name, separated by spaces, is treated as another String in the args array, with the first string after the program name being element 0. So main standardizes the input, expecting the user to enter the name of the file to be analyzed, the file with hall names, the year of the directory, and the class year to search for (these details can be changed with no severe impact on the program).

Error message when providing no parameters in command line.
public static void main(String[] args)
{
   // expected: java <ProgramName> directoryFile hallsFile year classYear
   // e.g.      java <ProgramName> directory2014.txt halls.txt 2014 17
   if (args.length < 4)
   {
      System.out.println("Input file to analyze, file with halls, year and classyear");
      return;  // stop instead of crashing on the missing arguments
   }
   String file1 = args[0], file2 = args[1], year = args[2], classyear = args[3];

   // years until graduation: class year minus the last two digits of the directory year
   String standing;
   int FSJS = Integer.parseInt(classyear) - Integer.parseInt(year.substring(2, 4));

   if (FSJS == 3) standing = "Freshmen";
   else if (FSJS == 2) standing = "Sophomores";
   else if (FSJS == 1) standing = "Juniors";
   else standing = "Seniors";

   File hallNameFile = new File(file2);
   loadHalls(hallNameFile);
   File text = new File(file1);
   count(text, year, classyear);
   printTextFile(standing, year);
   print(standing, year);  // console version of the output (method not shown here)
}
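main also calls print(standing, year), the "easy route" from the display discussion, but that method isn’t shown. A minimal sketch, assuming it just prints the same "hall: count" lines as printTextFile with a heading on top, could look like this (ConsoleOutput and format are hypothetical names, and the two fields only stand in for the real ones):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; in the real program these would live in the main class.
class ConsoleOutput
{
   // Stand-ins for the program's hallsList and halls, filled by loadHalls and count.
   static List<String> hallsList = new ArrayList<>();
   static Map<String, Integer> halls = new HashMap<>();

   // Builds a heading plus one "hall: count" line per hall, in hallsList order.
   static String format(String standing, String year)
   {
      StringBuilder out = new StringBuilder();
      out.append(standing).append(" per hall, ").append(year).append(" directory\n");
      for (String hall : hallsList)
      {
         out.append(hall).append(": ").append(halls.get(hall)).append("\n");
      }
      return out.toString();
   }

   // Hypothetical print(standing, year): writes the results to the console only.
   static void print(String standing, String year)
   {
      System.out.print(format(standing, year));
   }
}
```

Building the text in format and only printing it in print keeps the output easy to check, and the same string could be handed to a PrintWriter if you later want both.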

After that, each of the methods is called with the specified parameters, and bingo! You have results. Because the code is as general as I could make it (within the limits of my knowledge), it is fairly flexible and can be changed with minimal effort to handle different directory formats.

Printed results.
Results as file, automatically saved to folder.
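One example of such a change: main assumes the class year is typed as plain digits (17), so something like ’17 or '17 copied straight from a directory would make Integer.parseInt throw. Here is a hedged sketch of the same standing arithmetic in its own method, stripping either kind of apostrophe first (Standing, parseClassYear and standingFor are hypothetical names):

```java
// Hypothetical helper class for turning a class year into a standing.
class Standing
{
   // Accepts "17", "'17" or a curly-apostrophe version (U+2019) and returns 17.
   static int parseClassYear(String classYear)
   {
      String digits = classYear.trim().replace("'", "").replace("\u2019", "");
      return Integer.parseInt(digits);
   }

   // Same arithmetic as main: class year minus the last two digits of the directory year.
   static String standingFor(String year, String classYear)
   {
      int yearsLeft = parseClassYear(classYear) - Integer.parseInt(year.substring(year.length() - 2));
      switch (yearsLeft)
      {
         case 3: return "Freshmen";
         case 2: return "Sophomores";
         case 1: return "Juniors";
         default: return "Seniors";
      }
   }
}
```

For a 2014 directory, class year 17 gives 17 - 14 = 3, so Freshmen, and ’15 gives 1, so Juniors. Like the original, anything that isn’t 1, 2 or 3 falls through to Seniors.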

Team ResLife Second Update

Coming into 9th week, we’re a little behind where we’d like to be, but after getting some help from Austin we’re looking to finally finish data cleanup and move into the graphical aspects of our project. We’ll be working specifically with CityEngine and bringing together the raw data we’ve gathered into something more tangible.

Update On Group Project

So far, we’ve reached out to both the Carleton archives and ResLife concerning what data they might be able to offer us. We’re still waiting for an answer from ResLife, but the archives responded and unfortunately don’t have any information about room draw numbers for past years. They have confirmed that they have kept directory information in print for a certain number of years, which we would have to then transcribe into datasets.

As of now, this process is expected to be quite time-consuming, which will most likely force us to reduce our sample size from an expected ~100-year period with yearly intervals to possibly four years or more. Analysis of room draw priorities might have to be dropped from the project because of our lack of data.

For now, everything else in the proposal (found here) seems feasible and is on track for completion. We’ve begun using ArcGIS and will most likely start with SketchUp this coming week for the building models.

In the meantime, we have mapped the distribution of students for this year as an example, limited to most of the residence halls (excluding townhouses and Northfield options). Map.

3D Modeling

3D modeling and simulation are most useful in projects that attempt to preserve physical structures against the effects of time, or to simulate changes to an object that would otherwise be inconvenient to make in the real world. Recreating hypotheses is another plausible use for this type of work: for example, modeling an idea of an ancient village and comparing the model to the modern ruins and geographical features in order to judge its accuracy. Modeling and simulation can also simply produce media, for display or to supplement a project.

Manual Modeling

This type of modeling is best used when trying to recreate a single item (building, object, etc.) without extensive attention to detail. Because it is so time-consuming, taking on large projects such as cities would be painfully slow, and even then would not produce the best result without intensive accumulation of data (e.g. building proportions). From what I understand of manual modeling programs so far, most are quite rigid in the shapes you can make, so creating very specific details is either extremely hard or requires high proficiency in the program. However, when modeling a single building, especially a fairly symmetrical one, this approach is quite simple even for new users.

Procedural Modeling

This technique, as we saw in class, is much more useful for large-scale recreations (cities) that have little bearing on a real-world counterpart. The international city provided as an example was an impressive display, and if not told otherwise I would have believed it to be a model of an actual city somewhere in the world (statistically speaking, it might be regardless). Creating a set of rules for the program to act on isn’t very time-consuming, especially considering the result; making a set of rules that mirrors a real-life example, however, would be, as mentioned in the World’s Columbian Exposition example we saw in class. This method is also adaptable and easy to modify once you have the final product, making it a good choice for simulating real-world events on what could be considered fairly accurate depictions of cities.

Scanning

Possibly the most accurate modeling method, laser scanning is unfortunately limited by the hardware it requires. Unlike the previous two methods, which need only a program and a computer powerful enough to run it, laser scanning requires a scanner, which is probably expensive to acquire; and even with access to one, learning to use the equipment properly is not as intuitive (or safe) as messing around in SketchUp for a few minutes. The fact that you have to physically encompass the object you want to scan also greatly limits what types of objects you can model, and while there are likely scanners of different sizes, the price grows with the size as well. Therefore, laser scanning is best suited for detailed representations of objects.

Photogrammetry

Similar to laser scanning, this method is heavily limited by the size of what you are attempting to model. While equipment is less of a limiting factor, the work you have to put in is much greater. First of all, you must have the object you want to model in hand, an issue when dealing with precious stones or museum pieces. Secondly, you must pay particular attention to the picture-taking phase, which requires some basic math skills and a steady hand. Here, “you reap what you sow” applies: the more pictures you take (from more angles), the more detailed the result. After acquiring both the object and its pictures, the process is fairly simple, returning to the program-plus-computer approach where you can simply follow instructions and get a result. Photogrammetry is detailed, yet its limits are again set by size, which leaves this type of modeling best suited for objects and possibly small-scale structures.

Project Analysis

Looking through Marie Saldaña’s Modeling of Ancient Cities and Buildings, one cannot help being impressed by the scope at which Rome is pictured. From a zoomed-out perspective, the model looks accurate compared to what one would expect ancient Rome to look like, yet it’s the detailed aspects that are lacking. The “uncanny valley” phenomenon comes to mind a lot with procedural modeling: the result often looks real, but there are certain elements you can’t quite pinpoint that make it feel otherwise. This, I believe, is the main issue with this type of project. The level of detail and consideration she describes when writing the “rules” for the code is impressive, yet it still yields a random-looking result. If we could somehow combine the range of procedural modeling with laser scanning or photogrammetry in order to build some measure of detail into the rules, we would, in my opinion, get an improved result.


Music Hall

This was an exercise that wasn’t necessarily challenging to do, but was quite challenging to do well. When looking for pictures in the archives, I found that most buildings had usable pictures from only one angle (if any at all), and the pictures often differed in shading and other details, which made it hard to end up with a consistent final product.

I wound up doing the Music Hall, and I must say it doesn’t even come close to the quality of what we did in class. This building was a bit trickier since it isn’t exactly square, and the roof took me a little while to figure out. Furthermore the best picture I could find only showed half of one of the shorter sides, meaning I had to copy, paste and mirror part of it which doesn’t look that great either. It was still a pretty entertaining exercise though, and I wish I could have had more pictures to work with.

Group Project Proposal

Hieu, Tristan and I have decided on a project that analyzes the trends of students in the residence halls over the years and how this data might correlate with, say, field of study, interests and class year. We would also like to examine how the Room Draw process has changed, how much weight each number truly carries, and which halls have been prioritized over the years. The idea is to create an interactive map that changes according to the time period the user chooses. The map would display an accurate Carleton campus for the time, a graphical representation of the aforementioned groups and the student distribution in each hall, and media for each hall (possibly as outbound links) in the form of either images or SketchUp files.

Data

Most importantly, we need housing information for all students at Carleton, a data set that is easily accessible for the current four years. For earlier years, we hope the Carleton Archives still hold some of this information.

This will most likely take the form of a flat database, such as an Excel spreadsheet or something similar, which we can populate easily. However, how we use the data depends heavily on the form in which it is provided to us in the first place.

The data would be separated first by year, then by residence hall, and presumably by the percentage of each class (% freshmen, sophomores, etc.) residing in each hall. Most interesting would be the movement of freshmen, since they are placed into housing by the school, whereas the rest of the population gets some semblance of a choice; the disparities between those two groups should be revealing.
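Going from raw counts to those percentages is a small calculation. A sketch, assuming the counts for one hall in one year are already in a map keyed by class (ClassShares and percentages are hypothetical names, and the numbers in the note below are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for one hall in one year.
class ClassShares
{
   // Turns counts per class in one hall into percentages of that hall's total.
   static Map<String, Double> percentages(Map<String, Integer> countsByClass)
   {
      int total = 0;
      for (int n : countsByClass.values()) total += n;

      // LinkedHashMap keeps the classes in the order they were given
      Map<String, Double> shares = new LinkedHashMap<>();
      for (Map.Entry<String, Integer> entry : countsByClass.entrySet())
      {
         // an empty hall gets 0% everywhere instead of dividing by zero
         double share = (total == 0) ? 0.0 : 100.0 * entry.getValue() / total;
         shares.put(entry.getKey(), share);
      }
      return shares;
   }
}
```

For counts of 30 freshmen, 10 sophomores, 0 juniors and 10 seniors (50 students), this gives 60.0, 20.0, 0.0 and 20.0 percent.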

Tools

  • A flat database of some sort (Excel, Google Spreadsheet).
  • ArcGIS/GoogleMaps as the main display.
  • SketchUp for complementary media.
  • WordPress for main site.

Timeline

By Week 7 – Finish gathering of data, create a realistic plan for the final version of the project with the data we have managed to find.

Week 7/8 – Organize the data into a selected database which we can easily manipulate using some of the other tools.  Establish subsections of information.

Week 8 – Complete complementary media files; find pictures, create SketchUp files of each residence hall. Setup basis for the website the project will be hosted on.

Week 9 – Bring together databases and media into the map. Add finishing touches to the website. Create presentation.

Similar Projects

Because our project is fairly unconventional, there weren’t many projects that could act as precise guides for what we want to do. However, many population-tracking projects share a goal similar to ours.

Mapping Danish Population – Change in population over time in Denmark.

Animal City – Analysis of role of animals in San Francisco and where they lived.

This one is closer to what we seek: “What urban spaces did they inhabit and how did those spaces change over time?”, asked in our case about the different class years.

Mapping

For the in-class map, I wanted to look at the relationship between early settlements in the US and where some of the early colleges appeared. Unfortunately, I wasn’t able to find any good or relevant layers or data, and settled for a comparison between the colleges and some of the population centers of the 1800s on the East Coast. The ArcGIS setup was simple enough to understand, although some of the more advanced uses of the program still feel far beyond my grasp.

My Map.

Difference of database

The exploration of my website’s back end did not go as well as expected: even opening the phpMyAdmin site was overwhelming, given all the information it seemed to offer. At most, I figured out how to edit the site in the same ways I can directly through the WordPress interface, but in the other tabs of phpMyAdmin I couldn’t work out how to properly make use of the prompts and information provided. After this first exposure, a relational database seemed redundant and in most cases pointless to me, since its main point was apparently to reestablish groupings already present in the flat one. After some consideration, I see how a relational database could be useful to condense what would otherwise be an expansive data set.

As for pros and cons, flat databases are useful in their presentation, providing immediate access to all data points and thus a manageable way to compare them. Sorting a flat data set is also easier than sorting a relational database, as one does not have to worry about shifting related values in other tables. The layout is consistent, so even with a large number of rows, the column headings remain the same reference point. Similarly, flat databases are fairly easy to set up and populate, but they are best suited to smaller data sets, which leads into the cons.

Flat databases have no way of representing complex relationships between points; they can only show the connections between pieces of information placed in the same row or column. Furthermore, these points cannot be easily indexed. Finally, when working with larger sets, a flat database becomes too spread out and obscures the data.

Relational databases, on the other hand, are optimal for large data sets with related kinds of information. The ability to split data into separate tables and combine them again provides flexibility when changing or indexing entire sets. This form is also more space-efficient than a flat database for large sets, since shared details are stored once rather than repeated on every row.
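To make the flat vs. relational difference concrete, here is a toy version with made-up data (FlatVsRelational and every name and value in it are hypothetical). The flat table repeats each hall’s details on every student row; the relational version stores each hall once under an id, and a join rebuilds the flat rows when they’re needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy example; all data below is made up.
class FlatVsRelational
{
   // Flat table: student, class year, hall, hall area (hall details repeated per row).
   static List<List<String>> flat()
   {
      return Arrays.asList(
         Arrays.asList("Student A", "'17", "Hall A", "Area 1"),
         Arrays.asList("Student B", "'18", "Hall A", "Area 1"),
         Arrays.asList("Student C", "'17", "Hall B", "Area 2"));
   }

   // Relational table 1: each hall's details stored once, keyed by an id.
   static Map<Integer, List<String>> hallTable()
   {
      Map<Integer, List<String>> halls = new HashMap<>();
      halls.put(1, Arrays.asList("Hall A", "Area 1"));
      halls.put(2, Arrays.asList("Hall B", "Area 2"));
      return halls;
   }

   // Relational table 2: students point at a hall id instead of copying its details.
   static List<List<String>> studentTable()
   {
      return Arrays.asList(
         Arrays.asList("Student A", "'17", "1"),
         Arrays.asList("Student B", "'18", "1"),
         Arrays.asList("Student C", "'17", "2"));
   }

   // A "join": rebuilds the flat rows by looking up each student's hall id.
   static List<List<String>> join(List<List<String>> students, Map<Integer, List<String>> halls)
   {
      List<List<String>> rows = new ArrayList<>();
      for (List<String> student : students)
      {
         List<String> row = new ArrayList<>(student.subList(0, 2));
         row.addAll(halls.get(Integer.parseInt(student.get(2))));
         rows.add(row);
      }
      return rows;
   }
}
```

Joining studentTable() with hallTable() gives back exactly the rows of flat(), while "Area 1" is stored once instead of twice; with hundreds of students per hall, that saving is where the space efficiency comes from.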

On the negative side, relational databases are, in essence, subject to the author’s understanding of the relationships between different groups of data. Even when working with the same data set, separate authors might come up with different sets of tables, which causes problems when attempting to combine similar data from separate sources.

Before choosing either, it is important to take into account the influence of data collectors on the data and how that might manifest in the separation of data points into groups.