1. Preface
In recent posts in this blog series, we have learned how to plot cross-sectional data and map geographical data using Plotly with Python. This article shows how to graph networks using a digital humanities example with Instagram hashtags.
Take a look at the figure below:
The interactive graph shows a network of Instagram hashtags built around the following five: 'palmtree', 'garden', 'beach', 'vacation' and 'home'. As I mentioned briefly in the last post, I have become involved in a palm tree business, so I decided to check how palm trees are associated with different scenes. You can click on each node (colored dot), which represents a particular hashtag, to see which of the five main tags it appears with most often (color) and how frequently it occurs (weight). The graph was generated with the Fruchterman-Reingold algorithm. This layout is great for seeing the macro structure of the network, such as the number of nodes and edges and the modularity.
On the other hand, an alternative algorithm called Force Atlas is useful for studying how strongly the nodes are associated with each other:
It shows that the hashtag "#palmtree" appears with "#vacation" and "#beach" more often than with "#garden" and "#home". That might suggest that palm trees are photogenic for tourists and something property owners would want to show in their pictures on Airbnb and Booking.com. Having those trees at home can be hard, depending on where you live, but as I've seen myself, it's a popular luxury in "#sanfrancisco".
As regular readers will have guessed, these graphs are what we are going to learn how to make in this article. Let's go.
2. Data Collection and Edge Calculation
In this project, we refer to this tutorial and this command shared by Marcos Junior, a contributor to Medium. Some parts raise errors and need to be changed because the latest version of the Python module NetworkX has a new structure and new built-in functions. You can find the modified code for the data collection and edge calculation here within my repository.
When running the command, you may encounter a NameError in the cell whose heading says "checking non-tagged medias". That is probably because your data contain no non-tagged media; you can ignore it and keep running the commands that follow. When you finish, you will have created multiple files.
The code is too long to analyze line by line here. If you want to understand it fully, check Marcos' tutorial and look up the functions used there.
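For a rough idea of what the edge calculation step boils down to, here is a minimal sketch in plain Python. It is not Marcos' code or the modified version in the repository: the function name `cooccurrence_edges` and the sample posts are hypothetical, and it assumes that the weight of an edge is simply the number of posts in which two hashtags appear together.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts):
    """Count how often each pair of hashtags appears in the same post.

    `posts` is an iterable of hashtag lists, one list per post.
    Returns a Counter mapping (tag_a, tag_b) -> number of shared posts,
    with each pair stored in alphabetical order.
    """
    edges = Counter()
    for tags in posts:
        # Normalize case and drop duplicates within a single post.
        unique_tags = sorted({tag.lower() for tag in tags})
        # Every unordered pair of tags in the post is one co-occurrence.
        for pair in combinations(unique_tags, 2):
            edges[pair] += 1
    return edges


# Hypothetical sample data, not taken from the real scrape.
sample_posts = [
    ["palmtree", "beach", "vacation"],
    ["palmtree", "beach"],
    ["garden", "home", "palmtree"],
]

edges = cooccurrence_edges(sample_posts)
for (tag_a, tag_b), weight in edges.most_common():
    print(tag_a, tag_b, weight)
```

Each key of the resulting counter can then be written out as one weighted edge of the network.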
3. Graphing the Network
Once the code has run, install the free, open-source network graphing software Gephi on your computer if you don't have it yet. You may get an error message like "Error: Cannot find Java 1.8 or higher" when you try to open Gephi. In that case, find and install the latest Java version that fits your environment here, and you will then be able to open it.
In Gephi, you can preview the network graph you have constructed in Preview mode. Usually, the preview appears when you click on the "Refresh" button. If it doesn't, find the AppData folder on your disk (on Windows) and clear the user directory. Doing this should fix the issue.
One more setup step. We can use Gephi's Sigma.js export plugin to export graphs like those you saw above and embed them elsewhere. You can find how to download it in this video. We'll discuss its use again below.
OK. When you are all set, open Gephi and load the nw.graphml file generated by the final line of the code. The data will be mapped into a box of black dots.
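If Gephi shows something unexpected, it can help to check the file itself first. GraphML is plain XML, so a short sketch with Python's standard library can count the nodes and edges. The helper name `count_graphml` and the tiny inline sample are assumptions for illustration; with the real data you would pass the path to nw.graphml instead.

```python
import io
import xml.etree.ElementTree as ET

# Standard GraphML namespace used by <graphml>, <node> and <edge> elements.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def count_graphml(source):
    """Return (node_count, edge_count) for a GraphML path or file object."""
    root = ET.parse(source).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# Hypothetical miniature file standing in for nw.graphml.
SAMPLE = b"""<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="palmtree"/>
    <node id="beach"/>
    <node id="vacation"/>
    <edge source="palmtree" target="beach"/>
    <edge source="palmtree" target="vacation"/>
  </graph>
</graphml>"""

node_count, edge_count = count_graphml(io.BytesIO(SAMPLE))
print(node_count, "nodes,", edge_count, "edges")
```

For the real file, `count_graphml("nw.graphml")` should report the same totals that Gephi shows when it imports the graph.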
From here, we are going to edit the graph using a few attributes. First, click on the Statistics button on the right-hand side of the window and run a Modularity analysis:
There are five modularity classes (0, 1, 2, 3, 4, corresponding to garden, home, beach, vacation and palmtree), each containing 500 to 900 nodes (hashtags). The modularity value is around 0.575, indicating that the hashtags do cluster around the five main tags to a meaningful degree (for a thorough explanation of modularity in network theory, see this study). Let's reflect this classification on the graph using the Appearance button on the left-hand side.
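To make the modularity value less of a black box, here is a minimal sketch of the standard Newman modularity formula for an undirected, unweighted graph with a given classification. It is only an illustration: Gephi finds the classes itself and its figure may also take edge weights and a resolution setting into account, so this function (a hypothetical helper, not part of Gephi or the tutorial code) will not necessarily reproduce 0.575 for our data.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs; `community_of` maps node -> class.
    Q = sum over classes c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    inside = defaultdict(int)  # L_c: edges with both ends in class c
    degree = defaultdict(int)  # d_c: summed degree of class c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge (hypothetical data).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
toy_classes = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity(toy_edges, toy_classes)
print(round(q, 3))
```

Values well above zero mean that the classes contain more internal edges than a random graph with the same node degrees would be expected to have.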
You can reflect several attributes of the nodes and edges in their size, thickness and color using that section. Now, moving down, you can try different algorithms under Layout. There are only a few, so you can try all of them. Force Atlas takes a lot of time, but Force Atlas 2 is much quicker, so it makes sense to start with Force Atlas 2; you can run Force Atlas afterward if you need to.
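To get a feel for what a force-directed layout does under the hood, here is a minimal textbook-style sketch of Fruchterman-Reingold in plain Python. It is not Gephi's implementation: the function name, the parameters (`iterations`, `size`, `seed`) and the linear cooling schedule are assumptions made for illustration, and every node named in `edges` is assumed to be listed in `nodes`.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Textbook Fruchterman-Reingold layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    if len(pos) < 2:
        return {n: (p[0], p[1]) for n, p in pos.items()}

    k = size / math.sqrt(len(pos))  # ideal distance between nodes
    temperature = size / 10.0       # caps how far a node moves per step
    cooling = temperature / (iterations + 1)
    node_list = list(pos)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in pos}
        # Repulsion between every pair of nodes: k^2 / distance.
        for i, u in enumerate(node_list):
            for v in node_list[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along each edge: distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node, limited by the current temperature.
        for n in pos:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature -= cooling

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical toy network around three of the main tags.
toy_nodes = ["palmtree", "beach", "vacation", "garden"]
toy_links = [("palmtree", "beach"), ("palmtree", "vacation")]

layout = fruchterman_reingold(toy_nodes, toy_links)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

The all-pairs repulsion loop is the reason layouts like this get slow on graphs with thousands of nodes, which is worth keeping in mind when a layout in Gephi takes a while.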
If you are happy with what you have drawn, watch the rest of the video about Sigma.js and export the graph. If you want to embed it like I did here in this post, follow the steps discussed in this post (upload the files to your repository, get the link, and embed the iframe). You can see my repository with the second graph here. That's it.
4. Postface
With the methods introduced here, you are now ready to analyze social and other types of networks. Network analysis is very useful for getting insights from a large set of qualitative data.
In recent posts including this one, we have learned how to visualize different types of data in the ways most appropriate for each. There is more to come, and I'll update the blog as soon as possible though I'm getting busier with the palm tree business.
Until then, keep up with the metrics!