Lab 04 - CLoseness/Betweenness Centrality - INSTRUCTIONS

CRJ 507 Social Network Analysis

Introduction

The purpose of this lab is to familiarize yourself with calculating closeness centrality and betweenness centrality scores as well as centralization scores for undirected and directed networks in R. Please review the following materials before beginning the lab:

In Parts I and II of this lab we will revisit the networks we used in Lab 3 - Degree Centrality and Centralization. In Part III, you will use the two networks you created from Lab 1 and Lab 2.


Part I: Working with an Undirected Network

For this part of the lab you will again use data from Thomas Grund and James Densley’s study of ties among members of an inner-city gang in London, England. The network is undirected, binary ties collected from arrest data. This network is available in SNACpack and is called london_gang_net. To review documentation on the network, use ?london_gang_net.

Network and Matrix Objects

Note that the object, and all networks in the SNACpack package, is of class network. This means that to use functions for objects of class matrix (such as dim() or sum()), you need to first coerce the object to a matrix object using as.matrix. For example: my_matrix <- as.matrix( my_network ).


For the london_gang_net network, do the following:

  1. Without running any code, describe what kind of actor in this network is likely to have a high closeness centrality score. Now, describe what kind of actor in this network is likely to have a low betweenness centrality score.
  2. Calculate the degree, closeness, and betweenness centrality scores for each actor. Feel free to use the degree(), closeness(), and betweenness() functions from the sna package.
  3. There are two actors with the highest closeness centrality scores. Who are they and what does this say about their position in the network? There is one actor with the highest betweenness centrality scores. Who are they and what does this say about their position in the network?
  4. Is the one actor with the highest betweenness centrality score one of the actors with the highest closeness centrality scores? What does this tell us about betweenness and closeness centrality for this network? Is the one actor with the highest betweenness centrality score one of the actors with the highest degree centrality scores? What does this tell us about betweenness and degree centrality for this network?
  5. What does it mean to “standardize” closeness centrality? What about betweenness centrality? Why might we want to do this?
  6. Calculate and report the standardized degree, closeness, and betweenness centrality scores for each actor.
  7. Are the actors with the highest closeness centrality scores the same individuals with the highest standardized closeness centrality scores? Why or why not?
  8. Are the actors with the highest betweenness centrality scores the same individuals with the highest standardized betweenness centrality scores? Why or why not?
  9. Calculate the mean closeness centrality score and the mean betweenness centrality score. Interpret each mean centrality score. What do these means suggest about the structure of the network?
  10. Graph centralization is often used to assess how “star-like” a network is. If we were examining closeness centrality, what would a completely centralized network look like? What would its closeness centrality scores be? Describe it (or, if you are feeling spunky, create one and plot it!).
  11. If we were examining betweenness centrality, what would a completely centralized network look like? What would its betweenness centrality scores be? Describe it (or, if you are still feeling spunky, create one and plot it!).
  12. Calculate the graph centralization scores for degree, closeness, and betweenness centrality.
  13. Interpret each graph centralization score. What does each tell us about the distribution of centrality scores in the network? Would you describe the network as centralized or decentralized? Justify your answer using the score.
  14. Plot the network three times using the gplot() function where each plot sizes the nodes by the centrality score (i.e. a plot for degree centrality, a plot for closeness centrality, and a plot for betweenness centrality). Examine the plots. Describe how the differences you noted between the centrality measures above are seen in the plots.
  15. Suppose you are an analyst in a gang unit in London. After analyzing this network, what might you do to address gang involvement? How is this informed by your knowledge of the network? Is your answer different based on whether you are examining degree centrality, closeness centrality, or betweenness centrality?


Part II: Working with a Directed Network

For this part of the lab you will use data from Mangia Natarajan’s study of a large cocaine trafficking organization in New York City. The network is directed, binary ties of communication between individuals collected from police wiretaps of telephone conversations. This network is available in SNACpack and is called cocaine_dealing_net. To review documentation on the network, use ?cocaine_dealing_net.

Network and Matrix Objects

Note that the object, and all networks in the SNACpack package, is of class network. This means that to use functions for objects of class matrix (such as dim() or sum()), you need to first coerce the object to a matrix object using as.matrix. For example: my_matrix <- as.matrix( my_network ).


For the cocaine_dealing_net network, do the following:

  1. Calculate the indegree, outdegree closeness, and betweenness centrality scores for each actor. Feel free to use the degree(), closeness(), and betweenness() functions from the sna package.
  2. Does the node with the highest closeness centrality score also have the highest betweenness centrality score? Explain your answer beyond “yes” or “no”.
  3. Look at the node with the highest indegree centrality score. Do they have the highest closeness centrality score? Betweenness centrality score? What does this say about their position in the network?
  4. Now, look at the node with the highest outdegree centrality score. Do they have the highest closeness centrality score? Betweenness centrality score? What does this say about their position in the network?
  5. Calculate and report the standardized indegree, outdegree, closeness, and betweenness centrality scores for each actor.
  6. Calculate and report the mean (unstandardized) indegree, outdegree, closeness, and betweenness centrality score. Interpret each mean centrality score. What do these means suggest about the structure of the network?
  7. Calculate the graph centralization scores for indegree, outdegree, closeness, and betweenness centrality.
  8. Interpret each score. What does it tell us about the distributions of each type of centrality score in the network? For each measure of centrality, would you describe the network as centralized or decentralized?
  9. Plot the network four times using the gplot() function where each plot sizes the nodes by the centrality score (i.e. a plot for indegree centrality, a plot for outdegree centrality, a plot for closeness centrality, and a plot for betweenness centrality).
  10. Examine the plots. Describe how the differences you noted between the centrality measures above are seen in the plots. If you can’t see them, think about how to modify the plot to better visualize these differences.
  11. Suppose you are an analyst in a drug unit examining cocaine trafficking. After analyzing this network, what might you do to curb cocaine trafficking? How is this informed by your knowledge of the network? Is your answer different based on whether you are examining indegree centrality, outdegree centrality, closeness centrality, or betweenness centrality?
  12. Now, suppose your colleague said “hey, this is a directed graph, but you can calculate the closeness and betweenness assuming it is an undirected graph you know?” Well, of course you know that, but should you? Think about what the directionality in the graph represents and then explain your answer.


Part III: Your Networks

Pick one of your networks you created in Lab 1 and Lab 2 and do the following:

  • If the network is undirected, repeat the steps in Part I.

  • If the network is directed, repeat the steps in Part II.


Challenge Question

Feeling bold? Try this challenge question. It is not required, but try taking a crack at it.

  • Here is an edgelist for a small 6-node directed network:
    • A → B
    • A → C
    • B → C
    • C → A
    • D → C
    • E → F

Task: Without using a computer, sketch this network.

Then, answer:

  • Which node(s) have the highest closeness centrality?
  • Which node(s) have the highest betweenness centrality?




How to Submit

Download the template for this lab prior to beginning the lab. The template contains code for accessing the data files.


Knitting to HTML

When you have completed your assignment, click the “Knit” button to render your .RMD file into a .HTML report.


Special Instructions

Upload both your .RMD and .HTML files to the appropriate link for this assignment on the Canvas page for this course.


Before You Submit

Remember to ensure the following before submitting your assignment.

  1. Name your files using this format: Lab-##-LastName.rmd and Lab-##-LastName.html
  2. Show both the solution for your code and write out your answers in the body text

See Google’s R Style Guide for examples of common conventions.



Common Knitting Issues

.RMD files are knit into .HTML and other formats procedural, or line-by-line.

  • An error in code when knitting will halt the process; error messages will tell you the specific line with the error
  • Certain functions like install.packages() or setwd() are bound to cause errors in knitting
  • Altering a dataset or variable in one chunk will affect their use in all later chunks
  • If an object is “not found”, make sure it was created or loaded with library() in a previous chunk

If All Else Fails: If you cannot determine and fix the errors in a code chunk that’s preventing you from knitting your document, add eval = FALSE inside the brackets of {r} at the beginning of a chunk to ensure that R does not attempt to evaluate it, that is: {r eval = FALSE}. This will prevent an erroneous chunk of code from halting the knitting process.


Please report any needed corrections to the Issues page. Thanks!