How to Make Interactive Networks using R programming

FelixAnalytix
Dec 13, 2022

Would you like to know how to create beautiful interactive networks using R programming?

First we will scrape the data from Wikipedia, then structure it in a tidy way, and finally build an interactive network of the main Marvel characters.

You can play with the final interactive network here: https://felixanalytix.com/vis/marvel-network

You can also watch this tutorial on YouTube: “How to Make Interactive Networks using R programming”.

Download the R code of this tutorial by joining my newsletter on www.felixanalytix.com. You will receive an automatic email from me to access the R script.

Web Scraping Wikipedia using rvest

The first thing we want to do is to install the necessary R packages for this data analysis.

We will use the tidyverse, which is a collection of R packages for data wrangling and visualization, rvest to do the web scraping in a tidy way, tidygraph to do network analysis (also in a tidy way), and visNetwork to make the interactive network visualization.
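As a minimal sketch of this setup step (assuming the packages are installed from CRAN, which is not stated in the original), the installation and loading could look like this:

```r
# Run once if the packages are missing (commented out to avoid
# reinstalling on every run).
# install.packages(c("tidyverse", "rvest", "tidygraph", "visNetwork"))

library(tidyverse)   # data wrangling and visualization (dplyr, tidyr, purrr, stringr, ...)
library(rvest)       # web scraping
library(tidygraph)   # tidy network analysis
library(visNetwork)  # interactive network visualization
```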

Now we want to scrape Marvel data from Wikipedia. After some quick research online, I found three nice tables on Wikipedia.

Below is the first table, which lists the most recurring characters in Phase 1 of the Marvel Cinematic Universe.

Most recurring Marvel characters in the Phase 1 movies of the cinematic universe

The table contains the main characters and the names of the movies. The idea is to take this data and transform it into a tidy structure, i.e. the movies in a “movie” column and the Marvel characters in a “character” column.

In your web browser, you can right-click on the table, select “Inspect” (Q), and you should see that Wikipedia uses the class “wikitable” for its tables.

Web scraping using the “wikitable” class

Using rvest we can scrape only the tables that have this specific class. We will do this operation for each Marvel Cinematic Universe phase, each having its own Wikipedia page.

List of URLs to scrape
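A possible version of that list is sketched below. The exact URLs are an assumption: the article only says that each phase has its own Wikipedia page, so the three page titles used here (Phase One, Two and Three) are my guess at the pages behind the three tables.

```r
# Assumed Wikipedia pages, one per phase (not confirmed by the original article).
urls <- c(
  "https://en.wikipedia.org/wiki/Marvel_Cinematic_Universe:_Phase_One",
  "https://en.wikipedia.org/wiki/Marvel_Cinematic_Universe:_Phase_Two",
  "https://en.wikipedia.org/wiki/Marvel_Cinematic_Universe:_Phase_Three"
)
```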

We will create a function, “get_table”, which will extract the specific table from the web page. The function reads the HTML page, keeps all the nodes that have the class “wikitable”, then extracts all the tables using the “html_table” function from rvest, treating empty strings as NAs.

We also want to pivot the data using the “pivot_longer” function, so that the movie titles go into the names column and the actors into the values column. We use some regular expressions to clean the actor names as well as the character names. We also add additional spaces if needed.

We can now loop over the URLs using the “map_dfr” function from the purrr package. The “map_dfr” function will also bind the three data frames into a single one, called here “df”.
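The steps above can be sketched roughly as follows. This is not the original script: the column name “Character”, the choice of the first “wikitable” on the page, and the cleaning regular expression are assumptions, and two tiny inline HTML strings (hypothetical stand-ins for the Wikipedia pages) are used so that the example runs without an internet connection. “read_html” accepts a URL in the same way.

```r
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

# Sketch of a get_table() function: read the page, keep the "wikitable"
# nodes, parse them, and reshape the first table into a long format.
get_table <- function(page) {
  read_html(page) %>%
    html_elements(".wikitable") %>%
    html_table(na.strings = "") %>%          # empty cells become NA
    pluck(1) %>%                             # assumption: the first table is the one we want
    pivot_longer(-Character, names_to = "Movie", values_to = "Actor") %>%
    mutate(across(
      c(Character, Actor),
      ~ str_squish(str_remove_all(.x, "\\[\\d+\\]"))  # drop numeric footnote markers
    ))
}

# Hypothetical miniature pages standing in for the real Wikipedia URLs.
phase_a <- '<table class="wikitable">
  <tr><th>Character</th><th>Iron Man</th><th>The Incredible Hulk</th></tr>
  <tr><td>Iron Man</td><td>Robert Downey Jr.</td><td></td></tr>
  <tr><td>Hulk</td><td></td><td>Edward Norton[1]</td></tr>
</table>'
phase_b <- '<table class="wikitable">
  <tr><th>Character</th><th>Thor</th></tr>
  <tr><td>Thor</td><td>Chris Hemsworth</td></tr>
</table>'

# Apply get_table() to each page and row-bind the results into one data frame.
df <- map_dfr(c(phase_a, phase_b), get_table)
df
```

With the real pages you would pass the vector of URLs instead of the inline strings. The real tables may have more complex headers, so the function will probably need extra cleaning steps.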

Our dataset is now structured in a tidy format, with NAs when a character is not in a specific movie. For example, Hulk is not in the movie “Iron Man”, but he is of course in the movie “The Incredible Hulk”.

We also need to do some additional cleaning because there is a “c” character that sometimes comes after the name. The little “c” indicates a cameo role, meaning the character appeared only very briefly in the movie. Let’s remove all the cameo appearances using a regular expression: the “$” sign means that the “c” must come at the end of the string, and when we detect it we replace the value with NA.

Extra data cleaning

We also rename the movies that have the same name as a Marvel character, because we don’t want a character and a movie to share the same name in the network. That’s why I decided to rename the movie “Thor” as “Thor 1” and the movie “Iron Man” as “Iron Man 1”. We also want to remove all the characters that do not appear in a movie using “filter(!is.na(Actor))”.
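Here is a small sketch of these cleaning steps on a hypothetical toy data frame. The column names “Character”, “Movie” and “Actor” and the exact pattern are assumptions, not the original code.

```r
library(dplyr)
library(stringr)

# Toy data: one cameo (trailing "c") and one missing appearance (NA).
df <- tibble(
  Character = c("Iron Man", "Iron Man", "Hulk", "Thor"),
  Movie     = c("Iron Man", "The Incredible Hulk", "Iron Man", "Thor"),
  Actor     = c("Robert Downey Jr.", "Robert Downey Jr. c", NA, "Chris Hemsworth")
)

df_clean <- df %>%
  mutate(
    # A trailing "c" marks a cameo: replace those values with NA.
    Actor = if_else(str_detect(Actor, " c$"), NA_character_, Actor),
    # Rename movies that share their name with a character.
    Movie = case_when(
      Movie == "Thor"     ~ "Thor 1",
      Movie == "Iron Man" ~ "Iron Man 1",
      TRUE                ~ Movie
    )
  ) %>%
  # Keep only the rows where the character really appears in the movie.
  filter(!is.na(Actor))

df_clean
```

I match “ c$” (with a leading space) rather than a bare “c$” so that a name ending with the letter “c” is not dropped by mistake. Whether the marker is preceded by a space in the scraped data is something to check on the real table.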

Quick Exploratory Data Analysis

Now let’s do a very quick exploratory data analysis. We can count the most recurring characters and the movies with the most Marvel characters using the “count()” function from dplyr.

Descriptive statistics using count() from dplyr
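For illustration, the two counts could be computed like this on a hypothetical toy version of the cleaned data (the column names are assumptions):

```r
library(dplyr)

# Toy data: one row per (character, movie) appearance.
df <- tibble(
  Character = c("Iron Man", "Iron Man", "Hulk", "Hulk", "Thor", "Thor"),
  Movie     = c("Iron Man 1", "The Avengers", "The Incredible Hulk",
                "The Avengers", "Thor 1", "The Avengers")
)

# Number of movies per character, most recurring first.
character_counts <- df %>% count(Character, sort = TRUE)

# Number of characters per movie, most crowded first.
movie_counts <- df %>% count(Movie, sort = TRUE)

character_counts
movie_counts
```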

We can also compute different network metrics using the tidygraph R package. We first need to transform our data frame into a tbl_graph using the “as_tbl_graph” function from tidygraph.

Create a tbl_graph object
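A minimal sketch of this conversion is shown below, on hypothetical toy data. I rename the two columns to “from” and “to” so that tidygraph unambiguously treats the data frame as an edge list, and I build an undirected graph. Both choices are assumptions, not necessarily what the original script does.

```r
library(dplyr)
library(tidygraph)

# Toy edge list: each row links a character to a movie.
df <- tibble(
  Character = c("Iron Man", "Iron Man", "Hulk", "Hulk", "Thor", "Thor"),
  Movie     = c("Iron Man 1", "The Avengers", "The Incredible Hulk",
                "The Avengers", "Thor 1", "The Avengers")
)

graph <- df %>%
  select(from = Character, to = Movie) %>%  # explicit edge-list column names
  as_tbl_graph(directed = FALSE)

graph
```

Every distinct character and movie becomes a node. This is also why the movies “Thor” and “Iron Man” had to be renamed: otherwise they would be merged with the characters of the same name.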

The tidygraph R package makes it easy to compute metrics such as degree, betweenness, closeness, etc. You can even access a variety of other metrics, such as the PageRank algorithm originally used by Google.

For example, we can analyze the nodes (by first activating the nodes), then get a degree centrality metric using “centrality_degree()”. This tells us which characters appear in the most movies.

We see that the degree of both Iron Man and Captain America is 9, exactly as expected (we already saw this result using the “count()” function from dplyr).
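The degree computation can be sketched as follows on a hypothetical toy graph (the numbers here are for the toy data only, not the real Marvel network):

```r
library(dplyr)
library(tidygraph)

# Toy undirected graph linking characters to movies.
graph <- tibble(
  from = c("Iron Man", "Iron Man", "Hulk", "Hulk", "Thor", "Thor"),
  to   = c("Iron Man 1", "The Avengers", "The Incredible Hulk",
           "The Avengers", "Thor 1", "The Avengers")
) %>%
  as_tbl_graph(directed = FALSE)

node_degrees <- graph %>%
  activate(nodes) %>%                        # work on the node table
  mutate(degree = centrality_degree()) %>%   # number of edges per node
  as_tibble() %>%
  arrange(desc(degree))

node_degrees
```

For a character node, the degree is the number of movies it is linked to, which is why it matches the result of “count()”.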

I will not go further into the different metrics of tidygraph. I could write another article if you’re interested (let me know in the comments below). Feel free to explore the other functions of the tidygraph package by yourself.

How to create the interactive network

Before building the interactive network, we will add an additional “group” variable, which will allow us to display the Marvel characters and the movies differently in the interactive network. We can then transform our graph into a list of nodes and edges using the “toVisNetworkData()” function.

Transform the tidy dataset into a list object
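One way this step could look is sketched below on hypothetical toy data. The group labels “Character” and “Movie” and the way the group is derived (a node is a character if it appears in the “from” column) are assumptions.

```r
library(dplyr)
library(tidygraph)
library(visNetwork)

# Toy edge list: characters in "from", movies in "to".
edge_list <- tibble(
  from = c("Iron Man", "Iron Man", "Hulk", "Hulk"),
  to   = c("Iron Man 1", "The Avengers", "The Incredible Hulk", "The Avengers")
)

graph <- as_tbl_graph(edge_list, directed = FALSE) %>%
  activate(nodes) %>%
  # Label each node so the two kinds of nodes can be styled differently.
  mutate(group = if_else(name %in% edge_list$from, "Character", "Movie"))

# Convert the graph into a list with two data frames: nodes and edges.
vis_network <- toVisNetworkData(graph)

vis_network$nodes
vis_network$edges
```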

Our “vis_network” object can now be read by the “visNetwork()” function to automatically create the interactive visualization. We create this interactive network by passing the nodes of this object to the “nodes” argument of the function “visNetwork()”, and the edges to the “edges” argument.

Code to make the interactive network using R

You can choose the “width” and the “height”, and even give it a title, e.g. “The Marvel Cinematic Universe Network”. We also add a random seed because the network layout is generated in a random way. I decided to add some Font Awesome icons to differentiate the movies and the characters. I also want to highlight the nearest edges and nodes when hovering with my mouse, using the “highlightNearest” argument.
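Putting these options together, a sketch of the visualization code could look like this. The node and edge data frames are hypothetical toy data, and the sizes, seed value, icon codes and colors are illustrative choices of mine rather than the ones from the original script.

```r
library(visNetwork)

# Toy nodes and edges in the format produced by toVisNetworkData().
nodes <- data.frame(
  id    = c("Iron Man", "Hulk", "Iron Man 1", "The Avengers"),
  label = c("Iron Man", "Hulk", "Iron Man 1", "The Avengers"),
  group = c("Character", "Character", "Movie", "Movie"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("Iron Man", "Iron Man", "Hulk"),
  to   = c("Iron Man 1", "The Avengers", "The Avengers"),
  stringsAsFactors = FALSE
)

network <- visNetwork(
  nodes = nodes, edges = edges,
  width = "100%", height = "600px",
  main = "The Marvel Cinematic Universe Network"
) %>%
  visLayout(randomSeed = 1000) %>%   # fixed seed for a reproducible layout
  addFontAwesome() %>%               # load the Font Awesome icon font
  visGroups(groupname = "Character", shape = "icon",
            icon = list(code = "f007", color = "darkred")) %>%    # "user" icon
  visGroups(groupname = "Movie", shape = "icon",
            icon = list(code = "f008", color = "steelblue")) %>%  # "film" icon
  visOptions(highlightNearest = list(enabled = TRUE, hover = TRUE))

network  # prints the widget (opens in the viewer in an interactive session)
```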

You can see the result and play with this interactive network at the following URL: https://felixanalytix.com/vis/marvel-network

If you found this tutorial useful, consider giving me a clap or even subscribing. You can download the full R script of this tutorial by joining my newsletter on www.felixanalytix.com.

See you in another tutorial, bye!
