
Explore the Network of Users

On this page, you can explore the network of GitHub users. The full network consists of 1585 nodes and 2092 connections; however, only the giant connected component is shown here, which has significantly fewer nodes: 737 nodes and 2072 edges. The network characteristics are analysed, as is the text from the user profiles and repository descriptions, using term frequency. This is then combined with a sentiment analysis of the individual users. Additionally, the users are analysed based on their total stars, forks and followers, as well as the number of users they follow themselves, to get a first impression of who the most popular users are.
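
As a rough illustration of how a giant component can be extracted, the sketch below finds the largest weakly connected component of a directed edge list with a breadth-first search. It assumes that follow links are treated as undirected when deciding connectivity, which the page does not state. The function name `giant_component` and the toy edge list are hypothetical and are not the real data.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the node set of the largest weakly connected component."""
    # Assumption: direction is ignored when deciding connectivity.
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    largest = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest

# Hypothetical toy edge list of (follower, followed) pairs.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
core = giant_component(toy_edges)
# Keep only the edges whose endpoints both lie in the giant component.
core_edges = [(s, t) for s, t in toy_edges if s in core and t in core]
print(sorted(core), len(core_edges))
```

Users without any connection never appear in an edge list, so they are dropped automatically in this sketch.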

Visualization

The nodes are scaled according to their degree, so that the nodes with the most connections to other nodes appear larger. The nodes are positioned using the ForceAtlas2 algorithm, which places each node spatially depending on its connections to the other nodes. It can already be seen here that there are some very dominant users. Hover over the nodes to see who they are.


Analysis

Degree Distribution

IN degree


OUT degree

Both the in- and out-degree distributions are asymmetric: they are right-skewed with long tails. Although most users have zero or small degrees, a few users have a large degree. This indicates the presence of a few users who are highly connected to other users in the network. These highly connected hubs can be seen as responsible for keeping the network connected, so removing one of the key users would have a large impact on the network structure. Since the hubs are highly connected, they collect information about other key users quickly, and they also spread information quickly to other users. The log-log plots of the distributions indicate that the network follows a power-law degree distribution, meaning that the network is scale-free. This is a typical property of real-world networks.
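
The degree distributions discussed above can be sketched as follows. The code counts in- and out-degrees from a directed edge list, tabulates how many nodes have each degree, and fits a straight line to the log-log points. The least-squares slope is only a rough indicator of a power law and is an assumption of this sketch, not necessarily the method used for the plots. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for (follower, followed) pairs."""
    in_degree, out_degree = Counter(), Counter()
    for follower, followed in edges:
        out_degree[follower] += 1
        in_degree[followed] += 1
    return in_degree, out_degree

def degree_distribution(degrees, nodes):
    """Map each degree value k to the number of nodes with that degree."""
    return Counter(degrees.get(node, 0) for node in nodes)

def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree), for k > 0."""
    points = [(math.log(k), math.log(c)) for k, c in distribution.items() if k > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical toy edge list; "hub" is followed by three users.
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
in_deg, out_deg = degree_counts(toy_edges)
nodes = {n for edge in toy_edges for n in edge}
in_dist = degree_distribution(in_deg, nodes)

# Hypothetical distribution in which the count halves as the degree doubles.
toy_dist = Counter({1: 4, 2: 2, 4: 1})
slope = loglog_slope(toy_dist)
print(dict(in_dist), round(slope, 3))
```

A clearly negative slope on the log-log scale is consistent with a heavy-tailed distribution, although a straight-line fit alone does not prove that the network is scale-free.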

Centrality

Degree centrality


Eigenvector and betweenness centrality

Exploring the different centrality measures revealed some of the most central users according to degree, eigenvector and betweenness centrality. Some of the most followed users in the network are amueller and ogrisel, whereas yupbank is the user who follows the most other users. Hence, these three users are also the ones with the most connections overall. The users ogrisel, agramfort and yupbank are central according to eigenvector centrality, meaning that they have high influence within the network.
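
To make two of these measures concrete, here is a minimal sketch of degree and eigenvector centrality for a directed follow graph. It assumes that degree centrality is normalised by the number of other nodes, that the edge list has no duplicates or self-loops, and that eigenvector centrality is computed on incoming links with a shifted power iteration. These are assumptions of the sketch, not details confirmed by the page, and the toy graph is hypothetical.

```python
import math

def degree_centrality(edges):
    """Return (in_centrality, out_centrality), each degree divided by n - 1."""
    nodes = {n for edge in edges for n in edge}
    scale = 1 / (len(nodes) - 1)
    in_c = dict.fromkeys(nodes, 0.0)
    out_c = dict.fromkeys(nodes, 0.0)
    for follower, followed in edges:
        out_c[follower] += scale
        in_c[followed] += scale
    return in_c, out_c

def eigenvector_centrality(edges, iterations=100):
    """Power iteration: a user scores highly when followed by high scorers."""
    nodes = {n for edge in edges for n in edge}
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Start from the previous scores (an identity shift that prevents
        # oscillation), then add the score of every follower.
        new = dict(score)
        for follower, followed in edges:
            new[followed] += score[follower]
        norm = math.sqrt(sum(v * v for v in new.values()))
        score = {n: v / norm for n, v in new.items()}
    return score

# Hypothetical toy graph of (follower, followed) pairs.
toy_edges = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"),
    ("hub", "a"), ("a", "b"), ("b", "c"), ("c", "a"),
]
in_c, out_c = degree_centrality(toy_edges)
eig = eigenvector_centrality(toy_edges)
print(max(in_c, key=in_c.get), max(eig, key=eig.get))
```

In-degree centrality highlights the most followed users and out-degree centrality the users who follow the most others, which mirrors the distinction drawn between amueller and ogrisel on the one hand and yupbank on the other.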

User Popularity

The most popular users

Fan users following other users the most

Users with the most stars

Users with the most forks

From the bar charts, it can be concluded that llSourcell is a very popular user with more than 35K followers. He has a YouTube channel and is also in the top 5 for most total stars and total forks across his repositories. The user rasbt is also in the top 5 for these three variables and is more closely related to academia. The user jwasham is also in the top 5 and is the user with the most stars across his repositories. He has a blog, startupnextdoor, where he shares learnings from starting and running web businesses, which makes him more connected to industry.


Text and Sentiment

Word Clouds

Below, the most frequent single words and word pairs within user biographies and repository descriptions are shown in word clouds. The programming languages used in the repositories are also shown to give an overview of the most popular languages.
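
The term frequencies behind such word clouds can be sketched as below, counting single words and adjacent word pairs with `collections.Counter`. The tokenisation rule, the tiny stop-word list and the example biographies are all hypothetical. Word pairs are formed after stop words are removed, which is a design choice of this sketch rather than something the page specifies.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def term_frequencies(texts):
    """Count single words and adjacent word pairs across all texts."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Pair every token with its successor within the same text.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

# Hypothetical example biographies, not taken from the data set.
toy_bios = [
    "Data scientist and deep learning researcher",
    "PhD student in deep learning",
    "Software engineer",
]
unigrams, bigrams = term_frequencies(toy_bios)
print(unigrams.most_common(2))
print(bigrams.most_common(1))
```

The purely alphabetic pattern drops tokens such as "c++", so a real pipeline would likely need a more careful tokeniser.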

Repository descriptions


User biographies


Programming languages

It is seen from the biographies that many users seem to be engineers, scientists, students, PhDs, researchers and developers, indicating that the network has well-educated and hard-working users. Adding context from the word pairs, this mostly concerns software, computer and data science as well as deep learning. Python is prominent not only among the programming languages but also in the repository descriptions. GitHub is thus clearly popular among Python users, especially in combination with Jupyter Notebook. However, many other languages are used as well.

Sentiment Analysis

A sentiment analysis of the repository descriptions and biographies written by the individual users is presented below. The happiness or sadness of each user was determined by comparing the words in the texts to a predefined list of words, each with an associated happiness score. This list is available from the article "Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter". Below, the happiest and saddest users based on their repository descriptions and biographies are presented.
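
A minimal sketch of this kind of lexicon-based scoring is shown below. It assumes that the score of a text is the mean happiness of the words found in the word list, counting repeated words each time they occur. The scores in `TOY_SCORES` are made up for illustration and are not the values from the published list.

```python
import re

# Hypothetical scores on a 1 (sad) to 9 (happy) scale; NOT the published list.
TOY_SCORES = {"love": 8.4, "happiness": 8.3, "lazy": 3.5, "lost": 2.8, "code": 5.5}

def happiness(text, scores=TOY_SCORES):
    """Mean score of the words in `text` that appear in `scores`, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon are ignored rather than scored as neutral.
    matched = [scores[w] for w in words if w in scores]
    if not matched:
        return None
    return sum(matched) / len(matched)

happy_score = happiness("I love code")
sad_score = happiness("Lazy and lost")
unknown_score = happiness("xyz")
print(happy_score, sad_score, unknown_score)
```

Texts that contain only a few scored words give unstable averages, so very short biographies can end up at either extreme of a ranking.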

Repository descriptions


User biographies

Looking at the sentiment scores, it is seen that the user gitter-badger has a very high sentiment score and is quite positive, using words such as love and happiness, whereas some users, such as Qovaxx, are deemed quite negative due to writing about cancer. It can also be concluded that users such as dangsonbk and weakish are probably not worth following, due to biographies containing words such as lazy and lost. However, this obviously depends on what kind of users you are interested in.