Sentiment and User Network Analysis Using YouTube Comments

Yasin Khadem Charvadeh

Sentiment analysis represents a specialized branch of text mining designed to extract and quantify emotional attitudes within written content. This technique proves especially valuable for organizations aiming to gauge public opinion and consumer attitudes toward their brands, offerings, or services, enabling data-driven improvements to product development and customer experience strategies.

Network analysis is a framework for mapping and visualizing the relationships among interconnected entities. It has been widely applied in disciplines such as social psychology, behavioral research, physics, and epidemiology, where it is used to model disease transmission. In this analysis, I apply sentiment and network analysis to comments collected from two YouTube videos, using R for computation and visualization.

Sentiment Analysis

Extract data from YouTube

library(tidyverse)
library(magrittr)
library(vosonSML)
library(igraph)
library(syuzhet)

## To download YouTube data, you need your own Google developer API key

## Authenticate (my_apikey is not shown here for privacy reasons)

youtubeAuth <- Authenticate("youtube", apiKey = my_apikey)

## Specify the link of the YouTube video
## The following link is for "Stranger Things 4 | Official Trailer | Netflix"

Stranger_things__id <- GetYoutubeVideoIDs(c("https://www.youtube.com/watch?v=yQEondeGvKo"))
## Extracted 1 video ids.

## Collect up to 500 top-level comments, together with the replies to those comments

youtube_data <- youtubeAuth %>%
  Collect(videoIDs = Stranger_things__id,
          maxComments = 500,
          verbose = FALSE)
## Collecting comment threads for youtube videos...
## Video 1 of 1
## ---------------------------------------------------------------
## ** video Id: yQEondeGvKo
## -- API returned more than max comments. Results truncated to first 500 threads.
## ** Collected threads: 500
## ** Collecting replies for 104 threads with replies. Please be patient.
## ........................................................................................................
## ** Collected replies: 191
## ** Total video comments: 691
## ---------------------------------------------------------------
## ** Total comments collected for all videos 691.
## (Estimated API unit cost: 226)
## Done.
## Elapsed time: 0 hrs 0 mins 21 secs (21.22)

## Extract sentiment scores

sentiment_score <- get_nrc_sentiment(youtube_data$Comment)

head(sentiment_score)
##   anger anticipation disgust fear joy sadness surprise trust negative positive
## 1     0            0       0    0   0       0        0     0        1        0
## 2     0            1       0    0   0       0        0     0        0        0
## 3     0            0       0    0   0       0        0     0        0        0
## 4     0            0       0    0   0       0        0     0        0        0
## 5     0            0       0    0   1       1        0     0        0        1
## 6     0            1       0    0   0       0        0     0        0        0

## Calculate sentiment score percentages

dt <- data.frame(Score = colSums(sentiment_score)*100/sum(sentiment_score), 
                 Sentiment = colnames(sentiment_score))

## Make a barplot

dt %>%
  mutate(Sentiment = fct_reorder(Sentiment, Score)) %>%
  ggplot(aes(x = Sentiment, y = Score, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::label_percent(scale = 1)) +
  theme(axis.text.x = element_text(
    angle = 45,
    vjust = 1,
    hjust = 1
  )) +
  scale_fill_viridis_d()
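
The percentage step can be checked on a small hand-made table. The sketch below uses a hypothetical data frame, toy_scores, holding three of the ten NRC columns; the numbers are made up for illustration and are not taken from the video. Because the denominator is the total over all columns, applying the same formula to the full output of get_nrc_sentiment() means the positive and negative columns share the same 100% as the eight emotions.

```r
# Hypothetical scores for three comments (illustration only, not real data)
toy_scores <- data.frame(
  joy      = c(1, 0, 2),
  fear     = c(0, 1, 0),
  positive = c(2, 1, 1)
)

# Column totals divided by the grand total, expressed as percentages
toy_share <- colSums(toy_scores) * 100 / sum(toy_scores)

# Same two-column layout as the data frame that feeds the bar plot
toy_dt <- data.frame(Score = toy_share, Sentiment = names(toy_share))
print(toy_dt)
```

Here the column totals are 3, 1 and 4 out of a grand total of 8, so the shares always add up to 100.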

Network Analysis: Creating YouTube Networks

## The following YouTube link is for a video called "We WILL Fix Climate Change!"

climate_change_id <-
  GetYoutubeVideoIDs(c("https://www.youtube.com/watch?v=LxgMdjyw8uw"))
## Extracted 1 video ids.

youtube_data <- youtubeAuth %>%
  Collect(videoIDs = climate_change_id,
          maxComments = 500,
          verbose = FALSE)
## Collecting comment threads for youtube videos...
## Video 1 of 1
## ---------------------------------------------------------------
## ** video Id: LxgMdjyw8uw
## -- API returned more than max comments. Results truncated to first 500 threads.
## ** Collected threads: 500
## ** Collecting replies for 216 threads with replies. Please be patient.
## ........................................................................................................................................................................................................................
## ** Collected replies: 422
## ** Total video comments: 922
## ---------------------------------------------------------------
## ** Total comments collected for all videos 922.
## (Estimated API unit cost: 450)
## Done.
## Elapsed time: 0 hrs 0 mins 40 secs (40.28)

Actor Network

## In the actor network, the nodes are users (plus the video itself) and the edges are the interactions between them in the comments

actor_graph <- youtube_data %>%
  Create("actor") %>%
  AddText(youtube_data) %>%
  Graph(writeToFile = TRUE)
## Creating igraph network graph...
## Adding text to network...Generating youtube actor network...Done.
## Done.
## GRAPHML file written: C:/Users/YKC/Desktop/My Website Final Projects/Network Analysis/2022-07-11_163612-YoutubeActor.graphml
## Done.

## Number of edges

gsize(actor_graph)
## [1] 923

## Number of vertices

gorder(actor_graph)
## [1] 563

## Summary of the igraph graph object
## "DN--" in the first row stands for a "Directed Named" network
## It is a directed network because each edge can be traversed in only one direction
## It is a named network because each node has a unique ID stored in its name attribute

actor_graph
## IGRAPH 3f77c70 DN-- 563 923 -- 
## + attr: type (g/c), name (v/c), screen_name (v/c), node_type (v/c),
## | label (v/c), video_id (e/c), comment_id (e/c), edge_type (e/c),
## | vosonTxt_comment (e/c)
## + edges from 3f77c70 (vertex names):
## [1] UCsXVk37bltHxD1rDPwtNM8Q->VIDEOID:LxgMdjyw8uw
## [2] UCvPRLwQm7YlUrppRWkyw__A->VIDEOID:LxgMdjyw8uw
## [3] UCAUOxHqS8qODm0TnkwaZs7g->VIDEOID:LxgMdjyw8uw
## [4] UC3mmQYJUrxrAmwysDn_sdSg->VIDEOID:LxgMdjyw8uw
## [5] UC0w7SMDB8t4ICO9SF_x8bHA->VIDEOID:LxgMdjyw8uw
## [6] UCkNltE2T_f4BaYky9lGsijQ->VIDEOID:LxgMdjyw8uw
## + ... omitted several edges
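
The "Directed Named" flags can also be queried directly. The following is a minimal sketch on a hypothetical three-node graph (the names toy_edges, toy_graph and the video ID "VIDEOID:toy" are invented for illustration), assuming the igraph package is installed.

```r
library(igraph)

# Hypothetical edge list: two users comment on a video, and one replies to the other
toy_edges <- data.frame(
  from = c("userA", "userB", "userB"),
  to   = c("VIDEOID:toy", "VIDEOID:toy", "userA")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

is_directed(toy_graph)  # TRUE when edges have a direction ("D")
is_named(toy_graph)     # TRUE when vertices carry a name attribute ("N")
gorder(toy_graph)       # number of vertices
gsize(toy_graph)        # number of edges
```

Printing toy_graph shows the same kind of header line as the actor graph, only with smaller counts.
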
## List of vertices

V(actor_graph)
## + 563/563 vertices, named, from 3f77c70:
##   [1] UCsXVk37bltHxD1rDPwtNM8Q UCvPRLwQm7YlUrppRWkyw__A UCAUOxHqS8qODm0TnkwaZs7g
##   [4] UC3mmQYJUrxrAmwysDn_sdSg UC0w7SMDB8t4ICO9SF_x8bHA UCkNltE2T_f4BaYky9lGsijQ
##   [7] UCkUcVrXC3eQVEhWFmqYOR-w UCdkv14aWdwrLUf1D1ASDVHg UCDrFTvydoXBezTDfegSW4tA
##  [10] UCwYmTHwUeP8txCn_NGAlnEQ UCumt1oM5oFaAJQago81PL2g UCtI8ZdTpTZl6VzjJg8JyTGg
##  [13] UCPnRO10l80Th2t724ANjNLQ UC6dE07rYx9uoZOvTDwZezgw UCAfdHnP6GMBpz2z9pSoyq0Q
##  [16] UCqEKWkLNKUAy19ME3TCSckQ UCVMajr4RKvW9fxpCHqt_W0A UCNLSADwMu2cmr0i9i2U_AJA
##  [19] UCyWAsBotjd0jtoJSaWbzhrw UCaNOFcZ4Oh2WCCk4toEKmBA UCLjBNaBpGwRfB_zmFeeTGYg
##  [22] UCKVcmvoUIFlHDprunpm_DHA UCxAi3hcuYQ1VG__yT1bbM-g UCWlXaSjLvSDftLPw-eDhSmQ
##  [25] UCT-0pFRGTQ-IAADMM6DVsiA UCFhAOvc-X2_-4JpyHbmGXng UCDPdfOL_JlNNv6bQNN9DIDA
##  [28] UC7X3vfprwavn-MUbjweF0qw UChne_uFaPgY0I4gil4Fa0-A UCATFmvRh5o4KNjkKVdgp_6Q
## + ... omitted several vertices

## List of edges

E(actor_graph)
## + 923/923 edges from 3f77c70 (vertex names):
##  [1] UCsXVk37bltHxD1rDPwtNM8Q->VIDEOID:LxgMdjyw8uw
##  [2] UCvPRLwQm7YlUrppRWkyw__A->VIDEOID:LxgMdjyw8uw
##  [3] UCAUOxHqS8qODm0TnkwaZs7g->VIDEOID:LxgMdjyw8uw
##  [4] UC3mmQYJUrxrAmwysDn_sdSg->VIDEOID:LxgMdjyw8uw
##  [5] UC0w7SMDB8t4ICO9SF_x8bHA->VIDEOID:LxgMdjyw8uw
##  [6] UCkNltE2T_f4BaYky9lGsijQ->VIDEOID:LxgMdjyw8uw
##  [7] UCkUcVrXC3eQVEhWFmqYOR-w->VIDEOID:LxgMdjyw8uw
##  [8] UCdkv14aWdwrLUf1D1ASDVHg->VIDEOID:LxgMdjyw8uw
##  [9] UCDrFTvydoXBezTDfegSW4tA->VIDEOID:LxgMdjyw8uw
## [10] UCwYmTHwUeP8txCn_NGAlnEQ->VIDEOID:LxgMdjyw8uw
## + ... omitted several edges

## We add the following new indicators, which can be useful for regression analysis
## We do not use these indicators in our analysis; however, it is useful to be familiar with them

## Degree of each vertex

V(actor_graph)$degree <- degree(actor_graph)

glimpse(V(actor_graph)$degree)
##  num [1:563] 17 1 1 1 1 17 1 1 1 1 ...

## Eigenvector centrality scores

V(actor_graph)$Eigen <- eigen_centrality(actor_graph)$vector

glimpse(V(actor_graph)$Eigen)
##  num [1:563] 0.0562 0.039 0.039 0.039 0.039 ...

## Betweenness centrality

V(actor_graph)$betweenness <- betweenness(actor_graph)

glimpse(V(actor_graph)$betweenness)
##  num [1:563] 26.1 0 0 0 0 ...
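
Degree and betweenness are easy to verify by hand on a very small graph. The sketch below uses a hypothetical four-node network (userC replies to userA, and userA and userB both comment on a video); the node names are invented, and eigenvector centrality is left out to keep the example short.

```r
library(igraph)

# Hypothetical edges: userC -> userA -> video, and userB -> video
toy_edges <- data.frame(
  from = c("userC", "userA", "userB"),
  to   = c("userA", "video", "video")
)
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

# Total degree counts incoming and outgoing edges together
toy_degree <- degree(toy_graph)

# Betweenness counts the shortest paths that pass through a node;
# the only such path here is userC -> userA -> video
toy_betweenness <- betweenness(toy_graph)

print(toy_degree)
print(toy_betweenness)
```

In this toy graph userA and the video each have degree 2, while userA is the only node with non-zero betweenness because it sits between userC and the video.
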
## Use as_long_data_frame() to see the metadata for both the vertices and the edges of the graph

actor_graph_df <- as_long_data_frame(actor_graph)

## Note that the graph object does not have a color attribute yet, so vertex colors are passed to plot() directly

set.seed(123)

plot(actor_graph,
  edge.color = "grey",
  vertex.color = ifelse(V(actor_graph)$node_type == "video", "#20928CFF", "#FDE725FF"),
  vertex.label = "",
  vertex.size = 4,
  edge.arrow.size = 0.5,
  layout = layout_with_fr)

## Create a sub-network containing only the replies to top-level comments,
## then remove nodes that are left with no connections

rep_comm <- delete_edges(actor_graph, which(E(actor_graph)$edge_type != "reply-comment"))

rep_comm <- delete_vertices(rep_comm, which(degree(rep_comm) == 0))

V(rep_comm)$color <- "grey"

## Identify replies that contain particular terms and color the users who wrote them red

ind <- tail_of(rep_comm, grep("earth|technology|climate change|fossil fuels", tolower(E(rep_comm)$vosonTxt_comment)))

V(rep_comm)$color[ind] <- "red"

set.seed(234)

plot(rep_comm,
  edge.color = "grey",
  vertex.label = "",
  vertex.size = 4,
  edge.arrow.size = 0.5,
  layout = layout_with_fr)
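
The filtering and highlighting steps can be traced on a toy graph. The sketch below is an illustration under assumptions: the edge list, the comment texts and the attribute name text are hypothetical stand-ins for the vosonSML output, and the search pattern is shortened.

```r
library(igraph)

# Hypothetical actor-style edge list with an edge type and comment text per edge
toy_edges <- data.frame(
  from      = c("u1", "u2", "u3", "u2"),
  to        = c("video", "video", "u1", "u1"),
  edge_type = c("comment", "comment", "reply-comment", "reply-comment"),
  text      = c("Nice video", "Great", "Fossil fuels are the problem", "I agree")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Keep only reply edges, then drop vertices left without any edge
replies <- delete_edges(g, which(E(g)$edge_type != "reply-comment"))
replies <- delete_vertices(replies, which(degree(replies) == 0))

# Edges whose text matches the pattern (case-insensitive via tolower)
hits <- grep("earth|fossil fuels", tolower(E(replies)$text))

# tail_of() returns the vertex each matching edge starts from, i.e. the reply's author
authors <- as_ids(tail_of(replies, hits))

V(replies)$color <- "grey"
V(replies)$color[V(replies)$name %in% authors] <- "red"

print(data.frame(name = V(replies)$name, color = V(replies)$color))
```

Only u3 ends up red: the video node disappears because it has no reply edges, and u2's reply does not match the pattern.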

Activity Network

## In the activity network, the nodes are comments or videos, and the edge types are "comment" and "reply-comment"

activity_graph <-
  youtube_data %>% 
  Create("activity") %>%
  AddText(youtube_data) %>%
  Graph(writeToFile = TRUE)
## Creating igraph network graph...
## Adding text to network...Generating youtube activity network...
## -------------------------
## collected youtube comments | 922
## top-level comments         | 500
## reply comments             | 422
## videos                     | 1 
## nodes                      | 923
## edges                      | 922
## -------------------------
## Done.
## Done.
## GRAPHML file written: C:/Users/YKC/Desktop/My Website Final Projects/Network Analysis/2022-07-11_163615-YoutubeActivity.graphml
## Done.

V(activity_graph)$color <- viridis::turbo(1, direction = 1)

V(activity_graph)$color[which(V(activity_graph)$node_type == "video")] <- "red"

ind <- grep("earth|technology|climate change|fossil fuels", tolower(V(activity_graph)$vosonTxt_comment))

V(activity_graph)$color[ind] <- viridis::cividis(1, direction = -1)

set.seed(345)

plot(activity_graph,
  edge.color = "grey",
  vertex.label = "",
  vertex.size = 4,
  edge.arrow.size = 0.5,
  layout = layout_with_fr)
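
The keyword search on node text relies only on base R. The short sketch below uses a hypothetical character vector, toy_text, in place of the vosonTxt_comment attribute, and assumes that a node without comment text (such as the video node) holds NA.

```r
# Hypothetical node texts; the first entry stands in for a node with no comment text
toy_text <- c(NA, "Technology will save us", "nice video", "Climate Change is real")

# tolower() makes the match case-insensitive; NA entries are never matched by grep()
hits <- grep("technology|climate change", tolower(toy_text))

# Start from one base color, then recolor only the matching positions
toy_color <- rep("grey", length(toy_text))
toy_color[hits] <- "gold"

print(hits)
print(toy_color)
```

The positions returned by grep() index the text vector, so they can be used directly to recolor the matching nodes.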