Create Co-occurrence Network
This example shows how to create a co-occurrence network using a bag-of-words model.
Given a corpus of documents, a co-occurrence network is an undirected graph, with nodes corresponding to unique words in a vocabulary and edges corresponding to the frequency of words co-occurring in a document. Use co-occurrence networks to visualize and extract information about the relationships between words in a corpus of documents. For example, you can use a co-occurrence network to discover which words commonly appear with a specified word.
Import Text Data
Extract the text data from the file weekendUpdates.xlsx using readtable. The file weekendUpdates.xlsx contains status updates that include the hashtags "#weekend" and "#vacation". Read the data using the readtable function and extract the text data from the TextData column.
filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
textData = tbl.TextData;
View the first few observations.
textData(1:5)
ans = 5x1 string
"Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
"Haha, BBQ on the beach, engage smug mode! ❤ #vacation"
"getting ready for Saturday night #yum #weekend "
"Say it with me - I NEED A #VACATION!!! ☹"
" Chilling at home for the first time in ages…This is the life! #weekend"
Preprocess Text Data
Tokenize the text, convert it to lowercase, and remove the stop words.
documents = tokenizedDocument(textData);
documents = lower(documents);
documents = removeStopWords(documents);
Create a matrix of word counts using a bag-of-words model.
bag = bagOfWords(documents);
counts = bag.Counts;
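To compute the word co-occurrences, multiply the transpose of the word-count matrix by the matrix itself. Because counts has one row per document and one column per vocabulary word, the result is a square, symmetric matrix in which element (i,j) is the sum, over all documents, of the count of word i multiplied by the count of word j.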
cooccurrence = counts.'*counts;
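As a rough illustration of what this product contains, the sketch below applies the same operation to a small made-up count matrix. The matrix values are hypothetical and are not taken from weekendUpdates.xlsx, and the binarized variant at the end is an optional assumption, not part of this example.

```matlab
% Hypothetical word-count matrix (illustration only):
% 3 documents (rows) by 4 vocabulary words (columns).
counts = [1 1 0 0
          0 1 1 0
          1 1 0 1];

% Same operation as in the example: a vocabulary-by-vocabulary matrix.
cooccurrence = counts.'*counts;
disp(cooccurrence)

% Each diagonal entry pairs a word with itself (sum of squared counts).
selfCounts = diag(cooccurrence);

% Optional variant (an assumption, not used in this example): binarize the
% counts first so that entry (i,j) is the number of documents that contain
% both word i and word j, regardless of how often each word repeats.
present = double(counts > 0);
docCooccurrence = present.'*present;
```

When a word can appear several times in one document, the two matrices differ. In this toy case every count is 0 or 1, so they coincide.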
Convert the co-occurrence matrix to a network using the graph function.
G = graph(cooccurrence,bag.Vocabulary,'omitselfloops');
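To see how the graph function interprets such a matrix, the following minimal sketch builds a graph from a small hypothetical co-occurrence matrix. The matrix and the node names are invented for illustration. Off-diagonal entries become weighted edges, and the 'omitselfloops' option ignores the diagonal, which would otherwise connect each word to itself.

```matlab
% Hypothetical symmetric co-occurrence matrix for four words (illustration only).
A = [2 2 0 1
     2 3 1 1
     0 1 1 0
     1 1 0 1];
names = ["beach" "#vacation" "flight" "sun"];

% Nonzero off-diagonal entries become edges; diagonal entries are skipped.
G = graph(A,names,'omitselfloops');
disp(G.Edges)

% Line widths scaled the same way as in the plotting step of this example:
% the heaviest edge gets width 5 and the others are scaled proportionally.
LWidths = 5*G.Edges.Weight/max(G.Edges.Weight);
```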
Visualize the network using the plot function. Set the line thickness to a multiple of the edge weight.
LWidths = 5*G.Edges.Weight/max(G.Edges.Weight);
plot(G,'LineWidth',LWidths)
title("Co-occurrence Network")
Find the neighbors of the word "great" using the neighbors function.
word = "great"
word = "great"
idx = find(bag.Vocabulary == word);
nbrs = neighbors(G,idx);
bag.Vocabulary(nbrs)'
ans = 18x1 string
"next"
"#vacation"
""
"#weekend"
"☹"
"excited"
"flight"
"delayed"
"stuck"
"airport"
"way"
"spend"
""
"lovely"
"friends"
"-"
"mini"
"everybody"
Visualize the co-occurrences of the word "great" by extracting a subgraph of this word and its neighbors.
H = subgraph(G,[idx; nbrs]);
LWidths = 5*H.Edges.Weight/max(H.Edges.Weight);
plot(H,'LineWidth',LWidths)
title("Co-occurrence Network - Word: """ + word + """");
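The neighbors function returns the neighboring words in node order, not in order of co-occurrence strength. One possible way to rank them by edge weight is sketched below on a small hypothetical graph. The matrix, the node names, and the variable names eidx, weights, and ranked are assumptions made for this illustration.

```matlab
% Hypothetical co-occurrence graph for four words (illustration only).
A = [2 2 0 1
     2 3 1 1
     0 1 1 0
     1 1 0 1];
names = ["beach" "#vacation" "flight" "sun"];
G = graph(A,names,'omitselfloops');

% Locate the word of interest and its neighbors.
word = "#vacation";
idx = findnode(G,word);
nbrs = neighbors(G,idx);

% Look up the edge between the word and each neighbor, then read its weight.
eidx = findedge(G,repmat(idx,size(nbrs)),nbrs);
weights = G.Edges.Weight(eidx);

% Sort the neighbors from strongest to weakest co-occurrence.
[sortedWeights,order] = sort(weights,'descend');
ranked = table(string(G.Nodes.Name(nbrs(order))),sortedWeights, ...
    'VariableNames',{'Word','Weight'});
disp(ranked)
```

The same pattern should carry over to the graph built from bag.Vocabulary, with word set to a term that exists in the vocabulary.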
For more information about graphs and network analysis, see Graph and Network Algorithms.
See Also
tokenizedDocument | bagOfWords | removeStopWords | graph