Tweet analysis
Twitter is a widely used social networking and micro-blogging service. Many celebrities, politicians, and leaders use it to make official statements to the public that often do not appear on any other platform. It had over 330 million users as of March 2021. Twitter data is therefore useful for understanding trends and people's opinions.
Tweet analysis can be performed in MATLAB to obtain trends about the tweets, such as popularity, the number of likes, retweet counts, word clouds, and sentiment. The prerequisites are that you must have:
1) A Twitter developer account (more info at https://developer.twitter.com/en/apply-for-access) and
2) Text Analytics Toolbox installed in your current MATLAB version. The twitter function used below ships with Datafeed Toolbox, so that toolbox is needed as well.
For our program, we first take the example of a well-known personality and avid Twitter user with over 48 million followers, Elon Musk, and perform some fundamental analysis of his tweets.
Retrieving Twitter data and extracting required information
We first need to create a twitter object using the twitter function, passing our Twitter developer credentials as parameters. The call looks something like this:
connection = twitter(consumerKey,consumerKeySecret,accessToken,accessTokenSecret);
The returned object has a field "StatusCode", which should have a value of 200 to indicate that the connection was successful and authorized.
Now, to search for tweets from Elon Musk, we need his Twitter handle, which is shown on his profile page (elonmusk).
We build a string in the format "from:handle", that is, the handle without the leading '@', and pass it as the search query. The search command in MATLAB looks like this:
response = search(connection,'from:elonmusk','count',100,'lang','en');
For more information on search queries, you can refer to the documentation: https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/search-operators.
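As a small illustration of the query format described above, the search string can be built from a handle programmatically. This is only a sketch: the variable names handle and query are made up for this example.

```matlab
% Hypothetical handle as it appears on a profile page
handle = "@elonmusk";

% Strip the leading '@' and prepend the from: operator
query = "from:" + erase(handle, "@");

disp(query)   % prints from:elonmusk
```

The resulting query can then be passed to search in place of the literal 'from:elonmusk' (wrapped in char(query) if a character vector is preferred).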
All of our required data is stored in response.Body.Data.statuses.
Each element of "statuses" holds data such as the date and time of the tweet, the number of likes, the number of retweets, the mentions and hashtags, etc. We can apply cell functions to the fields 'text', 'retweet_count', 'favorite_count', and 'created_at' to retrieve the tweets, number of retweets, number of likes, and date of each tweet. Hashtags, mentions, and URLs, in contrast, need for loops, as explained after the code:
% Retrieve the tweet text, retweet count, like count, and day of the month
statuses = response.Body.Data.statuses;
tweets = cellfun(@(x) string(x.text), statuses);
retweet_count = cellfun(@(x) x.retweet_count, statuses);
likes_count = cellfun(@(x) x.favorite_count, statuses);
% created_at looks like 'Wed Oct 10 20:19:24 +0000 2018'; characters 9:10 hold the day
tweet_date = cellfun(@(x) str2double(x.created_at(9:10)), statuses);
% Hashtag entries vary in size per tweet, so keep them in a cell array
hashs = cellfun(@(x) x.entities.hashtags, statuses, 'UniformOutput', false);
The mentions, URLs, and hashtags are stored in a separate field called entities, which holds a 1x1 structure for each element that is present. We iterate through entities with for loops, check for structures, and retrieve the data from them. After that, we can plot the retrieved information as graphs to get a better representation of the data.
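The loop over entities described above might look roughly like the sketch below. It runs on mock data rather than a live response: the statuses cell array here is a hand-built stand-in for response.Body.Data.statuses, and the assumption that a tweet without hashtags yields a non-structure (empty) value follows the "check for structures" description, not a verified API guarantee.

```matlab
% Mock data shaped like response.Body.Data.statuses (an assumption made
% for illustration; a real response carries many more fields)
s1.entities.hashtags = struct('text', {'Mars', 'Starship'});  % two hashtags
s2.entities.hashtags = [];                                    % no hashtags
s3.entities.hashtags = struct('text', 'Tesla');               % one hashtag
statuses = {s1; s2; s3};

% Collect every hashtag into one string column vector
hashtags = strings(0, 1);
for i = 1:numel(statuses)
    tags = statuses{i}.entities.hashtags;
    if isstruct(tags)                 % skip tweets without hashtags
        for j = 1:numel(tags)
            hashtags(end+1, 1) = string(tags(j).text);
        end
    end
end

disp(hashtags)
```

The same pattern should carry over to mentions and URLs by swapping the field names, assuming those fields follow the same layout.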
Using functions from the Text Analytics Toolbox
Text Analytics Toolbox™ provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. We can use it to build intuitive, easy-to-understand representations of text data, such as word clouds, LDA topics, and sentiment scores.
Word clouds of tweets
A word cloud is an image composed of words used in a particular text document. The importance or frequency of a word is represented by its size in the image. We can create word clouds in MATLAB using the "wordcloud" function. Here we apply it to a tokenized document, which represents the text as a collection of words (also known as tokens) used for text analysis. The text also needs to be preprocessed and cleaned. We can use functions such as "removeStopWords", "erasePunctuation", and "removeShortWords" to clean the document: they remove stop words like 'to' and 'and', remove punctuation, and remove words of two or fewer characters, respectively.
The following code builds the word cloud of Elon Musk's tweets.
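% Tokenize the tweets, then clean the tokens
tweetlist = tokenizedDocument(tweets);
tweetlist = removeStopWords(tweetlist);
tweetlist = erasePunctuation(tweetlist);
tweetlist = removeShortWords(tweetlist,2);
figure
wordcloud(tweetlist);
title("Wordcloud of all tweets")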
The same can be done for the hashtags and mentions by applying the same functions to the lists of hashtags and mentions. Word clouds are thus an easy and quick way to get an idea of what someone is tweeting about. In our case, we see that Elon Musk is tweeting about Tesla, WholeMarsBlog, CyberpunkGame, etc.
An n-gram is a contiguous sequence of n items (words) from a sample text document. When n=2, it is a sequence of 2 consecutive words and is called a bigram. We can use the "bagOfNgrams" function to make a bigram list. When n=3, it is called a trigram. Trigrams can be made in MATLAB using the same "bagOfNgrams" function with the additional parameter 'NgramLengths' set to 3. We can search for tweets from NASA using the command:
response = search(connection,'from:NASA','count',100,'lang','en')
The bigrams present in NASA's tweets can then be shown in a word cloud.
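A minimal sketch of the bigram and trigram steps is given here. It uses two made-up sentences in place of the NASA tweets so that it runs without a Twitter connection; the variable names are illustrative only.

```matlab
% Made-up sentences standing in for preprocessed tweet text
txt = [ "space station crew returns home"
        "space station crew begins spacewalk" ];
docs = tokenizedDocument(txt);

bigrams  = bagOfNgrams(docs);                      % n = 2 by default
trigrams = bagOfNgrams(docs, 'NgramLengths', 3);   % n = 3

% Table of the most frequent bigrams
topBigrams = topkngrams(bigrams, 3);
disp(topBigrams)

% Word cloud in which each "word" is a bigram
figure
wordcloud(bigrams);
title("Bigrams")
```

With real data, docs would be the cleaned tokenized tweets rather than these two sentences.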
Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model. It is used to assign the text in a document to particular topics. We can form topics and their word clouds in MATLAB using the "fitlda" function. The function was used here to detect 2 topics, each shown as a word cloud.
We can infer two topics from the word clouds: 'Kate Rubins' and 'Women Nasa'. These are some of the topics present in the tokenized document identified using the fitlda() function.
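The topic-modeling step could be sketched as follows. The six short documents are invented for illustration, the choice of two topics mirrors the text above, and the fixed random seed is an added assumption to make runs repeatable; the topics found on such tiny data are not meaningful in themselves.

```matlab
rng(0)   % LDA fitting is randomized; fix the seed for repeatable runs

% Invented mini-corpus standing in for cleaned tweets
txt = [ "astronaut launches to space station"
        "astronaut returns from space station"
        "crew docks with space station"
        "women lead rover mission team"
        "women engineers build rover"
        "rover mission team lands on mars" ];
docs = tokenizedDocument(txt);
bag  = bagOfWords(docs);

% Fit an LDA model with two topics, suppressing progress output
numTopics = 2;
mdl = fitlda(bag, numTopics, 'Verbose', 0);

% One word cloud per topic
figure
for k = 1:numTopics
    subplot(1, numTopics, k)
    wordcloud(mdl, k);
    title("Topic " + k)
end
```

On real tweets, it is usually worth trying several values of numTopics and inspecting the word clouds, since the right number of topics is not known in advance.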
Twitter sentiment analysis
Twitter sentiment analysis is the study of how positive or negative an individual tweet, or a collection of tweets on a topic or from a user, is overall. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment. The average sentiment score of several tweets is a useful indicator of the sentiment towards a particular topic. We can compare the sentiment of different topics by plotting their average scores.
We see that Bill Gates mostly makes positive tweets. In contrast, the tweets with the hashtag #Corruption have an overall negative sentiment. The tweets with the text 'Olympics' seem to have a positive sentiment as well. The tweets with the hashtag '#Bitcoin' have a high overall positive sentiment, with an average score of more than 0.7. This is consistent with the fact that the bitcoin price has gone up in recent years, and particularly in recent months, and has drawn a lot of hype and popularity on social media platforms like Twitter.
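The text does not say which function produced the scores. One option in Text Analytics Toolbox is vaderSentimentScores, which returns a compound score between -1 and 1 per document; the sketch below assumes that function and uses made-up sentences instead of real tweets.

```matlab
% Made-up example texts standing in for tweets on one topic
txt = [ "I love this launch, what a great day!"
        "This is a terrible and disappointing result."
        "The meeting is at noon." ];
docs = tokenizedDocument(txt);

% Compound sentiment score per document, in the range [-1, 1]
scores = vaderSentimentScores(docs);

% Average score as a rough indicator for the whole collection
avgScore = mean(scores);

disp(table(txt, scores, 'VariableNames', {'Text', 'Score'}))
fprintf('Average sentiment: %.2f\n', avgScore);
```

Repeating this for each search query (a user, a hashtag, a keyword) and collecting the averages would give the values needed for a comparison plot such as a bar chart.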
We see that using MATLAB functions to form word clouds, fit LDA topics, and perform sentiment analysis on tweets allows us to quickly analyze and summarize a vast amount of Twitter data. A similar analysis can be performed on other text data as well. In a world where text-based communication is the standard, the volume of textual data produced keeps growing rapidly. These analytics methods help make sense of this massive archive of textual information, as we have just demonstrated. Hence the scope of Twitter and other text-based analyses will be on the rise for years to come.