What does Twitter say about Padmavati?
What should have been the release of just another Bollywood movie is now a full-fledged political and social controversy in India. In fact, it is no longer restricted to India. An article in The Guardian reports that a London-based Hindu charity, Rajput Samaj of UK, has opposed the certification of the film for release in the UK and has said that it will hold a peaceful protest against the release of the movie.
While Indian newsrooms have turned into verbal war zones with endless debates, I decided to find out what people are really saying. And what better place to go than Twitter, where trolls, celebrities and common folk all have 140 characters (now 280) to speak their minds?
Using a custom Python script and the Twitter API, I retrieved tweets in JSON format that contained the hashtag Padmavati. I collected 67,638 tweets posted between November 6 and November 26. I know I may have missed some tweets that discussed the Padmavati issue without using the hashtag, but I am fairly sure that the ones I did collect were actually about Padmavati (or at least most of them were).
With the corpus of tweets obtained, I proceeded with some basic data exploration. Since one of the fields in the JSON tweet contained information about the language of the tweet, I plotted a pie chart to visualize the distribution of tweets in various languages.
As expected, the majority of the tweets were in English, followed by Hindi. About 71% of the tweets were written in English and about 19% were in Hindi. The rest were split in small percentages between undefined and other languages.
Since I originally wanted to understand the general sentiment of the people, I did not do much more exploratory analysis and moved on to sentiment analysis of the tweet text. I used the TextBlob Python package, which essentially uses the nltk package under the hood, for my analysis.
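For anyone who wants to reproduce the language breakdown, here is a minimal sketch of how the percentages could be computed. It assumes the tweets are stored one JSON object per line and that each object carries the `lang` field mentioned above; the function name `language_shares` and the sample data are made up for illustration and are not from my original script.

```python
import json
from collections import Counter


def language_shares(json_lines):
    """Return {language_code: percentage} for an iterable of JSON-encoded tweets."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Tweets without a detected language are counted as "und" (undefined).
        counts[tweet.get("lang", "und")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Tiny made-up sample, only to show the shape of the input and output.
sample = [
    '{"text": "first tweet", "lang": "en"}',
    '{"text": "second tweet", "lang": "en"}',
    '{"text": "third tweet", "lang": "hi"}',
    '{"text": "no language field"}',
]
shares = language_shares(sample)
```

The resulting dictionary can be passed straight to a plotting library to draw the pie chart.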
One of the most important steps before analysing tweets is pre-processing the text. The text of a tweet usually contains more than just text: it includes combinations of hashtags, @-mentions, URLs and other special characters, as well as short forms of commonly used words. Some pre-processing is therefore necessary before the text can be used for analysis. I used regular expressions (regex) to find and remove URLs, hashtags, HTML tags, numbers and @-mentions from all the tweets.
After the pre-processing, I applied TextBlob's sentiment polarity score to the text of each tweet. The score ranges from -1.00 for text with a highly negative sentiment to 1.00 for text with a highly positive sentiment. I retrieved the values for all the tweets and stored them separately. To make the values easier to interpret, I labelled each value greater than 0.00 as positive, a value of exactly 0.00 as neutral and each value less than 0.00 as negative. This gave me 19,903 positive tweets, 19,194 neutral tweets and 9,395 negative tweets. Here's an example of a tweet that was classified as negative.
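The cleaning step can be sketched roughly as follows. These are not the exact patterns from my script; the regular expressions and the function name `clean_tweet` are illustrative assumptions, and real tweets will throw up edge cases that these simple patterns miss.

```python
import html
import re

# Illustrative patterns; each one targets a kind of noise listed above.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\d+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip URLs, HTML tags, @-mentions, hashtags and numbers from a tweet."""
    # Decode HTML entities first, e.g. "&amp;" becomes "&".
    text = html.unescape(text)
    for pattern in (URL_RE, HTML_TAG_RE, MENTION_RE, HASHTAG_RE, NUMBER_RE):
        text = pattern.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "I support #Padmavati ! @user see https://t.co/abc123 <b>now</b> 2017 &amp; beyond"
cleaned = clean_tweet(raw)  # "I support ! see now & beyond"
```

Note that this removes the whole hashtag, word included, which is why cleaned tweets can read as if a word is missing.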
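In code, the scoring and labelling could look something like the sketch below. `TextBlob(text).sentiment.polarity` is the TextBlob attribute that holds the polarity score; the helper names `label_polarity` and `tweet_sentiment` are my own for this illustration, and the thresholds simply mirror the labelling rule described above.

```python
def label_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text):
    """Return (polarity, label) for a cleaned tweet using TextBlob."""
    # Imported here so label_polarity can be used without TextBlob installed.
    from textblob import TextBlob

    polarity = TextBlob(text).sentiment.polarity
    return polarity, label_polarity(polarity)
```

Calling `tweet_sentiment` on each cleaned tweet and counting the labels gives the three totals. Keep in mind that any tweet without words the scorer recognises gets a polarity of exactly 0.00, so the neutral bucket also absorbs text it simply cannot score.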
Seems now everywhere,every everything revolves around tis crap film, time wind up all tweets,ensuring v r not PROs working freely promote tis numb flick disgracing Bollywood
And here’s a tweet that was classified as positive.
I support ! She is a Strog, Fearless, Amazing woman and I have her back! Violence is never the answer no matter what! Love & understanding is!
Great! So we now have an idea of how people feel. But what are they really saying? I tried to find this out using my trusted LDA (Latent Dirichlet Allocation) algorithm. When it comes to understanding the distribution of topics, I like to use either LDA or the K-Means clustering algorithm. I tried to find the 5 topics that were most discussed in the set of tweets. Here are the results.
The topics seem to revolve around Karni Sena, Deepika Padukone, the Hindu history of India, the freedom to make a movie and the release of the film.
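If you want to try the topic modelling yourself, here is one possible sketch using scikit-learn. This is an assumption on my part for the sake of a compact example: it is not necessarily the library or the settings behind the results above, and the names `top_terms` and `fit_topics` are hypothetical helpers.

```python
def top_terms(weights, vocabulary, n=8):
    """Return the n vocabulary terms with the largest weights for one topic."""
    ranked = sorted(zip(weights, vocabulary), key=lambda pair: pair[0], reverse=True)
    return [term for _, term in ranked[:n]]


def fit_topics(texts, n_topics=5, n_terms=8):
    """Fit LDA on cleaned tweet texts and return the top terms of each topic."""
    # Imported here so top_terms can be used without scikit-learn installed.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Bag-of-words counts with common English stop words removed.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(texts)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)

    vocabulary = vectorizer.get_feature_names_out()
    # Each row of components_ holds one topic's weight for every term.
    return [top_terms(row, vocabulary, n_terms) for row in lda.components_]
```

`fit_topics(cleaned_tweets)` returns five lists of words, one per topic. Giving each list a name such as "Karni Sena" or "release of the film" is still a manual, subjective step.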
Cool! Finally, we have some 'news' on what people think about this topic. I hope you enjoyed reading this as much as I enjoyed writing it. I tried to keep it as simple as possible while still giving some technical details. Until next time!
References:
Padmavati image credit : http://www.hindustantimes.com/rf/image_size_960x540/HT/p2/2017/09/21/Pictures/bhansali-first-poster-padmavati-deepika-padukone-sanjay_08aca446-9e79-11e7-ba2d-20fa1b34073f.jpg