Use Sentiment Analytics to Predict Motion Picture Box-Office

The motion picture industry has long searched for a magical formula for predicting box-office performance. A glimpse of future box-office performance can help a studio save millions in post-production marketing and distribution expenses. The same logic applies to all nonprofit presenting and performing arts organizations: if NPOs can predict the financial outcomes of their future events, they can save precious resources and reallocate them to other uses. Social media may just provide this magical prediction formula.

The motion picture industry has developed two paths to predicting future box office through social media platforms: sentiment analysis and employing a collective intelligence group. This article focuses only on the former, the prediction of future box-office outcomes with web metrics and social media sentiment analysis. An example of the latter can be found in my previous AMT post “Hollywood Stock Exchange – A League of Its Own”.

The concept of using sentiment analysis builds on the argument that peer influence is a strong motivator when it comes to movie choices. People tend to follow their friends’ advice or opinions on which movies to watch. The approach suggests that by identifying the keywords and sentiments in social media comments about a particular film, studios can build a regression model and measure the correlation between positive commentary and the film’s opening weekend box-office performance. Such analyses require professional social media analytics software or outsourced online analytics services, for example Social Radar.
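To make the regression idea concrete, here is a minimal sketch of an ordinary least-squares fit and a correlation coefficient in plain Python. The comment counts and box-office figures are invented for illustration only; they are not real film data, and a commercial analytics service would likely use a richer model.

```python
from math import sqrt

# Hypothetical history (invented numbers, not real films):
# positive comment counts and opening weekend gross in dollars.
comments = [50_000, 100_000, 150_000, 200_000]
gross = [15_000_000, 27_000_000, 45_000_000, 55_000_000]


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(comments, gross)
print(f"dollars per positive comment: {slope:.2f}")
print(f"intercept: {intercept:.2f}")
print(f"correlation: {r:.3f}")
```

A correlation close to 1 on a studio's own historical films would suggest that positive comment volume is a usable predictor; a weak correlation would suggest it is not.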

The software first searches selected social media platforms and collects a large volume of online conversation data related to specific movies. It then analyzes the content and grammatical structure of the collected conversations using linguistic analysis technologies to gain a more accurate understanding of each comment. For example: “I saw American Sniper last night and it wasn’t bad! I would highly recommend it!”

The software begins the analysis by breaking the content down into its component parts. The subject, “American Sniper,” is extracted and identified. Once the subject is identified, the software begins the sentiment analysis. Many sentiment analysis tools and services are available, and they vary in quality. A basic tool will recognize “bad” and interpret the comment as negative. A better tool will recognize both “bad” and “recommend” and interpret the comment as neutral. The best tools will recognize “wasn’t bad” and “highly recommend” and interpret the comment as positive.
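The three quality tiers can be sketched in a few lines of Python. This is a toy illustration, not how any particular product works: the word lists, the negation and intensifier rules, and the function names are all my own assumptions, kept tiny on purpose.

```python
import re

# Toy lexicons, assumed for illustration only.
NEGATIVE = {"bad"}
POSITIVE = {"recommend"}
NEGATORS = {"not", "wasn't", "isn't", "never"}
INTENSIFIERS = {"highly", "really", "very"}


def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and split into words."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def basic(text):
    """Tier 1: stop at the first sentiment keyword found."""
    for token in tokenize(text):
        if token in NEGATIVE:
            return "negative"
        if token in POSITIVE:
            return "positive"
    return "neutral"


def better(text):
    """Tier 2: count positive and negative keywords and compare."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return label(positives - negatives)


def best(text):
    """Tier 3: also look at the preceding word for negation or emphasis."""
    tokens = tokenize(text)
    score = 0
    for i, token in enumerate(tokens):
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATORS:
            polarity = -polarity  # "wasn't bad" flips to positive
        elif previous in INTENSIFIERS:
            polarity *= 2  # "highly recommend" counts double
        score += polarity
    return label(score)


comment = ("I saw American Sniper last night and it wasn’t bad! "
           "I would highly recommend it!")
print(basic(comment), better(comment), best(comment))
# prints: negative neutral positive
```

Real linguistic analysis parses grammatical structure rather than peeking one word back, but the example shows why the same comment can be scored three different ways.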

The software collects and analyzes both positive and negative comments. It calculates a film’s total positive commentary by subtracting the number of negative comments from the number of positive ones. Studios can then establish a fixed ratio between opening weekend box office and total positive commentary for previously released films.
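As a rough sketch of that arithmetic, the snippet below computes net positive comments, averages the dollars-per-comment ratio over past films, and applies it to an upcoming release. All figures are hypothetical, and averaging the per-film ratios is my assumption about how the fixed ratio would be derived.

```python
def net_positive(positive, negative):
    """Total positive commentary: positive comments minus negative ones."""
    return positive - negative


def dollars_per_comment(films):
    """Average opening weekend dollars per net positive comment.

    films: list of (opening_weekend_gross, positive, negative) tuples.
    """
    ratios = []
    for gross, positive, negative in films:
        net = net_positive(positive, negative)
        if net <= 0:
            continue  # a ratio is meaningless without net positive buzz
        ratios.append(gross / net)
    return sum(ratios) / len(ratios)


def predict_opening(ratio, positive, negative):
    """Apply the historical ratio to a new film's comment counts."""
    return ratio * net_positive(positive, negative)


# Hypothetical history (invented numbers, not real films).
HISTORY = [
    (30_000_000, 120_000, 20_000),  # $300 per net positive comment
    (50_000_000, 230_000, 30_000),  # $250 per net positive comment
]

ratio = dollars_per_comment(HISTORY)
forecast = predict_opening(ratio, positive=150_000, negative=50_000)
print(f"${ratio:.2f} per comment, forecast ${forecast:,.0f}")
# prints: $275.00 per comment, forecast $27,500,000
```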

To test this hypothesis, online analytics firm Infegy collected and analyzed social media comments on 10 films released in 2012 and 2013, as follows:

The chart ranks the films by opening weekend box-office revenue and matches each with the number of positive comments it received. It also shows the opening weekend box office generated per positive comment for each movie. The data is somewhat consistent: on average, each positive comment corresponds to $283.13 in opening weekend box office. For 67% of the selected samples, each comment corresponds to between $236.23 and $330.01 in opening weekend box office. As data accumulates, studios can use the historical record to better tailor their desired metrics.
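The reported figures can be turned into a simple forecast range. The sketch below uses only the three dollar values quoted above; the 100,000-comment input is a made-up example. Note that the range is nearly symmetric around the average (about $46.89 on either side), which looks like a one-standard-deviation band, although the source does not say how the 67% figure was derived.

```python
# Figures quoted from the Infegy sample discussed above.
MEAN_PER_COMMENT = 283.13
LOW_PER_COMMENT = 236.23
HIGH_PER_COMMENT = 330.01

# Half the width of the reported range, in dollars per comment.
half_width = round((HIGH_PER_COMMENT - LOW_PER_COMMENT) / 2, 2)


def forecast_range(net_positive_comments):
    """Return (low, expected, high) opening weekend estimates in dollars."""
    rates = (LOW_PER_COMMENT, MEAN_PER_COMMENT, HIGH_PER_COMMENT)
    return tuple(round(rate * net_positive_comments, 2) for rate in rates)


# Hypothetical upcoming film with 100,000 net positive comments.
low, expected, high = forecast_range(100_000)
print(f"low ${low:,.0f}, expected ${expected:,.0f}, high ${high:,.0f}")
# roughly $23.6 million, $28.3 million, and $33.0 million
```

With only 10 films in the sample, a range like this is a starting point for budgeting conversations rather than a reliable forecast.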

Compared with collective intelligence, sentiment analysis seems primitive and has limited accuracy. Nevertheless, both data analytics technology and the number of social media users are growing rapidly. Organizations with the resources would be wise to consider investing in sentiment analytics, even if only to build up sufficient historical data for future use.