Dynamic Box Office Forecasting Based on Microblog Data

Runyu Chen, Wei Xu, Xinghan Zhang

Movies, the product of one of the most rapidly developing industries, have gained much attention in recent years. Especially in China, the world's second largest and rapidly growing film market, many film companies want to forecast box office in advance so that they can better plan income and expenditure. Unlike traditional forecasting models based on a few movie-related features, this paper makes comprehensive use of a real-time social medium, the microblog, to build a more accurate weekly box office forecasting model. The features extracted weekly from microblogs are divided into count-based features and content-based features and are combined with the existing box office and the screen arrangements to predict the box office of the next week. For count-based features, in addition to the total volume of related microblogs and a diffusion effect that accounts for the number of followers, several previously overlooked features, such as the proportion of authenticated users, the gender ratio and the mobile-user ratio, are introduced into the original prediction model. For content-based features, a duplicate semantic analysis method is proposed. The analysis system outputs the number of tweets that can actually influence others' purchase decisions, along with the numbers of those tweets with positive and with negative influence. On this basis, the guided effect of each influential tweet is identified by the number of times it is praised, commented on and retweeted. After feature selection with a genetic algorithm (GA), several machine learning models are applied. The empirical study shows that our approach can forecast box office dynamically with consistently good performance.
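As a rough illustration of the weekly count-based features listed above, the following minimal sketch aggregates one week of movie-related microblogs into a feature dictionary. It is not the paper's implementation: the `Microblog` record, all of its field names, the function `weekly_count_features`, the choice of total follower count as the diffusion measure, and the unweighted sum of praises, comments and retweets as the guided effect are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Microblog:
    """One movie-related post. Every field name is a hypothetical stand-in."""
    followers: int             # follower count of the author
    verified: bool             # author is an authenticated user
    female: bool               # author's gender flag (assumed encoding)
    mobile: bool               # posted from a mobile client
    praises: int               # number of praises (likes)
    comments: int              # number of comments
    retweets: int              # number of retweets
    influential: bool = False  # assumed output of the content analysis step


def weekly_count_features(posts: List[Microblog]) -> Dict[str, float]:
    """Aggregate one week of posts into count-based features (illustrative)."""
    keys = ("volume", "diffusion", "verified_ratio",
            "female_ratio", "mobile_ratio", "guided_effect")
    n = len(posts)
    if n == 0:
        # No posts this week: every feature defaults to zero.
        return {k: 0.0 for k in keys}

    # Assumption: the guided effect is the plain sum of praises, comments
    # and retweets over the posts flagged as influential.
    guided = sum(p.praises + p.comments + p.retweets
                 for p in posts if p.influential)

    return {
        "volume": float(n),
        # Assumption: diffusion is approximated by the total follower count.
        "diffusion": float(sum(p.followers for p in posts)),
        "verified_ratio": sum(p.verified for p in posts) / n,
        "female_ratio": sum(p.female for p in posts) / n,
        "mobile_ratio": sum(p.mobile for p in posts) / n,
        "guided_effect": float(guided),
    }
```

In a setup like this, one such dictionary per movie and week, joined with that week's box office and screen arrangements, would form a candidate feature vector before GA-based feature selection.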