Previous studies have employed machine learning tools to classify films by success, with the aim of reducing the uncertainty of film production. We revisited this literature to address three relevant issues in classifying films by economic success. First, we explored how results differ between the shortest and longest time samples, to study possible changes in consumption patterns driven mainly by technological change, and between all films and wide-released films. Second, we used inflation-adjusted profits as the measure of economic success instead of the usual nominal box office revenues. Third, we employed a smaller set of features, restricted to those available at the time of production, to help producers handle contingencies, since little or nothing can be done once a film is in theaters. Following the literature, we chose three classifiers (Random Forest, Support Vector Machine, and Neural Network) and designed sub-datasets to model and compare their performance. Our dataset includes all films with budgets disclosed on the Box Office Mojo website, resulting in 3167 movies released in theaters worldwide between 1980 and 2019. The Random Forest results outperform those of previous similar studies that used different time samples, including results for a less commonly used larger sample, with the best data sample reaching about 97% in both accuracy and F1-score.
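The shift from nominal revenue to inflation-adjusted profit, and the two reported metrics (accuracy and F1-score), can be illustrated with a small sketch. It assumes that profit is worldwide gross minus production budget, that both are deflated from the release year to a base year with a consumer price index, and that a film counts as a success when its real profit exceeds a threshold. The CPI values, the `threshold` parameter, and all function and class names are hypothetical placeholders, not the authors' definitions or data. The metrics are computed for a binary label only; the study's exact labeling scheme and F1 averaging are not stated in this abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Film:
    # Hypothetical record; amounts are nominal, in release-year currency.
    title: str
    year: int
    budget: float
    revenue: float


def to_real(amount: float, year: int, cpi: dict[int, float], base_year: int) -> float:
    """Deflate a nominal amount from `year` prices into `base_year` prices."""
    return amount * cpi[base_year] / cpi[year]


def real_profit(film: Film, cpi: dict[int, float], base_year: int) -> float:
    """Assumption: profit = revenue - budget, both valued at the release year."""
    return to_real(film.revenue - film.budget, film.year, cpi, base_year)


def label_success(film: Film, cpi: dict[int, float], base_year: int,
                  threshold: float = 0.0) -> int:
    """Hypothetical binary label: 1 if real profit exceeds `threshold`, else 0."""
    return int(real_profit(film, cpi, base_year) > threshold)


def accuracy_and_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy and binary F1 (positive class = 1) from scratch."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    cpi = {1980: 50.0, 2019: 200.0}  # made-up index values for illustration
    films = [Film("A", 1980, 10.0, 30.0), Film("B", 2019, 100.0, 90.0)]
    print([label_success(f, cpi, 2019) for f in films])
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

With these placeholder CPI values, film A's nominal profit of 20 becomes 80 in 2019 prices, which shows why comparing nominal figures across a 1980 to 2019 span can misstate relative success. In practice the labels would feed a classifier such as a Random Forest, and the metrics would be computed on held-out predictions.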
Supplementary Information
The online version contains supplementary material available at 10.1007/s11042-023-15169-4.