This work applies Matrix Completion (MC) – a class of machine-learning methods commonly used in recommendation systems – to analyze economic complexity. In this paper MC is applied to reconstruct the Revealed Comparative Advantage (RCA) matrix, whose elements express the relative advantage of countries in given classes of products, as evidenced by yearly trade flows. A high-accuracy binary classifier is derived from the MC application to discriminate between elements of the RCA matrix that are, respectively, higher/lower than one. We introduce a novel Matrix cOmpletion iNdex of Economic complexitY (MONEY) based on MC and related to the degree of predictability of the RCA entries of different countries (the lower the predictability, the higher the complexity). Differently from previously-developed economic complexity indices, MONEY takes into account several singular vectors of the matrix reconstructed by MC. In contrast, other indices are based only on one/two eigenvectors of a suitable symmetric matrix derived from the RCA matrix. Finally, MC is compared with state-of-the-art economic complexity indices, showing that the MC-based classifier achieves better performance than previous methods based on the application of machine learning to economic complexity.
Economic complexity and machine learning have recently become popular approaches for analysing international trade. However, for effective use of machine learning in relation to economic complexity and policymaking, it is important to understand what are the key features for predictions. In this framework, this article addresses the issue of the interpretability of results obtained with a machine learning technique—namely, matrix completion—when applied to economic complexity, specifically in predicting revealed comparative advantages (RCAs) of countries in different product categories. Shapley values are used to measure the role each country plays in predicting the RCAs of other countries. Countries relevant for prediction may differ from countries whose RCA values are similar to those of the country of interest when a standard similarity measure such as cosine similarity is used. We demonstrate the usefulness of our approach to identifying comparable countries by focussing our analysis on export diversification into complex goods of selected European countries.
This paper is focused on the unbalanced fixed effects panel data model. This is a linear regression model able to represent unobserved heterogeneity in the data, by allowing each two distinct observational units to have possibly different numbers of associated observations. We specifically address the case in which the model includes the additional possibility of controlling the conditional variance of the output given the input and the selection probabilities of the different units per unit time. This is achieved by varying the cost associated with the supervision of each training example. Assuming an upper bound on the expected total supervision cost and fixing the expected number of observed units for each instant, we analyze and optimize the trade-off between sample size, precision of supervision (the reciprocal of the conditional variance of the output) and selection probabilities. This is obtained by formulating and solving a suitable optimization problem. The formulation of such a problem is based on a large-sample upper bound on the generalization error associated with the estimates of the parameters of the unbalanced fixed effects panel data model, conditioned on the training input dataset. We prove that, under appropriate assumptions, in some cases “many but bad” examples provide a smaller large-sample upper bound on the conditional generalization error than “few but good” ones, whereas in other cases the opposite occurs. We conclude discussing possible applications of the presented results, and extensions of the proposed optimization framework to other panel data models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.