Educational Institutions data constitute the basis for several important analyses of educational systems; however, for several reasons, they often contain non-negligible shares of missing values. In this work we consider the relevant case of the European Tertiary Education Register (ETER), which describes the Educational Institutions of Europe. The presence of missing values prevents the full exploitation of this database, since several types of analyses that could otherwise be performed are currently impracticable. Imputing artificial data, reconstructed with the aim of being statistically equivalent to the (unknown) missing data, would make it possible to overcome these problems. A main complication in imputing this type of data is the correlation that exists among all the variables. We propose several imputation techniques designed to deal with the different types of missing values appearing in these interconnected data, and we use these techniques to impute the database. Moreover, we evaluate the accuracy of the proposed approach by artificially introducing missing data, imputing them, and comparing imputed and original values. Results show that the information reconstruction does not introduce statistically significant changes in the data and that the imputed values are close enough to the original values.
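The evaluation protocol described above (hide known values, impute them, compare with the originals) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' techniques: it swaps in plain ratio imputation, which fills a missing value of one variable from a correlated auxiliary variable, and the institution figures (students and graduates) are hypothetical. The helper names `ratio_impute` and `masking_evaluation` are likewise illustrative.

```python
import random
from statistics import mean


def ratio_impute(target, auxiliary):
    """Ratio imputation (a simple stand-in, not the paper's method): fill each
    None in `target` with ratio * auxiliary, where ratio = sum(target) /
    sum(auxiliary) over fully observed records. Records whose auxiliary value
    is also missing stay None."""
    pairs = [(t, a) for t, a in zip(target, auxiliary) if t is not None and a is not None]
    ratio = sum(t for t, _ in pairs) / sum(a for _, a in pairs)
    return [
        t if t is not None else (ratio * a if a is not None else None)
        for t, a in zip(target, auxiliary)
    ]


def masking_evaluation(target, auxiliary, share=0.2, seed=0):
    """Hide a share of the known target values, impute them, and return the
    mean absolute relative error between imputed and original values."""
    rng = random.Random(seed)
    candidates = [
        i for i, (t, a) in enumerate(zip(target, auxiliary))
        if t is not None and t > 0 and a is not None
    ]
    hidden = rng.sample(candidates, max(1, int(share * len(candidates))))
    masked = list(target)
    for i in hidden:
        masked[i] = None  # artificially introduce a missing value
    imputed = ratio_impute(masked, auxiliary)
    return mean(abs(imputed[i] - target[i]) / target[i] for i in hidden)


# Hypothetical institutions: enrolled students and graduates (None = missing).
students = [1000, 2500, 400, 8000, 1200, 3000, 600, 5000, 1500, 2200]
graduates = [210, 480, 85, 1590, 250, None, 115, 1010, 300, 440]

filled = ratio_impute(graduates, students)
error = masking_evaluation(graduates, students)
print(f"imputed graduates at index 5: {filled[5]:.1f}")
print(f"mean relative error on masked values: {error:.3f}")
```

With these made-up figures the observed graduate-to-student ratio is 0.2, so the missing entry at index 5 is filled with about 600 graduates. A real evaluation would repeat the masking many times, use imputation methods that exploit the correlations among all variables, and test whether the imputed data differ significantly from the originals.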
Quality of Service (QoS) management in IP networks today relies on static configuration of class-of-service definitions and related forwarding priorities. Packets are currently classified according to the DiffServ architecture, based on RFC 4594, typically through static configuration or filters matching packet features at network access equipment. In this paper, we propose a dynamic classification procedure, referred to as Learning-powered DiffServ (L-DiffServ), able to detect the distinctive characteristics of traffic and to dynamically assign service classes to IP packets. The idea is to apply semi-unsupervised Machine Learning techniques, such as Linear Discriminant Analysis (LDA) and K-Means, customized to address the issues specific to packet-level analysis, i.e., the unbalanced distribution of traffic among classes and the selection of suitable IP-header-related features. The performance evaluation shows that L-DiffServ is able to dynamically change the classification outcome, providing a higher number of classes than DiffServ. This result represents the first step toward a more granular differentiation of IP traffic.
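As a rough sketch of the unsupervised clustering step only, the code below groups packets with K-Means over two hypothetical features (packet length in bytes and inter-arrival time in ms). It is not the L-DiffServ implementation: the LDA stage and the handling of unbalanced classes are omitted, the features and packet values are invented for illustration, and the centroids are seeded with a deterministic farthest-point rule, an assumption chosen so that the result is reproducible. Features are min-max scaled because their units differ.

```python
import math

# Hypothetical packets: (packet length in bytes, inter-arrival time in ms).
packets = [
    (160, 20), (172, 20), (164, 21), (170, 19),   # small, regular (voice-like)
    (1500, 1), (1480, 1), (1500, 2), (1460, 1),   # large, back-to-back (bulk-like)
    (80, 300), (90, 250), (70, 400), (100, 350),  # small, sparse (interactive-like)
]


def normalize(points):
    """Min-max scale each feature to [0, 1] so no single unit dominates distances."""
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    return [
        tuple((x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(p, lo, hi))
        for p in points
    ]


def farthest_point_init(points, k):
    """Deterministic seeding (an illustrative choice): start from the first point,
    then repeatedly add the point farthest from its nearest chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Plain Lloyd's K-Means; returns (cluster label per point, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(normalize(packets), k=3)
print(labels)
```

Each resulting cluster could then be mapped to a service class; how L-DiffServ performs that mapping, and which IP header fields it actually uses, is not shown here.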
Universities and other organizations providing higher education are collectively called Higher Education Institutions. Their detailed data, for instance the number of students and the number of graduates, constitute the basis for several important analyses of educational systems. This work provides data from the European Tertiary Education Register (ETER), which describes the Educational Institutions of Europe. These data have been gathered through the National Statistical Authorities of all the countries participating in the ETER Project. However, they include many scattered missing values. Therefore, we have developed and applied an imputation methodology (see “Imputation Techniques for the Reconstruction of Missing Interconnected Data from Higher Educational Institutions”, Bruni et al. [3]) to replace the missing values with feasible values that are as similar as possible to the original values, which have been lost and are now unknown. Thus, we also provide the imputed version of the same dataset, which allows more in-depth analyses of the European Higher Education Institutions. Both datasets (before and after imputation) are provided in two versions, with or without bibliometric information for the Institutions, so that users can also consider this additional information if interested.