In many important application domains, such as text categorization, biomolecular analysis, scene or video classification, and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research on multi-label classification. In particular, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method that follows the lazy feature selection paradigm and is designed specifically for the multi-label setting. Experimental results show that the proposed technique is competitive with multi-label feature selection techniques currently used in the literature and is clearly more scalable as the amount of data grows.
Risk management of electric power transmission lines requires knowledge from different areas, such as the environment, land, investors, regulations, and engineering. Although databases are widely available for most of these areas, integrating them into a single database or model is challenging. In this paper, we instead propose a novel method that calculates the probabilities of risks in the implementation of transmission lines from unstructured text drawn from a single source. It uses the weekly reports of the Brazilian National Electric Energy Agency (ANEEL), which contain decisions about the electrical grid that cover most of the aforementioned areas. Because the data are unstructured text, we employed NLP techniques such as stemming and tokenization to identify keywords related to the common causes of risk provided by an expert group on energy transmission. We then used models to estimate the probability of each risk. This method differs from previous works, which were based on structured (numerical or categorical) data from single or multiple sources. Our results show that the keywords extracted from the ANEEL reports enabled the proposed method to estimate the probability of 97 of the 233 risks listed by an expert.
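The keyword-matching step described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the English toy keywords, and the crude suffix-stripping stemmer are hypothetical stand-ins, and the relative-frequency estimate is only a simple proxy, since the abstract does not say which models were used to estimate the probabilities.

```python
import re
from collections import Counter

# Illustrative suffix list only; a real pipeline would use a proper
# stemmer for the language of the reports (assumption, not from the paper).
SUFFIXES = ("ing", "ed", "es", "e", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def risk_frequencies(reports, risk_keywords):
    """Return, per risk, the fraction of reports containing any of its keyword stems.

    This relative frequency is a hypothetical proxy for a risk probability.
    """
    if not reports:
        return {risk: 0.0 for risk in risk_keywords}
    keyword_stems = {
        risk: {stem(keyword) for keyword in keywords}
        for risk, keywords in risk_keywords.items()
    }
    hits = Counter()
    for report in reports:
        report_stems = {stem(token) for token in tokenize(report)}
        for risk, wanted in keyword_stems.items():
            if report_stems & wanted:  # at least one keyword stem appears
                hits[risk] += 1
    return {risk: hits[risk] / len(reports) for risk in risk_keywords}
```

Calling `risk_frequencies(reports, risk_keywords)` with a list of report strings and a dictionary that maps each risk name to its expert-provided keywords returns one value between 0 and 1 per risk. Matching on stems rather than raw tokens lets inflected forms such as "licensing" and "license" count as the same keyword.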