The recent decrease in cost and time to sequence and assemble of complete genomes created an increased demand for data storage. As a consequence, several strategies for assembled biological data compression were created. Vertical compression tools implement strategies that take advantage of the high level of similarity between multiple assembled genomic sequences for better compression results. However, current reviews on vertical compression do not compare the execution flow of each tool, which is constituted by phases of preprocessing, transformation, and data encoding. We performed a systematic literature review to identify and compare existing tools for vertical compression of assembled genomic sequences. The review was centered on PubMed and Scopus, in which 45726 distinct papers were considered. Next, 32 papers were selected according to the following criteria: to present a lossless vertical compression tool; to use the information contained in other sequences for the compression; to be able to manipulate genomic sequences in FASTA format; and no need prior knowledge. Although we extracted performance compression results, they were not compared as the tools did not use a standardized evaluation protocol. Thus, we conclude that there's a lack of definition of an evaluation protocol that must be applied by each tool.
Multi-label classification (MLC) is a very explored field in recent years. The most common approaches that deal with MLC problems are classified into two groups: (i) problem transformation which aims to adapt the multi-label data, making the use of traditional binary or multiclass classification algorithms feasible, and (ii) algorithm adaptation which focuses on modifying algorithms used into binary or multiclass classification, enabling them to make multi-label predictions. Several approaches have been proposed aiming to explore the relationships among the labels, with some of them through the transformation of a flat multilabel label space into a hierarchical multi-label label space, creating a tree-structured label taxonomy and inducing a hierarchical multilabel classifier to solve the classification problem. This paper presents a novel method in which a label hierarchy structured as a directed acyclic graph (DAG) is created from the multi-label label space, taking into account the label co-occurrences using the notion of closed frequent labelset. With this, it is possible to solve an MLC task as if it was a hierarchical multi-label classification (HMC) task. Global and local HMC approaches were tested with the obtained label hierarchies and compared with the approaches using tree-structured label hierarchies showing very competitive results. The main advantage of Article Title the proposed approach is better exploration and representation of the relationships between labels through the use of DAG-structured taxonomies, improving the results. Experimental results over 32 multi-label datasets from different domains showed that the proposed approach is better than related approaches in most of the multi-label evaluation measures. Moreover, we found that both tree and in specially DAG-structured label hierarchies combined with a local hierarchical classifier are more suitable to deal with imbalanced multi-label datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.