Many repositories of open data for genomics, collected by worldwide consortia, are important enablers of biological research; moreover, all experimental datasets leading to publications in genomics must be deposited in public repositories and made available to the research community. These datasets are typically used by biologists for validating or enriching their experiments; their content is documented by metadata. However, the emphasis on data sharing is not matched by accuracy in data documentation; metadata are not standardized across sources and are often unstructured and incomplete. In this paper, we propose a conceptual model of genomic metadata, whose purpose is to query the underlying data sources for locating relevant experimental datasets. First, we analyze the most typical metadata attributes of genomic sources and define their semantic properties. Then, we use a top-down method for building a global-as-view integrated schema, by abstracting the most important conceptual properties of genomic sources. Finally, we describe the validation of the conceptual model by mapping it to three well-known data sources: TCGA, ENCODE, and Gene Expression Omnibus.
The recent success of XML as a standard to represent semi-structured data, and the increasing amount of available XML data, pose new challenges to the data mining community. In this paper we present the XMINE operator, a tool we developed to extract XML association rules from XML documents. The operator, which is based on XPath and inspired by the syntax of XQuery, allows us to express complex mining tasks compactly and intuitively. XMINE can be used to specify mining tasks on the content and on the structure of the data, interchangeably and simultaneously, since the distinction between the two in XML is slight. [...] changed among data mining tools (e.g., PMML [8]), but there are no significant extensions of data mining research taking full advantage of the intrinsic properties of XML. However, it is easy to foresee that the spread of XML will cause an increasing interest in this subject, going beyond a mere syntactic adaptation of data mining artifacts and techniques to XML. In this paper, we present the XMINE operator, a tool that can be used to extract association rules from native XML documents, in short "XML association rules", which we first introduced in [6, 5]. The paper is organized as follows. In Section 2 we give an overview of association rules in the context of relational databases. In Section 3 we briefly discuss the notion of association rules for XML, and we refer the reader to [6, 5] for additional details about their theoretical foundations. In Section 4 we present the XMINE operator through a series of intuitive examples. In Section 5 we introduce some basic concepts needed to discuss implementation details. In Section 6 we discuss how XML association rules are extracted from an XML document through XMINE by composing an execution environment for XPath expressions and an algorithm for discovering frequent itemsets.
In Section 7 we give some implementation details discussing the current state of the prototype we developed and an outline of the planned future development. We conclude the paper with a discussion of future research directions.
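As a rough illustration of the composition described for Section 6 (path expressions select the data, then an itemset-counting step derives the rules), the following minimal sketch may help. It is not the XMINE operator or its syntax: the sample document, the function names `transactions` and `rules`, and the thresholds are all hypothetical, and it uses the limited XPath subset of Python's standard `xml.etree.ElementTree` with a brute-force count over item pairs rather than a full frequent-itemset algorithm.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample document: each <order> is treated as one transaction.
DOC = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>milk</item><item>eggs</item></order>
  <order><item>bread</item><item>eggs</item></order>
  <order><item>milk</item></order>
</orders>
"""


def transactions(xml_text, context_path, item_path):
    """Select context nodes with one path and their items with another."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(e.text for e in ctx.findall(item_path))
        for ctx in root.findall(context_path)
    ]


def rules(txns, min_support, min_confidence):
    """Return (body, head, support, confidence) for single-item rules."""
    n = len(txns)
    items = sorted(set().union(*txns))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in txns if itemset <= t) / n

    found = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue  # the pair is not frequent, so no rule is derived
        for body, head in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((body,)))
            if confidence >= min_confidence:
                found.append((body, head, pair_support, confidence))
    return found


if __name__ == "__main__":
    for body, head, sup, conf in rules(
        transactions(DOC, "order", "item"), 0.5, 0.9
    ):
        print(f"{body} => {head} (support={sup:.2f}, confidence={conf:.2f})")
```

Because the context and the items are both chosen by path expressions, the same two functions could in principle mine structural features (for example, element names) instead of text content, which is the flexibility the abstract attributes to XML.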