2022
DOI: 10.1109/tcbb.2020.2998954

META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration

Abstract: The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and …

Cited by 29 publications (33 citation statements)
References: 61 publications
“…We have previously proposed another conceptual model focused on human genomics ( 2 ), which was based on a central entity representing files of genomic regions, similarly described from various dimensions. We next developed and implemented an integrated database ( 3 ), searchable through the GenoSurf ( 4 ) interface ( http://gmql.eu/genosurf/ ). Thanks to such previous knowledge in human genomics, we have been able to rapidly design VCM and then to deploy ViruSurf.…”
Section: Introduction (mentioning)
confidence: 99%
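The passage above describes a conceptual model built around a central entity (a file of genomic regions) that is described along several dimensions. A minimal Python sketch of that shape follows, assuming a simple in-memory representation; the dimension classes, their attribute names, and the `items_with` helper are hypothetical and do not reproduce the actual GenoSurf or META-BASE schema.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical dimension records; attribute names are illustrative only.
@dataclass(frozen=True)
class Biosample:
    tissue: str
    disease: str


@dataclass(frozen=True)
class Technology:
    assay: str


@dataclass(frozen=True)
class Management:
    source: str
    project: str


# Central entity: one file of genomic regions, described along each dimension.
@dataclass(frozen=True)
class Item:
    file_name: str
    biosample: Biosample
    technology: Technology
    management: Management


def items_with(items: List[Item], predicate: Callable[[Item], bool]) -> List[Item]:
    """Return the items whose descriptive dimensions satisfy the predicate."""
    return [item for item in items if predicate(item)]


items = [
    Item("a.bed", Biosample("liver", "healthy"), Technology("ChIP-seq"), Management("ENCODE", "P1")),
    Item("b.bed", Biosample("breast", "cancer"), Technology("RNA-seq"), Management("TCGA", "P2")),
]

chip = items_with(items, lambda i: i.technology.assay == "ChIP-seq")
print([i.file_name for i in chip])  # ['a.bed']
```

Because every file hangs off the same central entity, a condition on any dimension (tissue, assay, source) selects files in the same way.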
“…The GMQL system contains a multiplicity of public genomic datasets from a variety of sources [44], ready to be used within tertiary analysis pipelines (as shown in [29]); among other sources, it features all the datasets available in the OpenGDC FTP service, providing an interface for browsing and processing data curated in OpenGDC. The produced datasets are also made available within another system, GenoSurf (GenoSurf is available at [45]) [46], a semantic search engine based on a Conceptual Model [47] that integrates TCGA data, imported by OpenGDC, with several sources such as ENCODE [48], Roadmap Epigenomics [49], and 1000 Genomes [50], among others, using the META-BASE integration framework [51].…”
Section: Use Case Examples (mentioning)
confidence: 99%
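Integrating sources such as TCGA and ENCODE requires mapping each source's own metadata keys and values onto one shared schema. The sketch below illustrates that general idea only: `SOURCE_MAPPINGS`, the local key names, and the lower-casing rule are hypothetical placeholders, not the sources' real field names and not the META-BASE pipeline itself.

```python
from typing import Dict

# Hypothetical per-source mappings from local metadata keys to global attribute names.
# The local key names are placeholders, not the sources' real field names.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "ENCODE": {"biosample_name": "tissue", "assay_type": "assay"},
    "TCGA": {"tissue_site": "tissue", "strategy": "assay"},
}


def normalize(source: str, record: Dict[str, object]) -> Dict[str, str]:
    """Rename a source record's keys to the global schema and tidy the values.

    Keys without a mapping are dropped; values are stripped and lower-cased as a
    simple stand-in for real vocabulary reconciliation.
    """
    mapping = SOURCE_MAPPINGS[source]
    unified = {"source": source}
    for key, value in record.items():
        if key in mapping:
            unified[mapping[key]] = str(value).strip().lower()
    return unified


encode_row = normalize("ENCODE", {"biosample_name": "Liver", "assay_type": "ChIP-seq"})
tcga_row = normalize("TCGA", {"tissue_site": "Breast ", "strategy": "RNA-Seq", "case_id": 42})
print(encode_row)  # {'source': 'ENCODE', 'tissue': 'liver', 'assay': 'chip-seq'}
print(tcga_row)    # {'source': 'TCGA', 'tissue': 'breast', 'assay': 'rna-seq'}
```

After normalization both records share the keys `source`, `tissue`, and `assay`, so they can be searched together regardless of which repository they came from.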
“…Results produced by queries on the search interface (2) are updated to reflect each additional search condition, and counts of matching sequences are dynamically displayed to help users assess whether query results match their intents. The interface allows users to choose multiple values for each attribute at the same time (these are considered in disjunction); it enables the interplay between the searches performed within parts (2) and (3), thereby allowing users to build complex queries given as the logical conjunction, of arbitrary length, of filters set in parts (2) and ( […] dimension), Technology and Organization (from the respective dimensions). It includes attributes which are present in most of the sources, described by an information tab that is opened by clicking on blue circles; values can be selected using dropdown menus.…”
Section: Web Interface (mentioning)
confidence: 99%
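The query semantics quoted above (several values of one attribute combined in disjunction, filters on different attributes combined in conjunction, with a live count of matches) can be sketched as follows. The attribute names, sample records, and the `matches`/`search` helpers are hypothetical illustrations, not the interface's actual implementation.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def matches(record: Record, filters: Dict[str, Set[str]]) -> bool:
    """AND across attributes; OR among the values selected for one attribute."""
    return all(record.get(attr) in allowed for attr, allowed in filters.items())


def search(records: List[Record], filters: Dict[str, Set[str]]) -> List[Record]:
    """Return the records satisfying every filter (an empty filter set matches all)."""
    return [r for r in records if matches(r, filters)]


# Hypothetical sequence metadata.
sequences = [
    {"host": "human", "country": "Italy", "technology": "Illumina"},
    {"host": "human", "country": "China", "technology": "Nanopore"},
    {"host": "bat", "country": "China", "technology": "Illumina"},
]

filters = {"country": {"Italy", "China"}}  # disjunction within one attribute
print(len(search(sequences, filters)))     # 3
filters["technology"] = {"Illumina"}       # each new condition narrows the result
print(len(search(sequences, filters)))     # 2
```

Recomputing `len(search(...))` after each added condition mirrors the dynamically updated match count described in the quote.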