2018
DOI: 10.1093/bioinformatics/bty688

Processing of big heterogeneous genomic datasets for tertiary analysis of Next Generation Sequencing data

Abstract: The GMQL system is freely available for non-commercial use as open source project at: http://www.bioinformatics.deib.polimi.it/GMQLsystem/.

Citations: cited by 51 publications (60 citation statements)
References: 40 publications
“…We built an exploration mechanism for supporting semantic queries upon our Genomic Knowledge Graph; we demonstrated the effectiveness of our approach through four examples which are representative of the use of our query interface. Our repository is already storing data coming from eight data sources of genomic data, including datasets relevant for epigenomics, gene expression data, mutation data, deployed in conjunction with an advanced genomic data manager [9], available at http://gmql.eu/gmql-rest/.…”
Section: Discussion (mentioning)
confidence: 99%
“…The latter is a cloud-based data manager for region-based data, supporting a new query language for genomics, called GenoMetric Query Language, GMQL [15]. The language derives from classical abstractions of relational databases and is the composition of orthogonal operations, which apply to either one or two datasets.…”
Section: Geco Resources (mentioning)
confidence: 99%
“…The associated GMQL query system [15] has a modular architecture including an intermediate representation supporting operations over regions and metadata, which are executed by the Apache Spark engine, a data framework on the cloud that proved to be extremely efficient in supporting massive genomic queries [16], with a high-level technology-independent repository abstraction, supporting different repository types (e.g., local file system, Hadoop File System, or others), several system interfaces, including an intuitive public Web-based interface, as well as two programmatic interfaces: a pyGMQL library for Python and an RGMQL package for the R/Bioconductor environment.…”
Section: Geco Resources (mentioning)
confidence: 99%
“…We downloaded the 33 ENCODE CTCF Narrow Peak tracks (Table S1) from the UCSC Browser. For each CTCF binding site we then associated its enrichment signal for each of the ChIP-seq tracks (using the map operation of PyGMQL (Masseroli et al., 2019)). Before aggregating the 33 signal values for every CTCF binding site, we assessed the value distribution of every CTCF ChIP-seq experiment and found heterogeneous distributions across cell lines, lineages and laboratories.…”
Section: Assigning Scores to CTCF Binding Sites (mentioning)
confidence: 99%
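
The map step described in the statement above (attaching a per-track signal to each reference binding site) can be illustrated with a minimal sketch in plain Python. This is not the PyGMQL API: the `Region` class, the `overlaps` and `map_signal` helpers, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration, and the choice of the mean as the aggregate is likewise hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic region; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    signal: float = 0.0


def overlaps(a: Region, b: Region) -> bool:
    """True when both regions share a chromosome and at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def map_signal(reference, experiment, aggregate=mean):
    """For each reference region, aggregate the signal of the overlapping
    experiment regions; None marks a reference region with no overlap."""
    result = []
    for ref in reference:
        hits = [r.signal for r in experiment if overlaps(ref, r)]
        result.append(aggregate(hits) if hits else None)
    return result


# Toy data (invented): two binding sites and two signal tracks.
sites = [Region("chr1", 100, 200), Region("chr1", 500, 600)]
track_a = [Region("chr1", 150, 180, 4.0), Region("chr1", 190, 250, 6.0)]
track_b = [Region("chr1", 520, 560, 3.0)]

# One list of per-site signals for each track, in the order of `sites`.
signals = {name: map_signal(sites, track)
           for name, track in [("track_a", track_a), ("track_b", track_b)]}
```

The nested scan is quadratic and only suitable for small inputs; a real system would presumably rely on sorted or binned regions, which this sketch does not attempt.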