2018
DOI: 10.1038/nmeth.4556

GIGGLE: a search engine for large-scale integrated genome analysis

Abstract: GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.

Cited by 167 publications (149 citation statements)
References 22 publications
“…Given these conditions, we can either pre-merge the index of all files or use a data structure to dynamically update index as we parse files. This would be similar to the interval data structure used by GIGGLE (Layer et al 2018), which uses a B+ tree to create an index from thousands of genomic data and annotation files. We would like to extend the File Server to use giggle or giggle like structures for querying data, or an index structure that is updated dynamically as more queries are processed across files.…”
Section: Discussion (mentioning)
confidence: 99%
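
The idea in the statement above, a single index built from many interval files that can keep growing as more files are parsed, can be illustrated with a small sketch. This is not GIGGLE's B+ tree and not the citing authors' File Server: it swaps in a plain per-chromosome list sorted by start coordinate, and the class name `MultiFileIntervalIndex`, its methods, and the half-open coordinate convention are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class MultiFileIntervalIndex:
    """Toy unified index over intervals from many files.

    Hypothetical illustration only: a sorted list per chromosome,
    not the B+ tree that GIGGLE uses.
    """

    def __init__(self):
        # chromosome -> list of (start, end, file_id) tuples
        self._by_chrom = defaultdict(list)
        self._sorted = True

    def add(self, file_id, chrom, start, end):
        """Register one half-open interval [start, end) from a file."""
        self._by_chrom[chrom].append((start, end, file_id))
        self._sorted = False  # re-sort lazily on the next query

    def _ensure_sorted(self):
        if not self._sorted:
            for intervals in self._by_chrom.values():
                intervals.sort()
            self._sorted = True

    def query(self, chrom, start, end):
        """Return {file_id: overlap count} for the query [start, end)."""
        self._ensure_sorted()
        intervals = self._by_chrom.get(chrom, [])
        # Only intervals that start before the query end can overlap it.
        hi = bisect_left(intervals, (end,))
        counts = defaultdict(int)
        for _, interval_end, file_id in intervals[:hi]:
            if interval_end > start:
                counts[file_id] += 1
        return dict(counts)


index = MultiFileIntervalIndex()
index.add("a.bed", "chr1", 100, 200)
index.add("a.bed", "chr1", 150, 300)
index.add("b.bed", "chr1", 250, 400)
index.add("b.bed", "chr2", 100, 200)
print(index.query("chr1", 180, 260))
```

The query still scans every interval that starts before the query end, so it is linear in the worst case. A tree-based index such as the one the statement describes exists precisely to avoid that scan.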
“…Analysis modules perform various types of genomic data integration to produce functional evidence including tissue-specific regulatory elements (enhancers), transcription factor (TF) activity, chromatin states, and genetic regulation (eQTL) information. SparkINFERNO implements scalable genomic querying ( Supplementary Figures S2, S3) using Spark parallel transformations and Giggle-based genomic indexing (Layer et al, 2018). SparkINFERNO can be extended with additional annotation data and/or customized evaluation modules.…”
Section: Methods (mentioning)
confidence: 99%
“…Genes are defined as overlapping with the genomic intervals based on coordinates of canonical APPRIS (Rodriguez et al, 2013) transcript isoforms if available or alternatively the longest transcript for a given gene; the overlap step is performed with GIGGLE (Layer et al, 2018). Users can extend the gene region by up to 20 kbp in upstream or downstream direction since most of the strong eQTLs (expression quantitative trait loci) regulating gene expression are found in that region (Veyrieras et al, 2008).…”
Section: Identification Of Overlapping Mendelian Genes (mentioning)
confidence: 99%
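
The overlap step in the last statement, where a gene region may be widened by up to 20 kbp upstream or downstream before it is tested against a genomic interval, can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing tool's code and not a call into GIGGLE: the `Gene` record, the function names, the half-open coordinates, and the strand-aware reading of "upstream" and "downstream" are all hypothetical.

```python
from dataclasses import dataclass

MAX_FLANK = 20_000  # upper bound on the extension, from the quoted statement


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with half-open coordinates [start, end)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def extended_region(gene, upstream=0, downstream=0):
    """Return the gene span widened by the requested flanks.

    Assumption: upstream means toward the 5' end, so on the minus
    strand the upstream flank is added to the end coordinate.
    """
    if not (0 <= upstream <= MAX_FLANK and 0 <= downstream <= MAX_FLANK):
        raise ValueError(f"flanks must be between 0 and {MAX_FLANK} bp")
    if gene.strand == "+":
        start, end = gene.start - upstream, gene.end + downstream
    else:
        start, end = gene.start - downstream, gene.end + upstream
    return max(0, start), end  # clamp at the chromosome start


def overlapping_genes(genes, chrom, start, end, upstream=0, downstream=0):
    """Names of genes whose (extended) span overlaps [start, end)."""
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        g_start, g_end = extended_region(gene, upstream, downstream)
        if g_start < end and start < g_end:
            hits.append(gene.name)
    return hits


genes = [
    Gene("A", "chr1", 50_000, 60_000, "+"),
    Gene("B", "chr1", 100_000, 110_000, "-"),
]
print(overlapping_genes(genes, "chr1", 40_000, 45_000, upstream=10_000))
```

The linear scan over `genes` stands in for the indexed lookup that the statement delegates to GIGGLE. Only the flank arithmetic is the point of the sketch.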