2016
DOI: 10.1101/044578
Preprint

SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata

Abstract: The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associa…

Citations: cited by 5 publications (4 citation statements)
References: 34 publications
“…[78] Bioinformatic analysis of RNASeq reads adhered to ENCODE guidelines and best practices for RNASeq.[79] Briefly, alignment of adapter-trimmed (Skewer v0.1.123)[80] 2 × 150 bp paired-end strand-specific Illumina reads to the GRCh38.p14 genome was achieved with the Spliced Transcripts Alignment to a Reference (STAR v2.5.3a) software,[81] and a splice-junction aware aligner using Ensembl annotation.[82] Expression estimation was conducted using RSEM v1.3.0 (RNASeq by Expectation Maximization).…”
Section: Methods (mentioning)
confidence: 99%
“…To fulfill these requirements, the DCC developed a hybrid relational‐object data store system known as SnoVault, as described in Hitz et al. (). All experimental components stored in the database are modeled as JSON objects, which is a file format that stores information as key‐value pairs.…”
Section: Commentary (mentioning)
confidence: 99%
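The key-value modeling mentioned in the statement above can be illustrated with a minimal Python sketch. Every field name and value below is hypothetical and chosen only for illustration; none is taken from the actual SnoVault or ENCODE schema.

```python
import json

# Hypothetical record: the keys and values are illustrative assumptions,
# not fields from the real SnoVault/ENCODE schema.
experiment = {
    "accession": "EXAMPLE0001",
    "assay": "RNA-seq",
    "organism": "Homo sapiens",
    "replicates": [
        {"replicate_number": 1, "library": "EXAMPLE-LIB-1"},
        {"replicate_number": 2, "library": "EXAMPLE-LIB-2"},
    ],
}

# Serialize the object to JSON text, then parse it back into Python data.
serialized = json.dumps(experiment, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Nested objects and arrays let a single record carry related components (here, replicates) alongside its own key-value pairs, and the round trip through `json.dumps` and `json.loads` preserves that structure.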
“…The ENCODE project has the goal of identifying all functional elements in the human and mouse genomes. The ENCODE portal (Davis et al., ; Hitz et al., ; Hong et al., ; Sloan et al., ) serves as the canonical source of ENCODE data, and is actively maintained by the ENCODE Data Coordination Center (DCC) to update the relevant experimental data and metadata and provide visualization and analysis tools for the scientific community. For this reason, researchers seeking to use ENCODE data should always use the ENCODE portal to ensure that they get the most up‐to‐date analysis results for their experiments, as well as metadata about data provenance and experimental methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…CistromeDB developed by Liu and colleagues curated and processed a huge collection of human and mouse ChIP-seq and chromatin accessibility datasets from GEO with a standard analysis pipeline ChiLin [9], and further evaluated individual data quality under several scoring metrics [10]. High-quality processed ChIP-seq data generated by ENCODE consortium, including histone modification, chromatin regulator and transcription factor binding data in a selected set of biological samples, are also available through its data portal [11]. ENCODE and CistromeDB provide access to the processed data, and the corresponding metadata including the sources and properties of biological samples, experimental protocols, the antibody used and others, which offer opportunities for users to re-analyze the data and identify the genome-wide targets of a transcription regulator in different cell lines and tissues.…”
Section: Introduction (mentioning)
confidence: 99%