Identifying the impact of G-Quadruplexes on Affymetrix 3′ Arrays using Cloud Computing

Memon, F. N.; Owen, Anne M.; Sanchez-Graillet, Olivia; Upton, Graham; Harrison, Andrew

doi:10.1515/jib-2010-111

Cited by 13 publications

(6 citation statements)

References 8 publications

“…Sequence-specific motifs are an issue in microarray data [18,35] and have also been shown to affect RNA-seq data [38] as well as RNA primers [9], resulting in sequence-specific deviations in the distribution of mapped reads to a reference genome [27,15]. Furthermore, GC content effects have been demonstrated in both microarray and RNA-seq data [26].…”

Section: Methods (mentioning)

confidence: 99%

“…The effect of extremes of GC content in sequencing data (as well as microarray data) has been discussed in numerous studies [6,18], and we therefore also investigate the effect of the mean GC content of reads within the exon, ḡ_e, and the GC content of the 4-mer motif itself, g_m. In order to partition reads by mean GC content (which we will discuss later) we also define binned GC content ranges (30-40%, 40-50%, 50-60% and 60-70%) for ḡ_e as follows:…”

Section: Phase II - Motif Correlations Analysis (mentioning)

confidence: 99%

Transcriptomics: Quantifying Non-Uniform Read Distribution Using MapReduce

Alnasir

Shanahan

2018

Int. J. Found. Comput. Sci.

RNA-seq is a high-throughput Next-generation sequencing (NGS) technique for estimating the concentration of all transcripts in a transcriptome. The method involves complex preparatory and post-processing steps which can introduce bias, and the technique produces a large amount of data [7, 19]. Two important challenges in processing RNA-seq data are therefore the ability to process a vast amount of data, and methods to quantify the bias in public RNA-seq datasets. We describe a novel analysis method, based on analysing sequence motif correlations, that employs MapReduce on Apache Spark to quantify bias in NGS data at the deep exon level. Our implementation is designed specifically for processing large datasets and allows for scalability and deployment on cloud service providers offering MapReduce. In investigating the wild and mutant organism types in the species D. melanogaster we have found that motifs with runs of Gs (or their complement) exhibit low motif-pair correlations in comparison with other motif-pairs. This is independent of the mean exon GC content in the wild type data, but there is a mild dependence in the mutant data. Hence, whilst both datasets show the same trends, there is significant variation between the two samples.
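The GC content partitioning mentioned in the citation statement above can be illustrated with a minimal Python sketch. This is not the citing authors' Spark implementation. The function names (`gc_percent`, `mean_gc_percent`, `gc_bin`) are hypothetical, and the half-open bin boundaries are an assumption, since the quoted definition is truncated. Only the four percentage ranges come from the quoted text.

```python
from typing import Iterable, Optional

# Bin edges in percent, taken from the ranges quoted in the citation statement.
# Treating each bin as half-open [lo, hi) is an assumption of this sketch.
GC_BINS = [(30, 40), (40, 50), (50, 60), (60, 70)]


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a single sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def mean_gc_percent(reads: Iterable[str]) -> float:
    """Return the mean GC percentage over a collection of reads."""
    values = [gc_percent(read) for read in reads]
    if not values:
        raise ValueError("no reads supplied")
    return sum(values) / len(values)


def gc_bin(percent: float) -> Optional[str]:
    """Return the label of the bin containing `percent`, or None if outside all bins."""
    for lo, hi in GC_BINS:
        if lo <= percent < hi:
            return f"{lo}-{hi}%"
    return None
```

For example, `gc_bin(mean_gc_percent(reads))` would assign the reads mapped to one exon to a single bin, and values below 30% or at or above 70% fall outside every bin under these assumptions.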

“…In this work, some of the experiments which use the Human GeneChip called HG-U133A were downloaded and analyzed. The analysis, carried out using the R statistical language, was to determine whether runs of guanine in the probe sequences (runs of 4 or more 'G's) were producing a significant bias in the gene expression data [14,39].…”

Section: Understanding the Microarray Data (mentioning)

confidence: 99%

Self-service infrastructure container for data intensive application

Musa

Walker

Owen

et al. 2014

Journal of Cloud Computing: Advances, Systems and Applications

Cloud based scientific data management (storage, transfer, analysis, and inference extraction) is attracting interest. In this paper, we propose a next generation cloud deployment model suitable for data intensive applications. Our model is a flexible and self-service container-based infrastructure that delivers network, computing, and storage resources together with the logic to dynamically manage the components in a holistic manner. We demonstrate the strength of our model with a bioinformatics application. Dynamic algorithms for resource provisioning and job allocation suitable for the chosen dataset are packaged and delivered in a privileged virtual machine as part of the container. We tested the model on our private internal experimental cloud that is built on low-cost commodity hardware. We demonstrate the capability of our model to create the required network and computing resources and allocate submitted jobs. The results obtained show the benefits of increased automation in terms of both a significant improvement in the time to complete a data analysis and a reduction in the cost of analysis. The algorithms proposed reduced the cost of performing analysis by 50% for 15 GB of data. The total time between submitting a job and writing the results after analysis was also reduced by more than 1 hr for 15 GB of data.
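The check described in the citation statement above (runs of four or more Gs in a probe sequence) can be sketched in a few lines. The cited analysis was carried out in R; the Python below is only an illustration under that description, and the names `G_RUN`, `g_runs` and `has_g_run` are hypothetical.

```python
import re
from typing import List, Tuple

# A run of four or more consecutive guanines, as described in the quoted text.
G_RUN = re.compile(r"G{4,}")


def g_runs(probe: str) -> List[Tuple[int, int]]:
    """Return (start_index, run_length) for every run of 4+ Gs in a probe sequence."""
    return [
        (match.start(), match.end() - match.start())
        for match in G_RUN.finditer(probe.upper())
    ]


def has_g_run(probe: str) -> bool:
    """Return True if the probe contains at least one run of 4+ Gs."""
    return G_RUN.search(probe.upper()) is not None
```

Applied to each probe sequence on an array, `has_g_run` would flag the probes whose intensities might then be compared against the remaining probes in the same probe set.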

“…The cloud has become an appealing alternative high-performance computing platform for ad-hoc analytics since it offers on-demand computing and storage resources, along with scalability and low maintenance costs [4], [5], [6], [7]. This has led to a variety of research for supporting analytics in computational biology and bioinformatics on the cloud (for example, [8], [9], [10] and [11]). …”

Section: Introduction (mentioning)

confidence: 99%

RBioCloud: A Light-Weight Framework for Bioconductor and R-based Jobs on the Cloud

Varghese

Patel

Barker

2015

IEEE/ACM Trans. Comput. Biol. and Bioinf.

Large-scale ad hoc analytics of genomic data is popular using the R programming language, supported by over 700 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While biologists and bioinformaticists can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. This paper explores how analytical jobs can be executed on the cloud with minimum effort, and how both the resources and data required by the job can be managed. An open-source light-weight framework for executing R-scripts using Bioconductor packages, referred to as 'RBioCloud', is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is available from http://www.rbiocloud.com.
