2018
DOI: 10.1155/2018/9391635

Framework for Parallel Preprocessing of Microarray Data Using Hadoop

Abstract: Microarray technology has become a popular way to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed because they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard, widely used methods for preprocessing the data and removing noise. Most of the preprocessing algorithms are time-consuming and not able…

Cited by 7 publications (4 citation statements)
References 16 publications
“…This microarray data provided a gene expression profile of the liver from 32 patients with advanced NAFLD (fibrosis stages 3–4) and 40 patients with mild NAFLD (fibrosis stages 0–1). The GSE49541 dataset underwent independent normalization using robust multiarray analysis (RMA) [14] at the NCBI, followed by log2 transformation and quantile normalization. To mitigate batch effects, ComBat was applied to the normalized combined dataset.…”
Section: Methods (mentioning)
confidence: 99%
“…Then, microarray analyses and experiments were completed, following which, the Agilent custom algorithm was used to design the probe sets that were printed on the GPL18056 platform. The robust multiarray average algorithm (Sahlabadi et al, 2018) was used to perform quartile data standardization of the downloaded data and background correction. We filtered the lack of corresponding gene symbols for the probes, and reserved the maximum values of the gene symbols using multiple probes.…”
Section: Methods (mentioning)
confidence: 99%
“…For lncRNA-based classification, datasets were selected according to the following criteria: (1) dataset having high HbF vs. normal HbF expression profiles; (2) dataset with sample no. > 3; and (3) only Affymetrix HG-U133 plus 2.0 arrays expression profiles were included in our study. The raw (.CEL) files were downloaded, quantile normalized, Log2 transformed and background corrected by using Robust Multichip Average (RMA, Windows Version) (Sahlabadi et al 2018). Finally, we obtained probe ID-centric gene expression datasets.…”
Section: Data Acquisition and Processing (mentioning)
confidence: 99%