Brandon Nick Sern Ooi scite author profile

Loh

et al. 2019

Background: Evidence linking breast size to breast cancer risk has been inconsistent, and its interpretation is often hampered by confounding factors such as body mass index (BMI). Here, we used linkage disequilibrium score regression and two-sample Mendelian randomization (MR) to examine the genetic associations between BMI, breast size and breast cancer risk. Methods: Summary-level genotype data from 23andMe, Inc (breast size, n = 33 790), the Breast Cancer Association Consortium (breast cancer risk, n = 228 951) and the Genetic Investigation of ANthropometric Traits (BMI, n = 183 507) were used for our analyses. In assessing causal relationships, four complementary MR techniques [inverse variance weighted (IVW), weighted median, weighted mode and MR-Egger regression] were used to test the robustness of the results. Results: The genetic correlation (rg) estimated between BMI and breast size was high (rg = 0.50, P = 3.89 × 10^−43). All MR methods provided consistent evidence that higher genetically predicted BMI was associated with larger breast size [odds ratio (OR_IVW): 2.06 (1.80–2.35), P = 1.38 × 10^−26] and lower overall breast cancer risk [OR_IVW: 0.81 (0.74–0.89), P = 9.44 × 10^−6]. No evidence of a relationship between genetically predicted breast size and breast cancer risk was found except when using the weighted median and weighted mode methods, and only with oestrogen receptor (ER)-negative risk. There was no evidence of reverse causality in any of the analyses conducted (P > 0.050). Conclusion: Our findings indicate a potential positive causal association between BMI and breast size and a potential negative causal association between BMI and breast cancer risk. We found no clear evidence for a direct relationship between breast size and breast cancer risk.
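As a rough illustration of the inverse variance weighted (IVW) technique named in this abstract, the sketch below computes a fixed-effect IVW estimate from per-variant summary statistics. The function name `ivw_estimate` and the three toy variants are hypothetical and are not taken from the study; the study's own pipeline is not described here, so this is only a minimal sketch of the general method.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from two-sample summary statistics.

    Equivalent to a weighted regression through the origin of the
    variant-outcome effects on the variant-exposure effects, with
    weights 1 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += weight * bx * by
        denominator += weight * bx * bx
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Toy numbers for illustration only (not data from the study).
beta_exposure = [0.10, 0.20, 0.15]    # variant effects on the exposure
beta_outcome = [-0.02, -0.04, -0.03]  # variant effects on the outcome (log odds)
se_outcome = [0.01, 0.01, 0.01]       # standard errors of the outcome effects

estimate, std_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(estimate)  # log-odds estimate converted to an odds ratio
print(estimate, std_error, odds_ratio)
```

In the toy data every variant has the same Wald ratio (outcome effect divided by exposure effect), so the IVW estimate equals that common ratio. With real data, the weighted median, weighted mode and MR-Egger methods relax the IVW assumptions in different ways.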

Towards precision medicine: interrogating the human genome to identify drug pathways associated with potentially functional, population-differentiated polymorphisms

et al. 2019

Drug response variations amongst different individuals/populations are influenced by several factors including allele frequency differences of single nucleotide polymorphisms (SNPs) that functionally affect drug-response genes. Here, we aim to identify drugs that potentially exhibit population differences in response using SNP data mining and analytics. Ninety-one pairwise comparisons of >22,000,000 SNPs from the 1000 Genomes Project, across 14 different populations, were performed to identify ‘population-differentiated’ SNPs (pdSNPs). Potentially functional pdSNPs (pf-pdSNPs) were then selected, mapped to genes, and integrated with drug–gene databases to identify ‘population-differentiated’ drugs enriched with genes carrying pf-pdSNPs. A total of 1191 clinically approved drugs were found to be significantly enriched (Z > 2.58) with genes carrying SNPs that were differentiated in one or more population-pair comparisons. Thirteen drugs were found to be enriched with such differentiated genes across all 91 population pairs. Notably, 82% of the drugs previously reported in the literature to exhibit population differences in response were also found by this method to show significant enrichment of population-specific differentiated SNPs. Furthermore, drugs with genetic testing labels, or those suspected to cause adverse reactions, had a significantly larger number (P < 0.01) of population pairs with enriched pf-pdSNPs compared with those without these labels. This pioneering effort at harnessing big-data pharmacogenomics to identify ‘population-differentiated’ drugs could help to facilitate data-driven decision-making for a more personalized medicine.
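The arithmetic behind two of the figures in this abstract can be checked with a short sketch: 14 populations give 91 unordered pairs, and a cutoff of Z > 2.58 corresponds to roughly P < 0.01 if it is read as a two-sided standard normal test. The abstract does not state how the cutoff was derived, so the two-sided reading is an assumption, and the population labels below are placeholders.

```python
from itertools import combinations
from statistics import NormalDist

# Placeholder labels; the abstract does not list the 14 populations here.
populations = [f"POP{i:02d}" for i in range(1, 15)]

# Every unordered pair of distinct populations: 14 * 13 / 2 = 91.
pairs = list(combinations(populations, 2))
n_pairs = len(pairs)

# Two-sided tail probability of a standard normal beyond |Z| = 2.58
# (assumes the enrichment Z-score is approximately standard normal).
z_cutoff = 2.58
two_sided_p = 2 * (1 - NormalDist().cdf(z_cutoff))

print(n_pairs, two_sided_p)
```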

Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients

Lim

et al. 2022

eBioMedicine

Background: Major challenges in large scale genetic association studies include not only the identification of causative single nucleotide polymorphisms (SNPs), but also accounting for SNP-SNP interactions. This study thus proposes a novel feature engineering approach integrating potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection to identify biologically meaningful, possibly causative genetic factors that take into consideration potential SNP-SNP interactions within the pfcHap, to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients. Methods: Exome sequencing data from 349 RA patients were analysed, and the patients were split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features to undergo ML recursive feature elimination with cross-validation using the training set. Predictive capacity and robustness of the selected features were assessed using six popular machine learning models through a train set cross-validation and evaluated in an unseen test set. Findings: Significantly, 100 features (95 pfcHaps, 5 non-genetic factors) were identified to have good predictive performance (AUC: 0.776-0.828; Sensitivity: 0.656-0.813; Specificity: 0.684-0.868) across all six ML models in an unseen test dataset for the prediction of MTX response in RA patients. Interpretation: The majority of the predictive pfcHap SNPs were predicted to be potentially functional, and some of the genes in which the pfcHaps reside were identified to be associated with previously reported MTX/RA pathways.
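For readers unfamiliar with the performance measures reported in this abstract (AUC, sensitivity, specificity), the following minimal sketch shows how they are commonly computed for a binary classifier. The labels and scores are invented for illustration and do not come from the study; the 0.5 decision threshold is likewise an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: 1 = responder, 0 = non-responder.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.35, 0.6, 0.75]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(sens, spec, area)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is one reason abstracts like this one report all three.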

Insights gained from the reverse engineering of gene networks in keloid fibroblasts

Phan

2011

Theor Biol Med Model

Background: Keloids are protrusive claw-like scars that have a propensity to recur even after surgery, and their molecular etiology remains elusive. The goal of reverse engineering is to infer gene networks from observational data, thus providing insight into the inner workings of a cell. However, most attempts at modeling biological networks have been done using simulated data. This study aims to highlight some of the issues involved in working with experimental data, and at the same time gain some insights into the transcriptional regulatory mechanism present in keloid fibroblasts. Methods: Microarray data from our previous study was combined with microarray data obtained from the literature as well as new microarray data generated by our group. For the physical approach, we used the fREDUCE algorithm for correlating expression values to binding motifs. For the influence approach, we compared the Bayesian algorithm BANJO with the information theoretic method ARACNE in terms of performance in recovering known influence networks obtained from the KEGG database. In addition, we also compared the performance of different normalization methods as well as different types of gene networks. Results: Using the physical approach, we found consensus sequences that were active in the keloid condition, as well as some sequences that were responsive to steroids, a commonly used treatment for keloids. From the influence approach, we found that BANJO was better at recovering the gene networks compared to ARACNE and that transcriptional networks were better suited for network recovery compared to cytokine-receptor interaction networks and intracellular signaling networks. We also found that the NFKB transcriptional network that was inferred from normal fibroblast data was more accurate compared to that inferred from keloid data, suggesting a more robust network in the keloid condition. Conclusions: Consensus sequences that were found from this study are possible transcription factor binding sites and could be explored for developing future keloid treatments or for improving the efficacy of current steroid treatments. We also found that the combination of the Bayesian algorithm, RMA normalization and transcriptional networks gave the best reconstruction results and this could serve as a guide for future influence approaches dealing with experimental data.

Machine learning using genetic and clinical data identifies a signature that robustly predicts methotrexate response in rheumatoid arthritis

Lim

et al. 2022

Objective: To develop a hypothesis-free model that best predicts response to methotrexate (MTX) in rheumatoid arthritis (RA) patients utilizing biologically meaningful genetic feature selection of potentially functional single nucleotide polymorphisms (pfSNPs) through robust machine learning (ML) feature selection methods. Methods: MTX-treated RA patients with known response were divided in a 4:1 ratio into training and test sets. From the patients’ exomes, potential features for classifier prediction were identified from pfSNPs and non-genetic factors through ML using recursive feature elimination with cross-validation incorporating Random Forest Classifier. Feature selection was repeated on random subsets of the training cohort, and consensus features were assembled into the final feature set. This feature set was evaluated for predictive potential using six ML classifiers, first by cross-validation within the training set, and finally by analyzing its performance with the unseen test set. Results: The final feature set contains 56 pfSNPs and five non-genetic factors. The majority of these pfSNPs are located in pathways related to RA pathogenesis or methotrexate action and are predicted to modulate gene expression. When used for training in six ML classifiers, performance was good in both the training set (AUC: 0·855–0·916, sensitivity: 0·715–0·892 and specificity: 0·733–0·862) and the unseen test set (AUC: 0·751–0·826, sensitivity: 0·581–0·839 and specificity: 0·641–0·923). Conclusion: Sensitive and specific predictors of MTX response in RA patients were identified in this study through a novel strategy combining biologically meaningful and machine learning feature selection and training. These predictors may facilitate better treatment decision-making in RA management.
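The 4:1 division into training and test sets mentioned in this abstract can be sketched as follows. The abstract does not say whether the split was stratified by response, so the per-class stratification here is an assumption, as are the function name `stratified_split`, the seed, and the invented cohort of 60 responders and 40 non-responders.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices roughly 4:1 (train:test) within each class.

    Stratification is an assumption of this sketch, not a detail
    reported in the abstract.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Invented cohort: 1 = responder, 0 = non-responder.
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Feature selection and cross-validation would then use only `train_idx`, leaving `test_idx` untouched until the final evaluation, which is what the abstract means by an unseen test set.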