mikropml: User-Friendly R Package for Supervised Machine Learning Pipelines

Topçuoğlu, Begüm D.; Lapp, Zena; Sovacool, Kelly L; Snitkin, Evan S.; Wiens, Jenna; Schloss, Patrick D.

doi:10.21105/joss.03073

Cited by 42 publications

(29 citation statements)

References 14 publications

“…We used the mikropml package to train and evaluate models to predict C. difficile colonization status at 10 days postchallenge where mice were categorized as either cleared or colonized ( 77 , 78 ). We removed the C. difficile genus relative abundance data prior to training the model.…”

Section: Methods (mentioning)

confidence: 99%

“…To accommodate the small number of samples in our data set, we used 50% training and 50% testing splits with repeated 2-fold cross-validation of the training data for hyperparameter tuning. Permutation importance was performed as described previously ( 79 ) using mikropml ( 77 , 78 ) with the random forest model because it had the highest AUROC value.…”

Section: Methods (mentioning)

confidence: 99%
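
The split scheme in the statement above (half the samples held out for testing, repeated 2-fold cross-validation on the training half) can be sketched in plain Python. This is a minimal illustration and not mikropml's implementation: the function names, the number of repeats, and the seed are assumptions, and the sketch does not stratify by outcome.

```python
import random

def split_half(n_samples, train_frac=0.5, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(round(n_samples * train_frac))
    return indices[:n_train], indices[n_train:]

def repeated_kfold(train_indices, k=2, repeats=3, seed=0):
    """Yield (fit, validation) index pairs for repeated k-fold CV on the training set."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield fit, validation

# Assumed toy setting: 20 samples, 3 repeats (the source does not state the repeat count).
train_idx, test_idx = split_half(20, train_frac=0.5, seed=1)
cv_pairs = list(repeated_kfold(train_idx, k=2, repeats=3, seed=1))
```

Hyperparameters would be tuned on the `cv_pairs` folds only, and the held-out `test_idx` samples would be used once to score the final model.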

An Osmotic Laxative Renders Mice Susceptible to Prolonged Clostridioides difficile Colonization and Hinders Clearance

Tomkovich, Taylor, King, et al. 2021

mSphere

Self Cite

“…Utilizing publicly available 16S rRNA sequence data from the stools of patients with SRNs and healthy controls, we generated taxonomic abundance tables with mothur ( 7 ) annotated to phylum, class, order, family, genus, OTU, and ASV levels. Using the taxonomic abundance data and the mikropml R package ( 8 ), we quantified how reliably samples could be classified as SRN or normal using five machine learning methods, including random forest, L2-regularized logistic regression, decision tree, gradient boosted trees (XGBoost), and support vector machine with radial basis kernel (SVM radial). Across the five machine learning methods tested, model performance increased with increasing taxonomic level usually peaking around genus/OTU level before dropping off slightly with ASVs (see Fig.…”

Section: Observation (mentioning)

confidence: 99%

“…Machine learning models were run with the R package mikropml (v0.0.2) ( 8 ) to predict the diagnosis category (normal versus SRN) of each sample. Data were preprocessed to normalize values (scale/center), remove values with zero or near-zero variance, and collapse colinear features using default parameters.…”

Section: Observation (mentioning)

confidence: 99%
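
The preprocessing steps named in the statement above (center and scale, drop uninformative features, collapse collinear ones) can be sketched as follows. This is a simplified illustration and not the package's code: it drops only exactly constant features rather than near-zero-variance ones, merges only perfectly correlated features, and the `preprocess` function and toy feature names are hypothetical.

```python
from statistics import mean, pstdev

def preprocess(columns, corr_threshold=1.0):
    """columns maps a feature name to its list of values; returns cleaned columns."""
    # 1. Drop features with zero variance (a simplification of near-zero variance).
    kept = {name: vals for name, vals in columns.items() if pstdev(vals) > 0}

    # 2. Center each feature to mean 0 and scale it to standard deviation 1.
    scaled = {}
    for name, vals in kept.items():
        mu, sd = mean(vals), pstdev(vals)
        scaled[name] = [(v - mu) / sd for v in vals]

    # 3. Group features whose absolute Pearson correlation reaches the threshold.
    #    For standardized columns, the correlation is the mean of the products.
    groups = []
    for name, vals in scaled.items():
        for group in groups:
            representative = scaled[group[0]]
            r = mean([a * b for a, b in zip(representative, vals)])
            if abs(r) >= corr_threshold - 1e-9:
                group.append(name)
                break
        else:
            groups.append([name])

    # Keep one representative per group, named after all of its members.
    return {"|".join(group): scaled[group[0]] for group in groups}

# Toy abundance table with hypothetical feature names.
features = {
    "otu1": [1.0, 2.0, 3.0, 4.0],
    "otu2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with otu1
    "otu3": [5.0, 5.0, 5.0, 5.0],  # constant, so it is removed
    "otu4": [4.0, 1.0, 3.0, 2.0],
}
cleaned = preprocess(features)
```

Collapsing correlated features before training keeps importance from being split arbitrarily across features that carry the same signal.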

A Goldilocks Principle for the Gut Microbiome: Taxonomic Resolution Matters for Microbiome-Based Classification of Colorectal Cancer

Armour, Topçuoğlu, Garretto, et al. 2022

mBio

Self Cite

“…We also oversampled the data so that the number of attacks against healthcare were approximately the same as the number of non-healthcare attacks through generation of ‘synthetic positive instances using ADASYN algorithm. The number of majority neighbors of each minority instance determines the number of synthetic instances generated from the minority instance’ 71 and fit the algorithm a second time using the mikropml R package 72 to produce 15 additional performance metrics for comparison with the original model. For both fitting processes, the categorical variables year, governorate, perpetrator and weapon were one hot encoded to indicator variables; the five infrastructure type variables were already represented by 1s and 0s and represented categorically to indicate if a strike was recorded as present or absent, respectively.…”

Section: Methods (mentioning)

confidence: 99%
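
The one-hot encoding step mentioned in the statement above can be sketched in a few lines. This is an illustration under stated assumptions: the `one_hot_encode` helper is hypothetical, the records are placeholders rather than data from the study, and the ADASYN oversampling step is not shown.

```python
def one_hot_encode(records, categorical_fields):
    """Replace each categorical field with one 0/1 indicator column per observed level."""
    levels = {
        field: sorted({record[field] for record in records})
        for field in categorical_fields
    }
    encoded = []
    for record in records:
        # Copy through fields that are already numeric indicators.
        row = {k: v for k, v in record.items() if k not in categorical_fields}
        for field in categorical_fields:
            for level in levels[field]:
                row[f"{field}_{level}"] = 1 if record[field] == level else 0
        encoded.append(row)
    return encoded

# Hypothetical records; the values are placeholders, not data from the study.
attacks = [
    {"year": 2016, "governorate": "Aleppo", "healthcare": 1},
    {"year": 2017, "governorate": "Raqqa", "healthcare": 0},
]
encoded = one_hot_encode(attacks, ["year", "governorate"])
```

Fields that are already 0/1 indicators, such as the infrastructure type variables the statement describes, pass through unchanged.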

Overview of attacks against civilian infrastructure during the Syrian civil war, 2012–2018

et al. 2021

Background: Hundreds of thousands of people have been killed during the Syrian civil war and millions more displaced along with an unconscionable amount of destroyed civilian infrastructure.

Methods: We aggregate attack data from Airwars, Physicians for Human Rights and the Safeguarding Health in Conflict Coalition/Insecurity Insight to provide a summary of attacks against civilian infrastructure during the years 2012–2018. Specifically, we explore relationships between date of attack, governorate, perpetrator and weapon for 2689 attacks against five civilian infrastructure classes: healthcare, private, public, school and unknown. Multiple correspondence analysis (MCA) via squared cosine distance, k-means clustering of the MCA row coordinates, binomial lasso classification and Cramer’s V coefficients are used to produce and investigate these correlations.

Results: Frequencies and proportions of attacks against the civilian infrastructure classes by year, governorate, perpetrator and weapon are presented. MCA results identify variation along the first two dimensions for the variables year, governorate, perpetrator and healthcare infrastructure in four topics of interest: (1) Syrian government attacks against healthcare infrastructure, (2) US-led Coalition offensives in Raqqa in 2017, (3) Russian violence in Aleppo in 2016 and (4) airstrikes on non-healthcare infrastructure. These topics of interest are supported by results of the k-means clustering, binomial lasso classification and Cramer’s V coefficients.

Discussion: Findings suggest that violence against healthcare infrastructure correlates strongly with specific perpetrators. We hope that the results of this study provide researchers with valuable data and insights that can be used in future analyses to better understand the Syrian conflict.
