Machine learning methods are increasingly applied to medical data analysis to reduce human effort and improve our understanding of disease propagation. When the data are complicated and unstructured, shallow learning methods may not be suitable or feasible. Deep neural networks such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) have been incorporated into medical diagnosis and prognosis for better health care practice. For a binary outcome, these learning methods directly output predicted probabilities for a patient's health condition. Investigators still need to choose an appropriate decision threshold to split the predicted probabilities into positive and negative regions. We review methods for selecting the cut-off value, including relatively automatic methods based on optimizing ROC curve criteria and utility-based methods that use a net benefit curve. In particular, decision curve analysis (DCA) is now acknowledged in medical studies as a good complement to ROC analysis for the purpose of decision making. In this paper, we provide R code to illustrate how to fit the statistical learning methods, select a decision threshold to yield the binary prediction, and evaluate the accuracy of the resulting classification. This article will help medical decision makers understand different classification methods and use them in real-world scenarios.
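The two families of threshold selection mentioned in this abstract can be illustrated with a small sketch. It is not the paper's R code: it is written in Python, the outcome labels and predicted probabilities are made up, and Youden's J is assumed as the ROC criterion because the abstract does not name one. Net benefit follows the standard decision curve definition, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
def roc_points(y_true, y_prob):
    """Return (threshold, sensitivity, specificity) for each distinct probability."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(y_prob)):
        # A patient is classified positive when the predicted probability is >= t.
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        tn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 0)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_threshold(y_true, y_prob):
    """Threshold that maximizes Youden's J = sensitivity + specificity - 1."""
    best = max(roc_points(y_true, y_prob), key=lambda r: r[1] + r[2] - 1)
    return best[0]


def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted probability is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical data: 1 = diseased, 0 = healthy, with model-predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.75, 0.90]

cutoff = youden_threshold(y_true, y_prob)
print("Youden cut-off:", cutoff)

# A decision curve compares these quantities across a range of threshold probabilities.
for pt in (0.10, 0.20, 0.35, 0.50):
    print(pt,
          round(net_benefit(y_true, y_prob, pt), 4),
          round(net_benefit_treat_all(y_true, pt), 4))
```

The ROC-based cut-off depends only on sensitivity and specificity, whereas the net benefit weighs false positives by the odds pt / (1 - pt), which is why the two approaches can be read as complementary.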
Mendelian randomization is a technique that uses genetic variants as instruments to examine the causal effect of a modifiable exposure on a trait in an observational study. Using many instruments can improve the precision of estimation but may introduce bias when the instruments are only weakly associated with the exposure. To overcome the difficulty of high dimensionality, we propose a model average estimator that uses different subsets of instruments (single nucleotide polymorphisms, SNPs) to predict the exposure in the first stage, and then weights the submodels' predictions by penalization with common penalty functions such as the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). The model-averaged predictions are then used as a genetically predicted exposure to estimate the causal effect on the response in the second stage. The novelty of our model average estimator also lies in allowing the number of submodels and the submodels' sizes to grow with the sample size. The practical performance of the estimator is examined in a series of numerical studies. We apply the proposed method to a real genetic dataset to investigate the relationship between stature and blood pressure.
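The two-stage idea described in this abstract can be sketched roughly as follows. This is an illustration under strong simplifying assumptions, not the authors' estimator: every submodel uses a single SNP, only the LASSO penalty is shown (fitted by a plain coordinate descent), the penalty level `lam` is picked by hand, and the data are simulated. All function and variable names are hypothetical.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def single_snp_prediction(g, x):
    """First stage for one submodel: fit centered exposure x on one centered SNP g."""
    slope = dot(g, x) / dot(g, g)
    return [slope * gi for gi in g]


def lasso_weights(preds, x, lam, n_passes=30):
    """Coordinate descent for min_w (1/2n)||x - sum_k w_k preds_k||^2 + lam * sum_k |w_k|."""
    n = len(x)
    w = [0.0] * len(preds)
    resid = list(x)  # current residual x - sum_k w_k * preds[k]
    for _ in range(n_passes):
        for k, p in enumerate(preds):
            z = dot(p, p) / n
            # Covariance of submodel k with the residual that excludes submodel k.
            rho = dot(p, resid) / n + w[k] * z
            new_w = soft_threshold(rho, lam) / z
            resid = [r - (new_w - w[k]) * pi for r, pi in zip(resid, p)]
            w[k] = new_w
    return w


def model_average_estimate(snps, exposure, outcome, lam):
    x, y = center(exposure), center(outcome)
    preds = [single_snp_prediction(center(g), x) for g in snps]
    w = lasso_weights(preds, x, lam)
    # Model-averaged, genetically predicted exposure.
    x_hat = [sum(wk * p[i] for wk, p in zip(w, preds)) for i in range(len(x))]
    # Second stage: slope of the outcome regressed on the predicted exposure.
    return dot(x_hat, y) / dot(x_hat, x_hat), w


# Simulated data (assumption): four SNPs affect the exposure, two do not,
# and an unobserved confounder affects both exposure and outcome.
random.seed(1)
N, TRUE_EFFECT = 2000, 0.5
snp_effects = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0]
snps = [[(random.random() < 0.3) + (random.random() < 0.3) for _ in range(N)]
        for _ in snp_effects]
confounder = [random.gauss(0, 1) for _ in range(N)]
exposure = [sum(a * g[i] for a, g in zip(snp_effects, snps))
            + confounder[i] + random.gauss(0, 1) for i in range(N)]
outcome = [TRUE_EFFECT * exposure[i] + confounder[i] + random.gauss(0, 1)
           for i in range(N)]

beta_hat, weights = model_average_estimate(snps, exposure, outcome, lam=0.001)
naive_ols = dot(center(exposure), center(outcome)) / dot(center(exposure), center(exposure))
print("model-averaged two-stage estimate:", round(beta_hat, 3))
print("naive regression of outcome on exposure:", round(naive_ols, 3))
```

In this simulation the naive regression is pulled away from the true effect by the confounder, while the two-stage estimate relies only on the variation in the exposure predicted by the SNPs. Because the penalty shrinks the weights, the second-stage slope is slightly inflated relative to an unpenalized fit, so `lam` is kept small here.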