Detecting the adventitious presence (AP) of unwanted transgenic plants in outcrossing species such as maize requires a method that lowers laboratory costs without losing precision. Group testing is a procedure in which groups (pools) containing several units (plants) are analysed without inspecting individual plants, with the purpose of estimating the prevalence of AP in a population at low cost without losing precision. When pool testing is used to estimate the prevalence of AP (p), sampling procedures exist for calculating a confidence interval (CI); however, they usually do not guarantee precision in the estimate of p. This research proposes a method to determine the number of pools (g), given a pool size (k), that ensures precision in the estimated proportion of AP (that is, it ensures a narrow CI). In addition, the study derives the maximum likelihood estimator of p under pool testing and its exact CI, taking into account the detection limit of the laboratory, d, and the concentration of AP per unit, c. The proposed sampling procedure involves two steps: (1) obtain a sample size that guarantees that the mean width of the CI (w̄) is narrower than the desired width (ω); and (2) iteratively increase the sample size until w̄ is smaller than ω with a specified degree of certainty (γ). Simulated data were created and tables are presented showing the different scenarios that a researcher may encounter. An R program that reproduces the results and makes it easy to create other scenarios is provided and explained.
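To make the estimator concrete, the following is a minimal R sketch, not the paper's program: it computes the MLE of p under pool testing and an exact CI by back-transforming a pool-level Clopper-Pearson interval, and it ignores the detection limit d and the per-unit concentration c. The function name pool_mle and the example numbers are hypothetical.

```r
# Minimal sketch: MLE of the AP proportion p under pool testing, with an
# exact (Clopper-Pearson) CI back-transformed from the pool-level scale.
# Assumes a perfect assay, i.e., it ignores the detection limit d and the
# concentration of AP per unit c treated in the paper.
pool_mle <- function(y, g, k, conf = 0.95) {
  # y: number of positive pools; g: number of pools; k: pool size
  alpha <- 1 - conf
  pi_hat <- y / g                               # MLE of P(pool positive)
  # Exact CI for pi via the beta-quantile form of Clopper-Pearson
  pi_low <- if (y == 0) 0 else qbeta(alpha / 2, y, g - y + 1)
  pi_upp <- if (y == g) 1 else qbeta(1 - alpha / 2, y + 1, g - y)
  # Back-transform: pi = 1 - (1 - p)^k  =>  p = 1 - (1 - pi)^(1/k)
  c(estimate = 1 - (1 - pi_hat)^(1 / k),
    lower    = 1 - (1 - pi_low)^(1 / k),
    upper    = 1 - (1 - pi_upp)^(1 / k))
}

# Hypothetical example: 3 positive pools out of g = 50 pools of size k = 10
pool_mle(y = 3, g = 50, k = 10)
```

The width of the resulting interval is what the two-step procedure controls: g is increased until the interval width falls below ω with certainty γ.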
Overfitting occurs when a statistical machine learning model learns the noise as well as the signal present in the training data. Underfitting, by contrast, occurs when too few predictors are included in the model, so that it represents the structure of the data pattern poorly; it also arises when the training data set is too small, in which case the underfitted model both fits the training data poorly and predicts new data points unsatisfactorily. This chapter describes the importance of the trade-off between prediction accuracy and model interpretability, as well as the difference between explanatory and predictive modeling: explanatory modeling minimizes bias, whereas predictive modeling seeks to minimize the combination of bias and estimation variance. We assess the importance of cross-validation and its different methods, as well as the importance and strategies of tuning, which are key to the successful use of some statistical machine learning methods. We explain the most important metrics for evaluating prediction performance for continuous, binary, categorical, and count response variables.
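As a concrete illustration of the cross-validation strategy, here is a minimal R sketch of k-fold cross-validation for a continuous response, scored with the mean squared error of prediction; the simulated data and the simple linear model are illustrative assumptions, not the chapter's example.

```r
# Minimal sketch of k-fold cross-validation for a continuous response,
# using the mean squared error of prediction (MSEP) as the metric.
# The data-generating model below is an assumption for illustration only.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

k <- 5
folds <- sample(rep(1:k, length.out = n))    # random fold assignment
msep <- numeric(k)
for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit   <- lm(y ~ x, data = train)           # fit on the k - 1 training folds
  pred  <- predict(fit, newdata = test)      # predict the held-out fold
  msep[i] <- mean((test$y - pred)^2)
}
mean(msep)   # cross-validated estimate of prediction error
```

The same loop structure applies to tuning: the fold-averaged metric is computed for each candidate hyperparameter value and the value with the best average is selected.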
This chapter deals with the main theoretical fundamentals and practical issues of using functional regression in the context of genomic prediction. We explain how to represent data as functions by means of basis functions and consider two basis systems: Fourier, for periodic or near-periodic data, and B-splines, for nonperiodic data. We derive the functional regression model with a smoothed coefficient function under a fixed-model framework, and examples are provided under this model. A Bayesian version of functional regression is outlined and explained, and all details for its implementation in glmnet and BGLR are given. The examples include in the predictor the main effects of environments and genotypes as well as the genotype × environment interaction term. The examples use small data sets so that users can run them on their own computers and understand the implementation process.
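As a toy illustration of the basis-function representation, the following minimal R sketch projects a functional covariate observed on a grid onto a B-spline basis and smooths the coefficient function with a ridge penalty in glmnet; the simulated curves are an assumption for illustration, not the chapter's genomic data.

```r
# Minimal sketch: represent a functional covariate with a B-spline basis
# (appropriate for nonperiodic data) and fit a smoothed coefficient
# function by penalized regression with glmnet. Simulated data only.
library(splines)
library(glmnet)

set.seed(2)
n <- 60
t_grid <- seq(0, 1, length.out = 100)
X <- matrix(rnorm(n * 100), n, 100)          # n curves observed on t_grid
beta_t <- sin(2 * pi * t_grid)               # assumed true coefficient function
y <- drop(X %*% beta_t) / 100 + rnorm(n, sd = 0.1)

B <- bs(t_grid, df = 15)                     # B-spline basis evaluated on the grid
Z <- X %*% B / 100                           # approximate the integral of x(t)*basis(t)
fit <- cv.glmnet(Z, y, alpha = 0)            # ridge penalty smooths the coefficients
beta_hat <- B %*% as.numeric(coef(fit, s = "lambda.min"))[-1]  # estimated beta(t)
```

For periodic data, the bs() call would be replaced by a Fourier basis evaluated on the same grid; the projection and fitting steps are unchanged.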
In this chapter, we go through the fundamentals of artificial neural networks and deep learning methods. We describe the inspiration for artificial neural networks and how deep learning methods are built. We define the activation function and its role in capturing nonlinear patterns in the input data. We explain the universal approximation theorem for understanding the power and limitations of these methods, and describe the main topologies of artificial neural networks that play an important role in their successful implementation. We also describe loss functions (and their penalized versions) and detail the circumstances in which each should be used or preferred. In addition to the Ridge, Lasso, and Elastic Net regularization methods, we provide details of the dropout and early stopping methods. Finally, we present the backpropagation method and illustrate it with two simple artificial neural networks.
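In the spirit of the chapter's simple illustrations, here is a minimal R sketch, with assumed toy data rather than one of the chapter's two example networks, of backpropagation for a single-hidden-layer network with a sigmoid activation trained by gradient descent on a squared-error loss.

```r
# Minimal sketch of backpropagation for a one-hidden-layer network.
# Toy regression data and network sizes are assumptions for illustration.
set.seed(3)
sigmoid <- function(z) 1 / (1 + exp(-z))

n <- 50
x <- matrix(runif(n), n, 1)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.05)   # toy nonlinear signal

h <- 5                                        # number of hidden units
W1 <- matrix(rnorm(h), 1, h); b1 <- rnorm(h)  # input -> hidden weights
W2 <- matrix(rnorm(h), h, 1); b2 <- rnorm(1)  # hidden -> output weights
lr <- 0.1                                     # learning rate

for (epoch in 1:5000) {
  # forward pass
  A1   <- sigmoid(sweep(x %*% W1, 2, b1, "+"))  # n x h hidden activations
  yhat <- A1 %*% W2 + b2                        # n x 1 output (identity activation)
  # backward pass: gradients of the mean squared error
  d_out <- 2 * (yhat - y) / n
  dW2   <- t(A1) %*% d_out;  db2 <- sum(d_out)
  d_hid <- (d_out %*% t(W2)) * A1 * (1 - A1)    # chain rule through the sigmoid
  dW1   <- t(x) %*% d_hid;   db1 <- colSums(d_hid)
  # gradient descent update
  W2 <- W2 - lr * dW2; b2 <- b2 - lr * db2
  W1 <- W1 - lr * dW1; b1 <- b1 - lr * db1
}
```

Regularization such as Ridge would add a penalty gradient (e.g., 2 * lambda * W1) to each weight update, and early stopping would simply halt the loop when validation error stops improving.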