Validity of the best practice in splitting data for hold-out validation strategy as performed on the ink strokes in the context of forensic science

Lee, Loong Chuen; Liong, Choong Yeun; Jemain, Abdul Aziz

doi:10.1016/j.microc.2018.02.009

Cited by 31 publications

(22 citation statements)

References 33 publications

“…Multiple observations recorded for each cow were averaged for each variable. Data were split into datasets containing 70% and 30% of observations, according to a stratified random sampling procedure [38]. The stratification was conducted as follows: cows were first grouped as (1) healthy (cows that did not get sick during the first 17 days postpartum); (2) having developed 1 condition (either metritis, HYK, or mastitis) during the first 17 d postpartum; or (3) having developed ≥2 conditions during the first 17 d postpartum.…”

Section: Data Preparation (mentioning)

confidence: 99%

Predicting Disease in Transition Dairy Cattle Based on Behaviors Measured Before Calving

Sahar, Beaver, Keyserlingk, et al. 2020

Animals

Dairy cattle are particularly susceptible to metritis, hyperketonemia (HYK), and mastitis in the weeks after calving. These high-prevalence transition diseases adversely affect animal welfare, milk production, and profitability. Our aim was to use prepartum behavior to predict which cows have an increased risk of developing these conditions after calving. The behavior of 213 multiparous and 105 primiparous Holsteins was recorded for approximately three weeks before calving by an electronic feeding system. Cows were also monitored for signs of metritis, HYK, and mastitis in the weeks after calving. The data were split using a stratified random method: we used 70% of our data (hereafter referred to as the “training” dataset) to develop the model and the remaining 30% of data (i.e., the “test” dataset) to assess the model’s predictive ability. Separate models were developed for primiparous and multiparous animals. The area under the receiver operating characteristic (ROC) curve using the test dataset for multiparous cows was 0.83, sensitivity and specificity were 73% and 80%, positive predictive value (PPV) was 73%, and negative predictive value (NPV) was 80%. The area under the ROC curve using the test dataset for primiparous cows was 0.86, sensitivity and specificity were 71% and 84%, PPV was 77%, and NPV was 80%. We conclude that prepartum behavior can be used to predict cows at risk of metritis, HYK, and mastitis after calving.
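The stratified 70/30 split described in this abstract and in the citation statement above can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `stratified_split`, the stratum labels, and the group sizes are all hypothetical, and the rounding rule for each stratum is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), sampling each stratum separately.

    Hypothetical helper: each stratum contributes roughly
    `train_fraction` of its members to the training set.
    """
    rng = random.Random(seed)

    # Group observation indices by their stratum label.
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        # Assumed rounding rule: nearest integer within each stratum.
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Made-up strata loosely mirroring the three health groups
# (healthy, one condition, two or more conditions).
labels = ["healthy"] * 60 + ["one_condition"] * 30 + ["two_or_more"] * 10
train_idx, test_idx = stratified_split(labels)
```

Sampling within each stratum keeps the class proportions of the training and test sets close to those of the full dataset, which a plain random split does not guarantee for small groups.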

“…The primary spectral dataset consisting of 1361 samples and 5401 variables has been studied and reported elsewhere (Lee, Liong, & Jemain, 2018b, 2018c, 2019a, 2019b. The practical purpose of classification model is to predict brand of unknown pen inks using based on ATR-FTIR spectrum of the ink entry.…”

Section: ATR-FTIR Spectral Dataset (mentioning)

confidence: 99%

“…Table 01 shows the number of spectrum according to ten different pen brands. More details about the spectra collection procedures can be referred to Lee, Liong, and Jemain (2018b). The dataset was first truncated and included only region between 2000-1600 cm⁻¹; and then preprocessed using Asymmetric Least Squares (AsLS) algorithm (Eilers & Boelens, 2005).…”

Section: ATR-FTIR Spectral Dataset (mentioning)

confidence: 99%

Comparison of Stratified and Random Iterative Sampling in Evaluation of PLS-DA Model

Lee 2020

European Proceedings of Social and Behavioural Sciences

Self Cite

“…A recent study showed that, in some cases, no statistically significant differences were noted between the two approaches in the final results. 56 This, however, should not be generalized and, as a rule of thumb, calibration/training and prediction sets should be independent. Notice that independence can have different meanings, in terms of sampling, depending on the target of analysis.…”

Section: Validation and Figures of Merit (mentioning)

confidence: 99%

“…Lee et al 56,133 recently published two relevant studies that evaluated the appropriate use of chemometrics in classification of ink lines in questioned documents. In the first, 56 they discussed whether ink strokes made with the same pen should be considered replicates or independent samples for classification purposes and the best way for splitting the dataset into training and test sets for external validation. For this, they analyzed 1361 strokes made by 273 blue gel pen inks from ten different brands and from 23 models using ATR-FTIR.…”

Section: Infrared Spectroscopy (mentioning)

confidence: 99%

Vibrational Spectroscopy and Chemometrics in Forensic Chemistry: Critical Review, Current Trends and Challenges

Silva, Braz, Pimentel 2019

J. Braz. Chem. Soc.

The present manuscript makes an extensive review of the scientific approaches developed in the last decade involving infrared and Raman spectroscopy combined with chemometrics for solving several issues in the investigation of the most relevant forensic traces, such as questioned documents and currency, explosives, gunshot residues, illicit drugs and body fluids. In addition, current trends, main challenges and the adequate use of several chemometric techniques are discussed. Principal component analysis (PCA) was found to be the most used technique. This unsupervised approach, however, has sometimes been misunderstood as a classification technique. Discriminant analysis techniques are widely employed, leaving a range of possibilities for application of class-modeling techniques, particularly in cases of problems regarding only one target class. In addition, increasingly complex dataset structures frequently require nonlinear approaches or flexible techniques such as multivariate curve resolution-alternating least squares (MCR-ALS). Results reporting, however, still lack reliable quality parameters and sample representativeness, posing a significant challenge to the solution of forensic problems. Regarding the analytical techniques, Raman has been playing an important role, especially in the area of questioned documents and of body fluids. Portable and hyperspectral imaging infrared spectrometers have also been showing significant potential in forensic applications.
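The citation statements from this review raise the question of whether strokes made with the same pen are replicates or independent samples, and recommend keeping training and prediction sets independent. One way to enforce that at the pen level is a group-wise hold-out split, sketched below. This is an illustrative assumption, not the procedure used by Lee et al. or by the review's authors: the function name `grouped_split`, the pen identifiers, and the counts of pens and strokes are all hypothetical.

```python
import random


def grouped_split(group_ids, test_fraction=0.3, seed=0):
    """Hold out whole groups (e.g. pens) so no group spans both sets.

    Hypothetical helper: `group_ids[i]` names the pen that produced
    stroke i; all strokes of a held-out pen go to the test set.
    """
    rng = random.Random(seed)
    groups = sorted(set(group_ids))
    rng.shuffle(groups)

    # Hold out at least one group, otherwise about `test_fraction` of them.
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])

    train_idx = [i for i, g in enumerate(group_ids) if g not in test_groups]
    test_idx = [i for i, g in enumerate(group_ids) if g in test_groups]
    return train_idx, test_idx


# Made-up example: 10 pens with 5 strokes each.
pen_ids = [f"pen{p:02d}" for p in range(10) for _ in range(5)]
train_idx, test_idx = grouped_split(pen_ids)
```

Splitting by pen rather than by stroke means the test set contains only pens the model has never seen, which is the stricter reading of "independent" for this kind of data. A stroke-level split instead treats every stroke as its own sample.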
