2018
DOI: 10.1016/j.microc.2018.02.009

Validity of the best practice in splitting data for hold-out validation strategy as performed on the ink strokes in the context of forensic science

Citations: cited by 31 publications (22 citation statements)
References: 33 publications
“…Multiple observations recorded for each cow were averaged for each variable. Data were split into datasets containing 70% and 30% of observations, according to a stratified random sampling procedure [38]. The stratification was conducted as follows: cows were first grouped as (1) healthy (cows that did not get sick during the first 17 days postpartum); (2) having developed 1 condition (either metritis, HYK, or mastitis) during the first 17 d postpartum; or (3) having developed ≥2 conditions during the first 17 d postpartum.…”
Section: Data Preparation (mentioning)
confidence: 99%
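
The stratified 70%/30% split described in this statement can be sketched as follows. This is a minimal illustration only, not the cited authors' procedure: the function name `stratified_split`, the stratum labels, the group sizes, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so each stratum contributes about train_fraction to training."""
    rng = random.Random(seed)

    # Collect the indices belonging to each stratum.
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)  # random sampling within the stratum
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical strata mirroring the three health groups (sizes are invented).
labels = ["healthy"] * 50 + ["one_condition"] * 30 + ["two_or_more"] * 20
train_idx, test_idx = stratified_split(labels)
```

Splitting inside each stratum, rather than over the pooled data, keeps the group proportions roughly equal in the training and test sets.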
“…The primary spectral dataset consisting of 1361 samples and 5401 variables has been studied and reported elsewhere (Lee, Liong, & Jemain, 2018b, 2018c, 2019a, 2019b). The practical purpose of classification model is to predict brand of unknown pen inks using based on ATR-FTIR spectrum of the ink entry.…”
Section: ATR-FTIR Spectral Dataset (mentioning)
confidence: 99%
“…Table 01 shows the number of spectrum according to ten different pen brands. More details about the spectra collection procedures can be referred to Lee, Liong, and Jemain (2018b). The dataset was first truncated and included only region between 2000-1600 cm⁻¹; and then preprocessed using Asymmetric Least Squares (AsLS) algorithm (Eilers & Boelens, 2005).…”
Section: ATR-FTIR Spectral Dataset (mentioning)
confidence: 99%
“…A recent study showed that, in some cases, no statistically significant differences were noted between the two approaches in the final results [56]. This, however, should not be generalized and, as a rule of thumb, calibration/training and prediction sets should be independent. Notice that independence can have different meanings, in terms of sampling, depending on the target of analysis.…”
Section: Validation and Figures of Merit (mentioning)
confidence: 99%
“…Lee et al. [56,133] recently published two relevant studies that evaluated the appropriate use of chemometrics in classification of ink lines in questioned documents. In the first [56], they discussed whether ink strokes made with the same pen should be considered replicates or independent samples for classification purposes and the best way for splitting the dataset into training and test sets for external validation. For this, they analyzed 1361 strokes made by 273 blue gel pen inks from ten different brands and from 23 models using ATR-FTIR.…”
Section: Infrared Spectroscopy (mentioning)
confidence: 99%
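
If strokes from the same pen are treated as replicates, one possible way to split the data is to assign whole pens, rather than individual strokes, to the training or test set. The sketch below illustrates that idea only. It is not taken from the cited study, and the function name `group_split`, the pen identifiers, the 30% test fraction, and the seed are all hypothetical.

```python
import random


def group_split(groups, test_fraction=0.3, seed=0):
    """Split indices so that all samples sharing a group land in the same set."""
    rng = random.Random(seed)

    # Shuffle the distinct groups (pens), then reserve a fraction for testing.
    unique_groups = sorted(set(groups))
    rng.shuffle(unique_groups)
    n_test = round(len(unique_groups) * test_fraction)
    test_groups = set(unique_groups[:n_test])

    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Hypothetical data: 10 pens with 5 strokes each.
pen_ids = [f"pen_{p}" for p in range(10) for _ in range(5)]
train_idx, test_idx = group_split(pen_ids)
train_pens = {pen_ids[i] for i in train_idx}
test_pens = {pen_ids[i] for i in test_idx}
```

Because no pen appears on both sides, the test set contains only strokes from pens the model has never seen, which is one reading of the independence requirement mentioned in the statements above.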