2020
DOI: 10.1371/journal.pone.0236092

Challenges of machine learning model validation using correlated behaviour data: Evaluation of cross-validation strategies and accuracy measures

Abstract: Automated monitoring of the movements and behaviour of animals is a valuable research tool. Recently, machine learning tools were applied to many species to classify units of behaviour. For the monitoring of wild species, collecting enough data for training models might be problematic, thus we examine how machine learning models trained on one species can be applied to another closely related species with similar behavioural conformation. We contrast two ways to calculate accuracies, termed here as overall and…

Cited by 41 publications (40 citation statements)
References 40 publications
“…[5, 26]). However, the value of using data split per individual datasets has been highlighted when validating the ability of models to predict behaviour of unobserved individuals [28]. In this study, we built two model sets, the first splitting the data 60/40 randomly, with data from each individual present in both the training and the validation models, and the other approximately split 60/40 at the individual level, with individuals only in either the training or validation sets.…”
Section: Methods (mentioning; confidence: 99%)
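The two splitting schemes described in this statement can be sketched in Python as follows. This is a minimal illustration, not the cited authors' code: the record layout `(individual_id, behaviour_label)` and the helper names `random_split`, `split_by_individual` and `shared_individuals` are hypothetical. Assigning whole individuals to one set is why the per-individual split is only approximately 60/40 when individuals contribute different numbers of records.

```python
import random
from typing import List, Set, Tuple

# Hypothetical record layout: (individual_id, behaviour_label).
# Real behaviour data would also carry sensor features; only the ID matters here.
Record = Tuple[str, str]


def random_split(records: List[Record], train_frac: float = 0.6, seed: int = 0):
    """Shuffle all records and cut at train_frac.

    Records from the same individual can end up in both sets.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def split_by_individual(records: List[Record], train_frac: float = 0.6, seed: int = 0):
    """Assign whole individuals to either training or validation.

    The record-level ratio is only approximately train_frac, because
    individuals may contribute different numbers of records.
    """
    rng = random.Random(seed)
    ids = sorted({ind for ind, _ in records})
    rng.shuffle(ids)
    train_ids = set(ids[: round(train_frac * len(ids))])
    train = [r for r in records if r[0] in train_ids]
    valid = [r for r in records if r[0] not in train_ids]
    return train, valid


def shared_individuals(train: List[Record], valid: List[Record]) -> Set[str]:
    """Individuals that appear in both sets (empty for a per-individual split)."""
    return {ind for ind, _ in train} & {ind for ind, _ in valid}


# Usage: 5 hypothetical individuals with 20 records each.
records = [(f"ind{i}", "graze") for i in range(5) for _ in range(20)]
train_r, valid_r = random_split(records)
train_i, valid_i = split_by_individual(records)
print(len(train_r), len(valid_r), sorted(shared_individuals(train_r, valid_r)))
print(len(train_i), len(valid_i), sorted(shared_individuals(train_i, valid_i)))
```

In the random split, individuals almost always appear on both sides, which is exactly the situation the per-individual split is designed to avoid.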
“…Importantly, it is particularly problematic to test the value of domestic surrogates for wild animals if those wild animals cannot be observed for verification. For example, applying the common method for splitting data into training and validation data sets overestimates the accuracy of models when tested on new individuals because the models are validated on individuals also used to train the model [28].…”
Section: Introduction (mentioning; confidence: 99%)
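The mechanism behind this overestimation can be illustrated with a deliberately contrived toy example, which is an assumption for illustration and not data or a model from [28]: each synthetic individual has its own feature offset, and a 1-nearest-neighbour classifier is scored once with leave-one-record-out and once with leave-one-individual-out validation. The names `make_toy_data`, `predict_1nn`, `leave_one_record_out` and `leave_one_individual_out` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy record: (individual_id, feature_value, behaviour_label).
Record = Tuple[int, float, str]


def make_toy_data(n_individuals: int = 5, reps: int = 4) -> List[Record]:
    """Synthetic data: each individual has its own offset (10 * id), and the
    'active' behaviour adds 1.0 to the feature relative to 'rest'."""
    data: List[Record] = []
    for ind in range(n_individuals):
        for rep in range(reps):
            data.append((ind, 10.0 * ind + 0.1 * rep, "rest"))
            data.append((ind, 10.0 * ind + 1.0 + 0.1 * rep, "active"))
    return data


def predict_1nn(train: List[Record], x: float) -> str:
    """Label of the training record whose feature is closest to x."""
    return min(train, key=lambda r: abs(r[1] - x))[2]


def leave_one_record_out(data: List[Record]) -> float:
    """Hold out one record at a time; other records of the same individual stay in training."""
    correct = 0
    for j, (_, x, label) in enumerate(data):
        train = data[:j] + data[j + 1:]
        correct += int(predict_1nn(train, x) == label)
    return correct / len(data)


def leave_one_individual_out(data: List[Record]) -> float:
    """Hold out all records of one individual at a time."""
    correct = 0
    for ind in sorted({r[0] for r in data}):
        train = [r for r in data if r[0] != ind]
        test = [r for r in data if r[0] == ind]
        correct += sum(int(predict_1nn(train, x) == label) for _, x, label in test)
    return correct / len(data)


data = make_toy_data()
print(leave_one_record_out(data))      # 1.0
print(leave_one_individual_out(data))  # 0.2
```

On this toy data the record-level scheme scores 1.0 because every held-out record has near-identical records from the same individual in training, while holding out whole individuals drops accuracy to 0.2. The gap reflects how strongly individual identity dominates the feature in this contrived setup; it says nothing about the size of the effect in real behaviour data.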
“…Thus, it is suggested to use cross-validation, which is a technique used to evaluate the results of a statistical analysis. It is used where the main objective is prediction and [one needs] to estimate the accuracy of a model ensuring that they are independent of the partitioning between training and test dataset [119, 120].…”
Section: Automated Computational Methods For Anticancer Peptide Pr… (mentioning; confidence: 99%)
“…This process will be repeated n times and, in every iteration, a different test dataset will be selected, while the remaining data will be used, as mentioned, as a training set. Once the iterations are completed, the accuracy and error are calculated for each of the models produced [119, 120].…”
Section: Automated Computational Methods For Anticancer Peptide Pr… (mentioning; confidence: 99%)
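The k-fold procedure quoted in these two statements (with k playing the role of the quoted n) can be sketched as below. This is a generic illustration, not the implementation used in [119, 120]: the function names `k_fold_indices` and `cross_validate`, and the majority-class baseline `fit_majority`/`predict_majority`, are hypothetical placeholders for a real classifier. Folds are formed from individual records here; for correlated behaviour data the same loop could instead assign whole individuals to folds.

```python
import random
from collections import Counter
from statistics import mean
from typing import Any, Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    X: Sequence[Any],
    y: Sequence[Any],
    k: int,
    fit: Callable[[List[Any], List[Any]], Any],
    predict: Callable[[Any, Any], Any],
    seed: int = 0,
) -> List[float]:
    """Run k iterations; each fold is the test set once, the rest is training data.

    Returns one accuracy per fitted model.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical baseline model: always predicts the most common training label.
def fit_majority(X_train: List[Any], y_train: List[Any]) -> Any:
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model: Any, x: Any) -> Any:
    return model


# Usage with made-up labels; per-fold accuracies are summarised by their mean.
X = list(range(12))
y = ["rest"] * 8 + ["active"] * 4
fold_acc = cross_validate(X, y, k=4, fit=fit_majority, predict=predict_majority)
print(fold_acc, mean(fold_acc))
```

Keeping one accuracy per fold, rather than only the mean, matches the quoted description of computing accuracy for each model produced and also shows how much the estimate varies between partitions.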
“…In the beginning, this was done in situ during observation, but in modern times the typical routine is to make video recordings of the behaviour which is analysed later to get quantitative results (examples with various taxa are, e.g. experiments with dogs [3], capuchin monkeys [4], cleaner fish [6], and zebra finches [15]). The possibility of recordings opened the possibility for obtaining a wealth of data, but due to lack of tools, analysis is mostly done with human effort.…”
Section: Domain-specific Tool For Hands-on Training (mentioning; confidence: 99%)