Purpose
Authenticating an individual through voice can prove convenient as nothing needs to be stored and cannot easily be stolen. However, if an individual is authenticating under duress, the coerced attempt must be acknowledged and appropriate warnings issued. Furthermore, as duress may entail multiple combinations of emotions, the current f-score evaluation does not accommodate that multiple selected samples possess similar levels of importance. Thus, this study aims to demonstrate an approach to identifying duress within a voice-based authentication system.
Design/methodology/approach
Measuring the value that a classifier presents is often done using an f-score. However, the f-score does not effectively portray the proposed value when multiple classes could be grouped as one. The f-score also does not provide any information when numerous classes are often incorrectly identified as the other. Therefore, the proposed approach uses the confusion matrix, aggregates the select classes into another matrix and calculates a more precise representation of the selected classifier’s value. The utility of the proposed approach is demonstrated through multiple tests and is conducted as follows. The initial tests’ value is presented by an f-score, which does not value the individual emotions. The lack of value is then remedied with further tests, which include a confusion matrix. Final tests are then conducted that aggregate selected emotions within the confusion matrix to present a more precise utility value.
Findings
Two tests within the set of experiments achieved an f-score difference of 1%, indicating, Mel frequency cepstral coefficient, emotion detection, confusion matrix, multi-layer perceptron, Ryerson audio-visual database of emotional speech and song (RAVDESS), voice authentication that the two tests provided similar value. The confusion matrix used to calculate the f-score indicated that some emotions are often confused, which could all be considered closely related. Although the f-score can represent an accuracy value, these tests’ value is not accurately portrayed when not considering often confused emotions. Deciding which approach to take based on the f-score did not prove beneficial as it did not address the confused emotions. When aggregating the confusion matrix of these two tests based on selected emotions, the newly calculated utility value demonstrated a difference of 4%, indicating that the two tests may not provide a similar value as previously indicated.
Research limitations/implications
This approach’s performance is dependent on the data presented to it. If the classifier is presented with incomplete or degraded data, the results obtained from the classifier will reflect that. Additionally, the grouping of emotions is not based on psychological evidence, and this was purely done to demonstrate the implementation of an aggregated confusion matrix.
Originality/value
The f-score offers a value that represents the classifiers’ ability to classify a class correctly. This paper demonstrates that aggregating a confusion matrix could provide more value than a single f-score in the context of classifying an emotion that could consist of a combination of emotions. This approach can similarly be applied to different combinations of classifiers for the desired effect of extracting a more accurate performance value that a selected classifier presents.