Purpose
Most source recording device identification models for Web media forensics are based on a single feature to complete the identification task and often have the disadvantages of long time and poor accuracy. The purpose of this paper is to propose a new method for end-to-end network source identification of multi-feature fusion devices.
Design/methodology/approach
This paper proposes an efficient multi-feature fusion source recording device identification method based on end-to-end and attention mechanism, so as to achieve efficient and convenient identification of recording devices of Web media forensics.
Findings
The authors conducted sufficient experiments to prove the effectiveness of the models that they have proposed. The experiments show that the end-to-end system is improved by 7.1% compared to the baseline i-vector system, compared to the authors’ previous system, the accuracy is improved by 0.4%, and the training time is reduced by 50%.
Research limitations/implications
With the development of Web media forensics and internet technology, the use of Web media as evidence is increasing. Among them, it is particularly important to study the authenticity and accuracy of Web media audio.
Originality/value
This paper aims to promote the development of source recording device identification and provide effective technology for Web media forensics and judicial record evidence that need to apply device source identification technology.
Deep learning techniques have achieved specific results in recording device source identification. The recording device source features include spatial information and certain temporal information. However, most recording device source identification methods based on deep learning only use spatial representation learning from recording device source features, which cannot make full use of recording device source information. Therefore, in this paper, to fully explore the spatial information and temporal information of recording device source, we propose a new method for recording device source identification based on the fusion of spatial feature information and temporal feature information by using an end-to-end framework. From a feature perspective, we designed two kinds of networks to extract recording device source spatial and temporal information. Afterward, we use the attention mechanism to adaptively assign the weight of spatial information and temporal information to obtain fusion features. From a model perspective, our model uses an end-to-end framework to learn the deep representation from spatial feature and temporal feature and train using deep and shallow loss to joint optimize our network. This method is compared with our previous work and baseline system. The results show that the proposed method is better than our previous work and baseline system under general conditions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.