Various algorithms exist for audio deepfake synthesis, such as Deep Voice, Tacotron, FastSpeech, and voice imitation techniques. Despite the availability of various spoofed-speech detectors, they cannot yet distinguish unseen audio samples with high precision. In this study, we propose a robust model, the Ensemble Deep Learning based Detector (EDL-Det), to detect text-to-speech (TTS) synthesized speech and classify audio into spoofed and bonafide classes. Our proposed model improves on YAMNet by employing VGG19 as the base network instead of MobileNet and combining it with two other deep learning (DL) networks. The system analyzes mel-spectrograms generated from the input audio to extract the artifacts underlying the audio signals. We add an ensemble learning block consisting of ResNet50 and InceptionNetv2. First, we convert the speech into mel-spectrograms, which are time-frequency representations. Second, we train our model on the ASVspoof 2019 dataset. Finally, we classify input audio by converting it into mel-spectrograms and feeding them to the trained binary classifiers, combining the three networks' outputs through a majority-voting scheme. Owing to its deep convolutional architecture, our proposed model effectively extracts the most representative features from the mel-spectrograms. Furthermore, we perform extensive experiments on the ASVspoof 2019 corpus to assess the performance of the proposed model. Additionally, our proposed model is robust enough to identify unseen spoofed audio and to accurately classify attacks according to their cloning algorithms.
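To make the described pipeline concrete, the minimal sketch below illustrates its two core steps: converting an audio file into a log-scaled mel-spectrogram and fusing the three networks' binary predictions by majority vote. This is an illustrative sketch, not the authors' implementation; the `librosa` parameters (sampling rate, number of mel bands) and the classifier interface in the usage comment are assumptions.

```python
import numpy as np
import librosa


def audio_to_mel_spectrogram(path, sr=16000, n_mels=128):
    """Load an audio file and convert it to a log-scaled mel-spectrogram,
    i.e. the time-frequency representation fed to the CNNs.
    sr and n_mels are assumed values, not taken from the paper."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def majority_vote(predictions):
    """Combine binary labels (0 = bonafide, 1 = spoofed) from the three
    networks; the class receiving a strict majority of votes wins."""
    votes = np.asarray(predictions)
    return int(votes.sum() > len(votes) / 2)


# Hypothetical usage, assuming `models` holds the three trained binary
# classifiers (VGG19-based YAMNet variant, ResNet50, InceptionNetv2),
# each returning 0 or 1 for a given mel-spectrogram:
#   mel = audio_to_mel_spectrogram("sample.flac")
#   preds = [m.predict(mel) for m in models]
#   label = majority_vote(preds)  # 0 = bonafide, 1 = spoofed
```

With three voters, the scheme simply returns whichever class at least two of the networks agree on, which is why an odd number of ensemble members avoids ties.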