Changyan Zheng scite author profile

Although audio replay detection has improved in recent years, its performance is known to deteriorate severely in the presence of strong background noise. Because different frames of an utterance contribute differently to spoofing detection, this paper introduces an attention-based long short-term memory (LSTM) network to extract representative frames for spoofing detection in noisy environments. With this attention mechanism, the most representative frame-level features are selected automatically by adjusting their weights within the attention-based LSTM framework. Experiments on the ASVspoof 2017 dataset version 2.0 show that the equal error rate (EER) of the proposed approach was about 13% lower than that of the constant Q cepstral coefficients-Gaussian mixture model (CQCC-GMM) baseline in noisy environments at four different signal-to-noise ratios (SNRs). The proposed algorithm also improved on a traditional LSTM for audio replay detection in noisy environments. Experiments using bagging with different frame lengths were also conducted to further improve the proposed approach.
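The frame-weighting idea in this abstract can be illustrated with a minimal sketch of attention pooling over frame-level features. This is not the paper's implementation: the abstract does not state the scoring function, so the dot product with a single query vector is an assumption, and `attention_pool`, `frames`, and `query` are hypothetical names. In a real system the frames would be LSTM hidden states and the query would be learned during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each frame by its (assumed) dot-product score against `query`
    and return the weighted sum together with the per-frame weights."""
    scores = [sum(h * q for h, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three 2-dimensional frame features (stand-ins for LSTM outputs).
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical query vector; learned in practice
pooled, weights = attention_pool(frames, query)
```

Here the second frame has the highest score against the query, so it receives the largest weight and dominates the pooled utterance-level vector; frames with low scores contribute little.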

Magnetically Levitated Flexible Vibration Sensors with Surficial Micropyramid Arrays for Magnetism Enhancement

Zhang, Zheng, et al. 2022, ACS Appl. Mater. Interfaces

Magnetically levitated vibration sensors possess wide frequency response ranges and high sensitivity. Compared with springs and cantilevers, the levitated magnet suffers no mechanical abrasion, which minimizes mechanical fatigue after prolonged exposure to vibration. However, magnetically levitated sensors are mostly based on fully rigid components, which are difficult to conform to the soft, curvilinear surfaces of biological tissues and machines. Here, an innovative vibration sensor based on magnetic levitation has been developed. The proposed sensor contains two parallel magnetic membranes, one of which is levitated by magnetic force and connected to a specially designed sensor package. The surfaces of the membranes are modified with micropyramid arrays to enhance the magnetism and integrated with flexible coil arrays to maximize the changes in magnetic flux during vibration. The sensor exhibits a wide frequency response ranging from 1 Hz to 20 kHz and a high sensitivity of 0.82 mV/μm at an operating frequency of 120 Hz. Various applications have been demonstrated, including bone-conducted speech acquisition, sound recording, human motion detection, and machine condition evaluation. The sensor is one of the first flexible vibration sensors based on magnetic levitation. Its innovative levitated sensing structures may inspire the development of novel flexible sensors with soft mechanical moving structures for force and displacement sensing in healthcare and industrial monitoring.

Improving the Spectra Recovering of Bone-Conducted Speech via Structural SIMilarity Loss Function

Zheng, Yang, Zhang, et al. 2019

A BLSTM and WaveNet-Based Voice Conversion Method With Waveform Collapse Suppression by Post-Processing

et al. 2019

In recent years, neural network-based voice conversion methods have developed rapidly, and many different models and neural networks have been applied to parallel voice conversion. However, the over-smoothing of parametric methods [e.g., bidirectional long short-term memory (BLSTM)] and the waveform collapse of neural vocoders (e.g., WaveNet) still degrade the quality of the converted voices. To overcome this problem, we propose a BLSTM and WaveNet-based voice conversion method combined with waveform collapse suppression by post-processing. This method first uses BLSTM to convert the acoustic features between parallel speakers, and then synthesizes a pre-converted voice with WaveNet. Subsequently, several alternative iterations of BLSTM post-processing are performed, and the final converted voice is generated by WaveNet. The proposed method can directly generate converted audio waveforms and effectively avoid the waveform-collapsed speech caused by a single WaveNet generation. The experimental results indicate that acoustic features trained using the BLSTM network achieve better results than conventional baselines. In our experiments on VCC2018, the use of WaveNet alleviated the problem of over-smoothing, which contributes to improving the similarity and naturalness of the final voice conversion results.

Changyan Zheng

Camouflage people detection via strong semantic dilation network

Attention-Based LSTM Algorithm for Audio Replay Detection in Noisy Environments
