In this paper, we present a two-stage language identification (LID) system based on a shallow ResNet14 followed by a simple 2-layer recurrent neural network (RNN) architecture, which was used in the Xunfei (iFlyTek) Chinese Dialect Recognition Challenge 1 and won first place among 110 teams. The system first trains an acoustic model (AM) with connectionist temporal classification (CTC) to recognize the given phonetic sequence annotations, and then trains another RNN to classify the dialect category using the intermediate features from the AM as inputs. Compared with a three-stage system that we also explore, our results show that the two-stage system achieves high accuracy for Chinese dialect recognition under both short-utterance and long-utterance conditions with less training time.
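A minimal PyTorch sketch of the two-stage pipeline described in this abstract is given below. The layer sizes, feature dimensions, and module names are illustrative assumptions (a small convolutional encoder stands in for the ResNet14 front-end), not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CTCAcousticModel(nn.Module):
    """Stage 1: acoustic model trained with CTC on phonetic sequence labels."""
    def __init__(self, feat_dim=40, hidden=512, num_phones=100):
        super().__init__()
        # Shallow convolutional front-end standing in for the ResNet14 encoder.
        self.encoder = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.ctc_head = nn.Linear(hidden, num_phones + 1)  # +1 for the CTC blank

    def forward(self, x):                    # x: (batch, feat_dim, time)
        h = self.encoder(x).transpose(1, 2)  # intermediate features: (batch, time, hidden)
        return h, self.ctc_head(h)

class DialectClassifier(nn.Module):
    """Stage 2: 2-layer RNN over the AM's intermediate features."""
    def __init__(self, hidden=512, num_dialects=10):
        super().__init__()
        self.rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, num_dialects)

    def forward(self, h):
        _, state = self.rnn(h)   # last hidden state summarizes the utterance
        return self.out(state[-1])

# Stage 1 optimizes CTC loss on phone labels; stage 2 keeps the AM fixed and
# trains the classifier with cross entropy on dialect labels.
am, clf = CTCAcousticModel(), DialectClassifier()
ctc_loss, ce_loss = nn.CTCLoss(blank=100), nn.CrossEntropyLoss()
```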
Speaker embeddings have become increasingly popular in the text-independent speaker verification task. In this paper, we propose two improvements to the training stage. Both improvements are based on triplets, because the training stage and the evaluation stage of the baseline x-vector system pursue different objectives. First, we introduce a triplet loss to optimize the Euclidean distances between embeddings while minimizing the multi-class cross-entropy loss. Second, we design an embedding similarity measurement network to control the similarity between the two selected embeddings. We further jointly train the two new methods with the original network and achieve state-of-the-art results. The multi-task training synergies are demonstrated by a 9% reduction in equal error rate (EER) and detection cost function (DCF) on the 2016 NIST Speaker Recognition Evaluation (SRE) test set.
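The following PyTorch sketch illustrates the kind of joint objective this abstract describes: multi-class cross entropy plus a triplet loss on Euclidean distances, with a small pairwise network scoring embedding similarity. The dimensions, loss weights, and the similarity-network design are assumptions for illustration only.

```python
import torch
import torch.nn as nn

embed_dim, num_speakers = 512, 1000

speaker_head = nn.Linear(embed_dim, num_speakers)    # softmax classification head
triplet = nn.TripletMarginLoss(margin=0.3, p=2)      # Euclidean-distance triplet loss
similarity_net = nn.Sequential(                      # scores a pair of embeddings
    nn.Linear(2 * embed_dim, 256), nn.ReLU(), nn.Linear(256, 1)
)
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

def joint_loss(anchor, positive, negative, labels, w_trip=1.0, w_sim=1.0):
    # 1) multi-class cross entropy on the anchor embeddings
    l_ce = ce(speaker_head(anchor), labels)
    # 2) triplet loss pulling same-speaker embeddings together
    l_trip = triplet(anchor, positive, negative)
    # 3) similarity network: same-speaker pairs -> 1, different-speaker pairs -> 0
    pos_score = similarity_net(torch.cat([anchor, positive], dim=-1))
    neg_score = similarity_net(torch.cat([anchor, negative], dim=-1))
    l_sim = bce(pos_score, torch.ones_like(pos_score)) + \
            bce(neg_score, torch.zeros_like(neg_score))
    return l_ce + w_trip * l_trip + w_sim * l_sim
```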
Learning a good speaker embedding is important for many automatic speaker recognition tasks, including verification, identification, and diarization. The embeddings learned with a softmax objective are not discriminative enough for open-set verification tasks. Angular-margin-based embedding learning objectives can achieve this discriminativeness by optimizing angular distance and adding a margin penalty. In this work, we apply several popular angular margin embedding learning strategies and explicitly compare their performance on the VoxCeleb speaker recognition dataset. Observing that encouraging inter-class separability is important when applying angular-based embedding learning, we propose an exclusive inter-class regularization as a complement to the angular-based loss. We verify the effectiveness of these methods for learning a discriminative embedding space on the ASV task in several experiments. Combining these methods, we achieve an impressive result: a 16.5% improvement in equal error rate (EER) and an 18.2% improvement in minimum detection cost function compared with the baseline softmax system.
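Below is a minimal sketch of an additive angular margin (ArcFace-style) objective, one representative of the angular losses compared in this abstract, together with a simple inter-class regularizer that penalizes high cosine similarity between class weight vectors. The exact margin and scale values and the precise form of the exclusive regularizer are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularMarginLoss(nn.Module):
    def __init__(self, embed_dim=512, num_classes=1211, scale=30.0, margin=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.scale, self.margin = scale, margin

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class weights.
        w = F.normalize(self.weight, dim=1)
        cos = F.normalize(embeddings, dim=1) @ w.t()
        # Add the angular margin to the target-class angle only.
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        loss = F.cross_entropy(self.scale * logits, labels)
        # Illustrative inter-class regularization: penalize each class weight's
        # highest cosine similarity to another class, encouraging separability.
        w_sim = w @ w.t()
        off_diag = w_sim - torch.diag(torch.diag(w_sim))
        reg = off_diag.clamp(min=0).max(dim=1).values.mean()
        return loss + reg
```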