2021
DOI: 10.48550/arxiv.2110.05745
Preprint

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Abstract: Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription. This paper proposes VarArray, an array-geometry-agnostic speech separation neural network model. The proposed model is applicable to any number of microphones without retraining while leveraging the nonlinear correlation between the input channels. The proposed method adapts different elements that were proposed before separately, including transform-a…
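Using one model for any number of microphones requires cross-channel layers whose parameters do not depend on the channel count. The following is a minimal pure-Python sketch of a generic transform-average-concatenate (TAC) style layer, the kind of cross-channel block the citation statements below associate with VarArray. It is not VarArray's actual implementation. The class name `TACLayer`, the layer sizes, the random initialisation, the ReLU activations and the residual connection are all illustrative assumptions. Because every channel shares the same weights and channels interact only through a mean, the layer accepts any number of microphones in any order.

```python
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def dense_relu(x: Vector, layer: Layer) -> Vector:
    """Affine map followed by ReLU: max(0, W x + b)."""
    w, b = layer
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def random_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    """Hypothetical initialisation: uniform weights, zero bias."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TACLayer:
    """Hypothetical cross-channel layer; parameters do not depend on channel count."""

    def __init__(self, d_in: int, d_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.transform = random_layer(d_hidden, d_in, rng)    # shared by all channels
        self.average = random_layer(d_hidden, d_hidden, rng)  # applied to the channel mean
        self.concat = random_layer(d_in, 2 * d_hidden, rng)   # maps [own ; shared] back to d_in

    def __call__(self, channels: List[Vector]) -> List[Vector]:
        # 1) Transform: the same weights are applied to every microphone channel.
        h = [dense_relu(x, self.transform) for x in channels]
        # 2) Average: pooling over channels yields one context vector for any
        #    number of microphones, independent of their order.
        mean = [sum(col) / len(h) for col in zip(*h)]
        g = dense_relu(mean, self.average)
        # 3) Concatenate each channel's features with the shared context,
        #    project back to d_in, and add a residual connection.
        out = []
        for x, hc in zip(channels, h):
            y = dense_relu(hc + g, self.concat)
            out.append([xi + yi for xi, yi in zip(x, y)])
        return out


# The same layer, with the same weights, handles 2, 3 or 7 microphones.
layer = TACLayer(d_in=4, d_hidden=8)
for n_mics in (2, 3, 7):
    feats = [[float(c + k) for k in range(4)] for c in range(n_mics)]
    out = layer(feats)
    print(n_mics, len(out), len(out[0]))  # prints 2 2 4, then 3 3 4, then 7 7 4
```

The mean is the design choice that matters here. Any symmetric pooling would make the shared context independent of channel order, and a mean also keeps its scale independent of how many microphones are present.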

Cited by 2 publications (10 citation statements); references 23 publications.

“…The right diagram of Fig. 2 shows our proposed cross-channel layer, which was inspired by a transform-aggregate-concatenate model [12,13]. The idea is as follows.…”
Section: Picknet: Model With Cross-channel Layers (mentioning)
confidence: 99%
“…Since real multi-channel recordings may be obtained from different devices with different microphone array geometries, it is desirable for a speech separation model to be able to deal with any array geometries without modification or retraining. Therefore, we make use of a recently proposed array-geometry-agnostic model, called VarArray [12], in our investigation.…”
Section: Background Work, 2.1 VarArray Model (mentioning)
confidence: 99%
“…This is typically realized with window-based processing, where a speech separation neural network model is used to process each windowed signal. CSS can be performed with either single microphone [10,11] or a microphone array [5,12] with the latter being more effective in many cases.…”
Section: Introduction (mentioning)
confidence: 99%
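The statement above describes continuous speech separation (CSS) as window-based processing, where a separation model is run on each window of a long recording. The following is a rough pure-Python sketch of that loop under several assumptions. It uses two output channels and overlap-add averaging. Each window's output order is chosen to best match the already stitched output in the overlap region, which is a common stitching approach assumed here and not taken from VarArray. The names `windowed_css` and `toy_separator` and the window and hop sizes are hypothetical. The toy separator stands in for a neural network and deliberately flips its output order on alternate calls to imitate permutation ambiguity.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Separator = Callable[[Sequence[float]], Tuple[Signal, Signal]]


def windowed_css(mixture: Sequence[float], separate: Separator,
                 win: int, hop: int) -> List[Signal]:
    """Hypothetical window-based CSS with two output channels."""
    if not 0 < hop <= win:
        raise ValueError("need 0 < hop <= win so every sample is covered")
    n = len(mixture)
    acc = [[0.0] * n, [0.0] * n]  # overlap-add accumulators per output channel
    count = [0] * n               # number of windows covering each sample
    start = 0
    while True:
        end = min(start + win, n)
        a, b = separate(mixture[start:end])
        # Stitching: samples already covered by earlier windows form the overlap.
        overlap = [t for t in range(start, end) if count[t] > 0]

        def mismatch(x: Signal, y: Signal) -> float:
            # Squared distance between the stitched output so far and (x, y).
            return sum((acc[0][t] / count[t] - x[t - start]) ** 2
                       + (acc[1][t] / count[t] - y[t - start]) ** 2
                       for t in overlap)

        if overlap and mismatch(b, a) < mismatch(a, b):
            a, b = b, a  # swap so each output channel stays consistent over time
        for t in range(start, end):
            acc[0][t] += a[t - start]
            acc[1][t] += b[t - start]
            count[t] += 1
        if end == n:
            break
        start += hop
    # Average the overlapping contributions.
    return [[v / c if c else 0.0 for v, c in zip(ch, count)] for ch in acc]


# Hypothetical toy separator: splits a chunk into its positive and negative
# parts, returning them in alternating order on successive calls.
_calls = itertools.count()


def toy_separator(chunk: Sequence[float]) -> Tuple[Signal, Signal]:
    pos = [max(x, 0.0) for x in chunk]
    neg = [min(x, 0.0) for x in chunk]
    return (neg, pos) if next(_calls) % 2 else (pos, neg)


mixture = [math.sin(0.3 * t) for t in range(100)]
out_a, out_b = windowed_css(mixture, toy_separator, win=20, hop=10)
```

Without the swap step, `out_a` would alternate between the two toy "speakers" from one window to the next. With it, each output channel tracks one of them across the whole signal.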