In this paper, we present a software pipeline for speech recognition that automates the creation of training datasets from unlabeled audio of interest, targeting low-resource languages and domain-specific areas. As speech recognition becomes commoditized, more teams build domain-specific models as well as models for local languages. At the same time, the lack of training datasets for low- and middle-resource languages makes it significantly harder to exploit the latest achievements and frameworks in the speech recognition area and prevents a wide range of software engineers from working on speech recognition problems. This problem is even more acute for domain-specific datasets. The pipeline was tested by building a dataset for Ukrainian speech recognition, which confirmed that the design is adaptable to different data source formats and can be extended to integrate with existing frameworks.
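As a rough illustration of the kind of pipeline the abstract describes, the sketch below bootstraps a labeled dataset by pseudo-labeling unlabeled audio with a pretrained ASR model. This is a minimal sketch under stated assumptions, not the authors' implementation: the model checkpoint, directory layout, and manifest format are all hypothetical choices for illustration.

```python
# Hedged sketch: build a training manifest from unlabeled audio by
# pseudo-labeling with a pretrained ASR model (illustrative only; the
# checkpoint and file layout below are assumptions, not the paper's design).
import json
from pathlib import Path

from transformers import pipeline  # Hugging Face Transformers

# Model choice is an assumption; a Ukrainian or domain-specific
# checkpoint would be substituted in practice.
asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")

def build_manifest(audio_dir: str, manifest_path: str) -> None:
    """Transcribe each WAV file and write one (audio, text) record per line."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav in sorted(Path(audio_dir).glob("*.wav")):
            text = asr(str(wav))["text"]
            out.write(json.dumps(
                {"audio_filepath": str(wav), "text": text},
                ensure_ascii=False) + "\n")

if __name__ == "__main__":
    build_manifest("unlabeled_audio", "train_manifest.jsonl")
```

The resulting manifest could then be filtered (e.g., by model confidence or human review) before being used to train or fine-tune a recognizer for the target language or domain.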
Structure optimization of the multi-channel on-board radar with antenna aperture synthesis and an algorithm for power line selection against the background of the Earth's surface
Language models have recently shown a dramatic improvement in their ability to deliver state-of-the-art accuracy on a number of Natural Language Processing tasks. These improvements open many possibilities for solving NLP downstream tasks, including machine translation, speech recognition, information retrieval, sentiment analysis, summarization, question answering, multilingual dialogue system development, and more. Language models are one of the most important components in solving each of these tasks. This paper is devoted to the research and analysis of the most widely adopted techniques and designs for building and training language models that show state-of-the-art results. It surveys the techniques and components applied in the creation of language models and their parts, paying attention to neural networks, embedding mechanisms, bidirectionality, encoder-decoder architectures, attention and self-attention, and parallelization through the Transformer. Results: the most promising techniques involve pretraining and fine-tuning of a language model, attention-based neural networks as part of the model design, and a complex ensemble of multidimensional embeddings to build deep contextual understanding. The latest architectures based on these approaches require a lot of computational power to train, which is a direction for further improvement.
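For reference, the self-attention mechanism the abstract refers to is commonly the scaled dot-product attention of the Transformer, where the queries Q, keys K, and values V are linear projections of the input embeddings and d_k is the key dimension:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

The softmax weights express how strongly each position attends to every other position, and because they are computed for all positions at once as matrix products, the mechanism parallelizes well, which is the property the abstract highlights.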