Annotation of large multilingual corpora remains a challenge to the data-driven approach to speech research, especially for under-resourced languages. This paper presents crosslanguage automatic phonetic segmentation using Hidden Markov Models (HMMs). The underlying notion is segmentation based on articulation (manner and place) so as to provide extensive models that will be applicable across languages. A test on the Appen Spanish speech corpus gives phone recognition accuracy of 61.15% when bootstrapped with acoustic models trained on the TIMIT as compared with a baseline result of 54.63% for flat start initialization of the monophone models.
This paper presents mLEARn, an open-source implementation of multi-layer perceptron in C++. The techniques and algorithms implemented represent existing approaches in machine learning. mLEARn is written using simple C++ constructs. The aim of mLE ARn is to provide a simple and extendable machine learning platform for students in courses involving C++ and machine learning. The source code and documentation can be downloaded from https://github.com/kalu-o/mLEARn.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.