Computer vision is a broad area of theoretical research and technical methods concerned with object detection, object tracking, and object classification. This article considers computer vision in the context of embedding it in automobiles to automate road traffic through video stream analysis. In road traffic, objects must be detected quickly and correctly, so the authors focus on pattern recognition quality, and in particular on preserving the semantic integrity of visual information. Their main purpose is to find ways to improve it at each of the three basic stages of the pattern recognition process. To avoid violating the semantic integrity of information at the initial stage of image analysis, the authors propose normalization; at the second stage, a new clustering method based on particle swarm optimization and the k-means algorithm was developed; at the final stage, the Haar cascade classifier was used with normalized training samples. The resulting image processing algorithm was applied to blurred and noisy images and proved effective at preserving the semantic integrity of visual information.
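The first two stages named above (normalization, then clustering that combines particle swarm optimization with k-means) can be illustrated with a minimal sketch. This is not the authors' method, whose details the abstract does not give. It assumes 1-D grayscale intensities, min-max normalization, a plain global-best PSO that searches for centroids minimizing the within-cluster squared error, and standard k-means refinement of the PSO result. All function names and hyperparameters (`normalize`, `pso_seed`, `kmeans`, `w`, `c1`, `c2`) are hypothetical.

```python
import random

def normalize(pixels):
    """Min-max scale intensities to [0, 1] (assumed normalization)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)

def pso_seed(data, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Global-best PSO: each particle is a candidate list of k centroids."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    pos = [[rng.choice(data) for _ in range(k)] for _ in range(n_particles)]
    vel = [[0.0] * k for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [sse(p, data) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = sse(pos[i], data)
            if cost < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], cost
                if cost < g_cost:
                    g_pos, g_cost = pos[i][:], cost
    return g_pos

def kmeans(data, centroids, iters=20):
    """Lloyd's k-means iterations starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for x in data:
            j = min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            groups[j].append(x)
        # an empty cluster keeps its previous centroid
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

# Toy grayscale intensities with three visible groups (illustrative data only).
pixels = [12, 15, 10, 14, 200, 210, 205, 198, 100, 104, 98, 102]
data = normalize(pixels)
seed = pso_seed(data, k=3)
centroids = kmeans(data, seed)
```

Seeding k-means from a PSO search is one common way to reduce its sensitivity to initialization; the k-means step can only keep or lower the squared error of the PSO result.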
In this paper, we present a new longitudinal corpus and a toolset to address the influence of voice aging on speaker verification. We examined previous longitudinal research on age-related voice changes as well as its applicability to real-world use cases. Our findings reveal that scientists have treated age-related voice changes as a hindrance instead of leveraging them to the advantage of the identity validator. Additionally, we found a significant dearth of publicly available corpora, in terms of both the time span covered and the number of participants in the audio recordings. We also identified a significant bias toward developing speaker recognition technologies for government surveillance systems over speaker verification systems for civilian IT security. To address these issues, we built an open project with the largest publicly available longitudinal speaker database, which includes 229 speakers with an average talking time exceeding 15 hours, spanning an average of 21 years per speaker. We assembled, cleaned, and normalized the audio recordings and developed software tools for speech feature extraction, all of which we are releasing into the public domain.