Various biometrics, including the face, retina, and fingerprints, have been introduced to identify a person based on physiological attributes or behavioral characteristics. Research has revealed that the human ear also possesses unique features that can be used as a biometric for human recognition. This paper uses the ear as a biometric to identify humans, comparing the performance of the dimensionality reduction techniques PCA and LDA in combination with various classification techniques.
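The abstract's dimensionality reduction step can be illustrated with a minimal PCA sketch. This is a generic numpy implementation via SVD, not the authors' specific pipeline; the function name and shapes are illustrative assumptions:

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X (samples x features) onto the top-k principal components."""
    mu = X.mean(axis=0)            # per-feature mean
    Xc = X - mu                    # center the data
    # Right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]            # shape (k, n_features)
    return Xc @ components.T, components, mu
```

In an ear-recognition setting, each row of `X` would be a flattened ear image; the reduced `k`-dimensional vectors would then be fed to a classifier. LDA differs in that it uses class labels to find discriminative, rather than merely high-variance, directions.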
Speech and speaker recognition have not yet reached a mature, state-of-the-art stage. Keyword detection in audio clips is gaining importance because it contributes to audio recognition and detection systems, yet relatively little work has been carried out in this area. In this paper, we present an experiment on keyword detection within recorded news clips in the Assamese language, spoken by native Assamese speakers. The audio clips were collected from local TV news debates, while the keywords were recorded by random speakers. The keywords were selected such that each appears within the audio clips a finite number of times. Mel Frequency Cepstral Coefficients (MFCC) are used as features, and an Auto Associative Neural Network (AANN) serves as the classifier. With this detection model, an average accuracy of 87% is achieved.
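The MFCC features used above follow a standard pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). Below is a minimal numpy-only sketch of that pipeline; the sample rate, frame, and filterbank parameters are illustrative defaults, not the authors' configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_coef=13):
    """Return a (n_frames, n_coef) matrix of MFCC features."""
    # Pre-emphasis to boost high frequencies
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + max(0, (len(emph) - n_fft) // hop)
    frames = np.stack([emph[i*hop : i*hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coef), (n + 0.5) / n_mels))
    return logmel @ dct.T
```

Production systems typically use a tuned library implementation, but the structure is the same: each audio frame becomes a short cepstral vector that the AANN can model.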
Abstract: The face is one of the major sources of social information, such as race, age, and gender. Compared with other parts of the human body, the face plays a major role at different levels of classification, prediction, and identification. In the literature, race is a form of classification for categorizing human beings into groups based on geographic boundaries, physical appearance (including the face), ethnicity, and social status. In this paper, we focus on the different facial datasets that are currently available free of cost (though with licensing restrictions). We also present our study of different works related to racial classification and related topics. One representative work surveyed here carried out two experiments on the FERET dataset, using an ANN and a CNN respectively. In the first experiment, they (i) calculated geometric features, (ii) extracted skin color, and (iii) calculated a normalized forehead area. Of the 447 FERET images used for the network, 320 were used for training, 37 for validation, and 97 for testing. In the second experiment, they used a pre-trained model [37] to extract features from both the training and testing samples; the network consisted of 13 convolution layers. After feature extraction, training and testing were performed. The results showed the superiority of the CNN over the ANN, with accuracies of 98.6% and 82.4% respectively, irrespective of cost; however, the CNN required more time than the ANN for feature extraction and network training. VI. CONCLUSION: We divided our survey into two parts: (i) the different recognized facial datasets, and (ii) the different procedures and methods adopted by various researchers, from feature selection to race and ethnicity classification. From the survey, it is observed that the face can precisely indicate race.
The survey shows that much research has been carried out in the last few years, signifying tremendous progress in learning race from facial images. In this survey, we have tried to cover, more or less, all the existing facial image datasets, along with a comprehensive review of advances in race classification.
Objectives: The proposed method is based on a deep learning technique for identifying spoken words in the Assamese language. DNN-based algorithms have been successfully applied in image recognition, computer vision, natural language processing, and medical image analysis. Methods: The method used here is the Bidirectional Long Short-Term Memory (BLSTM) network, which incorporates both past and future context. The speech database for this research work is obtained from the repository of the Indian Language Technology Proliferation and Development Center (ILTP-DC). The repository contains 32,335 utterances by 1,000 male and female participants, comprising 262 unique native Assamese words. The BLSTM-based recognition model uses 10 of the 262 unique words; the remaining words are used to generate synthesized sentences. The feature extraction module uses 39 feature coefficients, composed of MFCC, ∆MFCC, and ∆∆MFCC coefficients. Findings: The Word Error Rate (WER) of the BLSTM-based recognition model is 18.84%, with an average accuracy of 98.12%, which sets a promising benchmark compared with recent findings. Novelty: This work attempts a different approach to detecting certain keywords of the Assamese language using deep learning methodology. A future objective of this work is to improve the detection capability of the model by combining multiple DNN models in a hybrid approach, along with the inclusion of additional features.
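The WER metric reported above is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal stdlib sketch of the standard computation (not tied to the authors' toolkit):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    r, h = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                       # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(r)][len(h)] / max(len(r), 1)
```

For example, a hypothesis that drops one word from a three-word reference scores 1/3. Note that WER and sentence- or keyword-level accuracy are different metrics, which is why figures such as 18.84% WER and 98.12% accuracy can coexist.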