In knowledge discovery, the reliability of results depends on the effectiveness of the attributes selected for decision making. The curse of dimensionality refers to the phenomenon in which an excessive number of dimensions adversely affects the analysis. To mitigate the curse of dimensionality in text analysis, we propose an ontology-based semantic measure for intelligent selection and reduction of features. Among the various text mining techniques, ontology-based mining has made a significant contribution to the field. In particular, ontology-based semantic measures, which are mathematical models used to find the similarity between concepts in an ontology, have made a significant contribution to feature engineering. The proposed measure is an amalgamation of semantic similarity, relatedness, and distance. It allows an in-depth analysis of the various semantic relationships between concepts of the English language. The performance of the measure was evaluated against benchmark dimensionality reduction techniques such as principal component analysis (PCA). The results show an improvement, reducing the number of dimensions by up to 35%. The results were further evaluated by training a classifier to verify that the selected features do not produce an underfit or overfit model.
INDEX TERMS: Feature engineering, dimension reduction, semantic measures, ontology.
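To make the idea of ontology-based feature reduction concrete, the following is a minimal sketch, not the measure proposed in the abstract above. It assumes a hypothetical toy is-a taxonomy (`PARENT`) in place of a real ontology, and it substitutes the well-known Wu-Palmer similarity for the proposed combination of similarity, relatedness, and distance. The function names and the threshold value are illustrative assumptions.

```python
# Hypothetical toy is-a taxonomy (child -> parent). A real system would
# query an actual ontology instead of this hand-built dictionary.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "puppy": "dog",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept in the taxonomy; the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)),
    where lcs is the deepest common ancestor of a and b."""
    ancestors_b = set(path_to_root(b))
    # Walking upward from a, the first shared ancestor is the deepest one.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def reduce_features(features, threshold=0.7):
    """Greedy reduction (an assumed strategy): keep a feature only if its
    similarity to every already-kept feature is below `threshold`."""
    kept = []
    for feature in features:
        if all(wu_palmer(feature, k) < threshold for k in kept):
            kept.append(feature)
    return kept


if __name__ == "__main__":
    terms = ["dog", "puppy", "cat", "car", "truck"]
    print(reduce_features(terms, threshold=0.7))
```

In this sketch, lowering the threshold merges more distantly related terms and so removes more features; choosing that value is exactly the kind of trade-off a classifier-based check for underfitting or overfitting would help validate.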
This paper proposes a system to automatically locate and extract songs from digitized movies. We focus on the genre of Bollywood movies. A song grammar particularly applicable to this genre is proposed and used subsequently to construct a probabilistic timed automaton to differentiate songs. The proposed system has been implemented and test results indicate both high precision and recall. Songs being a major driver in the success of Bollywood movies, a potentially significant application of the proposed system lies in automatically mining the vast Bollywood movie archives.