The evolution of information Technology has led to the collection of large amount of data, the volume of which has increased to the extent that in last two years the data produced is greater than all the data ever recorded in human history. This has necessitated use of machines to understand, interpret and apply data, without manual involvement. A lot of these texts are available in transliterated code-mixed form, which due to the complexity are very difficult to analyze. The work already performed in this area is progressing at great pace and this work hopes to be a way to push that work further. The designed system is an effort which classifies Hindi as well as Marathi text transliterated (Romanized) documents automatically using supervised learning methods (KNN), Naïve Bayes and Support Vector Machine (SVM)) and ontology based classification; and results are compared to in order to decide which methodology is better suited in handling of these documents. As we will see, the plain machine learning algorithm applications are just as or in many cases are much better in performance than the more analytical approach.
The evolution of information Technology has led to the collection of large amount of data, the volume of which has increased to the extent that in last two years the data produced is greater than all the data ever recorded in human history. This has necessitated use of machines to understand, interpret and apply data, without manual involvement. A lot of these texts are available in transliterated code-mixed form, which due to the complexity are very difficult to analyze. The work already performed in this area is progressing at great pace and this work hopes to be a way to push that work further. The designed system is an effort which classifies Hindi as well as Marathi text transliterated (Romanized) documents automatically using supervised learning methods (KNN), Naïve Bayes and Support Vector Machine (SVM)) and ontology based classification;and results are compared to in order to decide which methodology is better suited in handling of these documents. As we will see, the plain machine learning algorithm applications are just as or in many cases are much better in performance than the more analytical approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.