J. K. Rani scite author profile

Eng. Technol. Appl. Sci. Res.

2019

This work deals with document classification, a supervised learning task that needs a labeled document set for training and a test set of documents to be classified. Document categorization proceeds through a sequence of steps: text preprocessing, feature extraction, and classification. A self-made data set was used to train the classifiers in every experiment. The work compares accuracy, average precision, precision, and recall for two classifiers (KNN and Naive Bayes), with and without combinations of several feature selection techniques. The results indicate that the Naive Bayes classifier performed better in many situations.
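The pipeline described in this abstract (preprocessing, feature extraction, classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy corpus `TRAIN`, the bag-of-words features, the add-one smoothing, the cosine similarity, and `k=3` are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; the paper's self-made data set is not reproduced here.
TRAIN = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("the player signed for a new team", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell after the report", "finance"),
    ("investors bought shares in the bank", "finance"),
]

def tokenize(text):
    """Preprocessing: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(docs):
    """Feature extraction (bag of words) plus counting for multinomial Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, text):
    """Pick the class with the highest log posterior (add-one smoothing assumed)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_knn(docs, text, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(tokenize(text))
    ranked = sorted(docs, key=lambda d: cosine(query, Counter(tokenize(d[0]))), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

model = train_naive_bayes(TRAIN)
print(predict_naive_bayes(model, "the team scored a goal"))
print(predict_knn(TRAIN, "the team scored a goal"))
```

A feature selection step of the kind the abstract mentions would sit between `tokenize` and the two classifiers, restricting the vocabulary before training.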

A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy

Eng. Technol. Appl. Sci. Res.

C.³

2019

Automatic summarization is the process of shortening one document (single-document summarization) or multiple documents (multi-document summarization). In this paper, a new feature selection method for the nearest neighbor classifier is proposed, which summarizes the original training documents based on a sentence importance measure. Our approach for single-document summarization uses two measures for sentence similarity: the frequency of the terms in one sentence and the similarity of that sentence to other sentences. All sentences were ranked accordingly, and the top-ranked sentences (subject to a threshold constraint) were selected for the summary. The summary of every document in the corpus is written to a new document, which is used in the summarization evaluation process.
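The sentence-ranking idea in this abstract can be sketched as below. The abstract names the two measures but not their exact formulas, so this sketch makes several assumptions: term frequency is averaged over the sentence and normalized by document length, sentence similarity is cosine similarity over term counts, the two scores are simply added, and the threshold is a hypothetical `keep_ratio` parameter.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def summarize(text, keep_ratio=0.5):
    """Rank sentences and keep the top fraction, returned in original order."""
    sentences = split_sentences(text)
    bags = [Counter(tokenize(s)) for s in sentences]
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag)
    total = sum(doc_freq.values())

    scores = []
    for i, bag in enumerate(bags):
        length = sum(bag.values())
        # Measure 1 (assumed form): mean document-level frequency of the sentence's terms.
        tf_score = sum(doc_freq[t] * c for t, c in bag.items()) / (length * total) if length else 0.0
        # Measure 2 (assumed form): mean cosine similarity to every other sentence.
        sims = [cosine(bag, other) for j, other in enumerate(bags) if j != i]
        sim_score = sum(sims) / len(sims) if sims else 0.0
        scores.append(tf_score + sim_score)  # unweighted sum is an assumption

    keep = max(1, math.ceil(len(sentences) * keep_ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]

text = "Cats chase mice. Cats chase birds. Cats sleep often. Quantum tariffs rose."
print(summarize(text, keep_ratio=0.5))
```

In the setting the abstract describes, each training document would be replaced by its summary before the nearest neighbor classifier is trained, which is how summarization acts as feature reduction.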

Maximum Entropy Approach based Named Entity Recognition in Punjabi Language

Singh¹,

Rani²,

Kaur³

2013

IJCA

Named Entity Recognition (NER) is the task of identifying named entities and classifying them into predefined categories such as person, location, and organization. NER is used in many applications, including text summarization, text classification, question answering, and machine translation. For English, a lot of work has already been done in the field of NER, where capitalization is a major cue for rules; Indian languages have no such feature, which makes the task more difficult for them. This work reports on the evaluation of an NER system for the Punjabi language using the Maximum Entropy approach (MAXENT). The evaluation uses a manually tagged Punjabi news corpus developed from online Punjabi newspaper text. The training set is annotated with an NE tagset of 12 tags. The MAXENT-based NER system for Punjabi achieved overall Precision, Recall, and F-Score values of 90.92%, 72.30%, and 80.55%, respectively, with a feature set consisting of the context word, Part of Speech (POS) information, the NE tag of the previous word, and a first-name gazetteer list.
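As a quick consistency check on the reported figures, the short sketch below recomputes the F-score from the stated precision and recall. It assumes the balanced F1 measure (`beta=1`), which the abstract does not state explicitly; the function name `f_score` is illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract, in percent.
print(round(f_score(90.92, 72.30), 2))  # 80.55
```

The recomputed value agrees with the reported 80.55%, which is consistent with the balanced F1 assumption.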

A Comparative Approach of Dimensionality Reduction Techniques in Text Classification

2019
