2011
DOI: 10.5120/3131-4315

A Novel Feature Selection Method for Classification of Medical Documents from Pubmed

Abstract: The exponential growth of online repositories in medical science has led to the development of various text mining tools. These tools assist users in analyzing text data stored in online repositories such as PubMed and MEDLINE. The PubMed repositories are growing at a rate of 500,000 articles per year. Classification of MEDLINE documents becomes very complex because of the high dimensionality of the feature space. In this study we discuss how dimensionality is reduced. We study and compare various dimensionality…

Cited by 4 publications (2 citation statements)
References 6 publications
“…Preprocessing comprises unstructured data and it is not feasible to classify the documents directly using data mining techniques. Preprocessing is essential in text mining and in performing, the documents are converted into a list of features or keywords by performing stop words removal and word stemming process are performed [11]. Stop words are the words which occur repeatedly in the documents and they do not provide any meaning within the document. Stop words are "and", "are", "this" and so on.…”
Section: B Preprocessing (mentioning)
confidence: 99%
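
The preprocessing step quoted above (stop word removal followed by stemming) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' method: the stop word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"and", "are", "this", "the", "is", "of", "in", "a", "to"}

# Suffixes tried in order; an assumption for illustration, not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("This and the documents are classified"))
```

The resulting list of stemmed keywords is what would then serve as the document's feature list.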
“…As the word count increases, TF-IDF value also increases in a direct proportion, but is offset by the occurrence of the word in the set of documents to control for the fact that some words are generally more common than others. (Sitaula et al, 2012; Sagar et al, 2011). But an ideal document consistently makes use of synonyms for a single word so that same words generally do not repeat.…”
Section: Similarity and Performance Measures (mentioning)
confidence: 99%
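
The TF-IDF behaviour described in the statement above can be illustrated with a short sketch. It assumes one common variant (relative term frequency multiplied by the natural log of the inverse document frequency); the cited papers may use a different weighting, and the function name `tf_idf` is chosen here only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Weight grows with the in-document count and shrinks for terms
            # that appear in many documents of the collection.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["gene", "gene", "protein"], ["protein", "cell"]]
for weights in tf_idf(docs):
    print(weights)
```

In this toy collection "protein" occurs in every document, so its weight is zero, while "gene" is repeated in one document only and receives a positive weight.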