DOI: 10.1007/978-3-540-85836-2_34

Document-Base Extraction for Single-Label Text Classification

Abstract: Many text mining applications, especially when investigating Text Classification (TC), require experiments to be performed using common text-collections, such that results can be compared with alternative approaches. With regard to single-label TC, most text-collections (textual data-sources) in their original form have at least one of the following limitations: the overall volume of textual data is too large for ease of experimentation; there are many predefined classes; most of the classes consist of…

Cited by 5 publications (3 citation statements)
References 15 publications
“…The preparation of Usenet Articles ("20 Newsgroups") based document-bases adopted the approach of Deng et al [8], where the entire collection was randomly split into two document-bases covering 10 classes each: 20NG.D10000.C10 and 20NG.D9997.C10. The preparation of Reuters-21578 and the MedLine-OHSUMED document-bases recalled the idea of Wang et al [25], where the Reuters.D6643.C8 and OHSUMED.D6855.C10 document-bases were generated.…”
Section: Results (mentioning)
confidence: 99%
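
The document-base names quoted above (for example 20NG.D10000.C10) appear to follow a `<source>.D<n>.C<m>` pattern. The sketch below reads such a name under the assumption that `D` gives the number of documents and `C` the number of classes; the quoted statement does not spell this out, and the helper name `parse_document_base_name` is hypothetical, introduced only for illustration.

```python
import re
from typing import NamedTuple


class DocumentBaseName(NamedTuple):
    source: str          # collection prefix, e.g. "20NG" or "Reuters"
    num_documents: int   # ASSUMPTION: the number after "D" is the document count
    num_classes: int     # ASSUMPTION: the number after "C" is the class count


# <source>.D<digits>.C<digits>, with no dots allowed inside the source prefix
_NAME_PATTERN = re.compile(r"^(?P<source>[^.]+)\.D(?P<docs>\d+)\.C(?P<classes>\d+)$")


def parse_document_base_name(name: str) -> DocumentBaseName:
    """Split a name such as '20NG.D10000.C10' into its three parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised document-base name: {name!r}")
    return DocumentBaseName(
        source=match["source"],
        num_documents=int(match["docs"]),
        num_classes=int(match["classes"]),
    )


if __name__ == "__main__":
    for name in ("20NG.D10000.C10", "20NG.D9997.C10",
                 "Reuters.D6643.C8", "OHSUMED.D6855.C10"):
        print(parse_document_base_name(name))
```

Under that reading, the two 20 Newsgroups document-bases would together hold 19,997 documents across 20 classes, which fits the quoted description of splitting the entire collection into two 10-class halves.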
“…Based on the probability of a particular answer $a_i \in S_a$, yielding a certain utility $u$ given the indicators $x_0$ and $x_1$, the joint probability $h$ of a tuple $(x_0, x_1, u)$ is determined. If the statistical properties of the components computing $x_0$ and $x_1$ are unknown, classification accuracies found in the literature (Wang et al, 2008) or heuristics are helpful means to estimate $h$. In such environments the incorporation of user feedback and machine learning techniques helps refining $h$ and therefore yields more accurate query strategies. If the joint probability function and the cost estimates are accurate, STS will return an optimal query strategy (MacQueen, 1964).…”
Section: Common Utility Mass Function (mentioning)
confidence: 99%
“…This approach used supervised learning where a feature may be connected with multiple labels. It is opposed to single-label classification when each feature is associated with a single class (label) [7]. Moreover, MLC is extensively applied in real-world problems, such as bioinformatics, e-commerce…etc.…”
Section: Introduction (mentioning)
confidence: 99%