Studies in Classification, Data Analysis, and Knowledge Organization
DOI: 10.1007/3-540-31314-1_48

Text Classification with Active Learning

Cited by 9 publications (3 citation statements)
References 3 publications

“…We finally ended up with 1000 articles classified as science, 1000 as technology, 1000 as Intrinsic Science, 457 as Extrinsic Science, 158 as Intrinsic Technology and 802 as Extrinsic Technology. This process is known as 'active learning' (Novak et al, 2006), and it includes manual controls (classifying random samples). We carried out a k-fold cross-validation (k = 5) (Arlot and Celisse, 2010).…”
Section: Automatic Classification (mentioning; confidence: 99%)

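The passage above evaluates its classifier with k-fold cross-validation (k = 5). As a rough illustration only, the sketch below shows one common way to partition labelled examples so that each fold serves exactly once as the test set. The function names `k_fold_splits` and `cross_validate`, the `evaluate` callback, and the toy labels are hypothetical and are not taken from the cited work.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]

def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Split]:
    """Shuffle indices 0..n-1 and cut them into k folds; each fold is the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder n % k over the first folds so sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Split] = []
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        splits.append((train, test))
        start += size
    return splits

def cross_validate(n: int, evaluate: Callable[[List[int], List[int]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Mean of evaluate(train_idx, test_idx) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n, k, seed)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical toy labels and a majority-class baseline, purely for illustration.
    labels = ["science"] * 6 + ["technology"] * 4

    def majority_baseline(train: List[int], test: List[int]) -> float:
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        return sum(labels[i] == majority for i in test) / len(test)

    print(cross_validate(len(labels), majority_baseline, k=5))
```

Shuffling before splitting avoids folds that simply mirror the order in which documents were labelled; a stratified split (keeping class proportions per fold) would be a reasonable variant when classes are as unbalanced as the counts quoted above.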
“…Choosing when to stop the active learning process is most often simply dictated by the number of labels an oracle will provide (Novak et al, 2006) (often referred to as a label budget). There are, however, other approaches based on performance achieved on a hold-out test set (Campbell et al, 2000), although these suffer from the difficulty of getting labelled examples which necessitates the use of active learning in the first place.…”
Section: Related Work (mentioning; confidence: 99%)

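The passage above notes that active learning is most often stopped once a label budget is exhausted. The sketch below illustrates that stopping rule in a pool-based loop. The query strategy (margin-based uncertainty sampling), the nearest-centroid classifier, the `oracle` callback standing in for the human annotator, and all function names are assumptions chosen for illustration; they are not the method of Novak et al. (2006). It also assumes the seed set contains at least one example of each class, and that the budget counts only queries made after the seed set.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Dict[Hashable, Vector]

def fit_centroids(X: Sequence[Vector], y: Sequence[Hashable]) -> Model:
    """Nearest-centroid model (illustrative): mean feature vector per class."""
    groups: Dict[Hashable, List[Vector]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(col) / len(col) for col in zip(*xs))
            for label, xs in groups.items()}

def margin(model: Model, x: Vector) -> float:
    """Gap between the two closest centroids; a small gap means the model is unsure."""
    d = sorted(math.dist(x, c) for c in model.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")

def active_learning(pool: List[Vector], oracle: Callable[[int], Hashable],
                    seed: List[int], budget: int) -> Tuple[Model, Dict[int, Hashable]]:
    """Query the least certain unlabelled item until the label budget is spent."""
    labeled = {i: oracle(i) for i in seed}
    queries = 0
    while queries < budget and len(labeled) < len(pool):
        model = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        query = min(unlabeled, key=lambda i: margin(model, pool[i]))
        labeled[query] = oracle(query)  # the annotator labels one more item
        queries += 1
    final = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
    return final, labeled
```

The budget check sits in the loop condition, so the annotation cost is bounded in advance; replacing it with a check on hold-out accuracy would give the alternative stopping rule the passage mentions, at the price of needing extra labelled data.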
“…These categories do not need to describe the data in details, the important thing is that they show to the system the user's view of the data -which documents are similar and which are different from the user's perspective. The process of manually marking the documents with categories is time consuming but can be significantly speeded up by the use of active learning [5], [8].…”
Section: Introduction (mentioning; confidence: 99%)