2011
DOI: 10.1177/0165551511412028

Feature selection strategies for automated classification of digital media content

Abstract: This paper proposes strategies for feature selection of digital news articles that allow an effective implementation of learning algorithms for the unsupervised classification of news articles. With the appropriate selection of a small subset of features a correct identification of related news can be achieved, thus enabling organizations and individual users to keep track of current events. The paper defines a quality measure of the discriminatory power of each feature and verifies that the selection of a fea…

Citations: Cited by 5 publications (8 citation statements)
References: References 17 publications
“…Most of the text document categorisation is adopted for the vector space model [27-30] to represent the documents, that is to say, each unique term in vocabulary represents one dimension in the future vector space. Therefore, the text document data sets can be represented as a document-by-term matrix D (n × m), where n and m indicate the number of documents and the number of terms occurring in the document data set, respectively.…”
Section: Preliminaries (mentioning)
confidence: 99%
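
The document-by-term matrix described in the statement above can be illustrated with a small sketch. This is not code from the cited paper or from the citing one: the function name `build_doc_term_matrix`, the whitespace tokenisation, the use of raw term counts as entries, and the three sample headlines are all assumptions made for illustration only.

```python
from collections import Counter


def build_doc_term_matrix(docs):
    """Return (vocabulary, D), where D[i][j] counts term j in document i.

    Hypothetical helper: tokenisation is a plain lowercase whitespace
    split, and matrix entries are raw term counts (an assumption; the
    quoted statement does not say how the entries are weighted).
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Each unique term in the vocabulary becomes one dimension (column).
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Made-up sample headlines, not data from the paper.
docs = [
    "markets rally as rates fall",
    "rates rise and markets fall",
    "team wins final match",
]
vocabulary, D = build_doc_term_matrix(docs)
n, m = len(D), len(vocabulary)  # n documents, m distinct terms
```

Here `n` is the number of rows (documents) and `m` the number of columns (distinct terms), matching the D (n × m) notation in the quoted statement. Feature selection, in this picture, amounts to keeping only a subset of the columns.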
“…Three major German and Austrian television channels have published their guidelines for indexing (ARD/ORF/ZDF, ) the content of programs. Research, however, is mainly focused on automatic classification and text mining of news items (e.g., Rocha & Cobo, ) rather than conceptual indexing. Indeed, news material from early on constituted an important part of information retrieval (IR) test collections, such as the TREC collections, the first example being Salton's Time magazine collection (Sanderson, ).…”
Section: Previous Work (mentioning)
confidence: 99%
“…Furthermore, there has been mounting interest in organizing news and newspaper articles using automatic techniques such as text classification and indexing algorithms (e.g. Chen and Lin 2000; Evans and Klavans 2003; Casillas et al 2003; Mamakis et al 2011; Rocha and Cobo 2011), or, on occasion, those based on user-features, in a user-centered fashion, such as the automatic summarization and categorization of news derived from user choices (Banos et al 2006), user modeling (Wongchokprasitti and Brusilovsky 2007), or user profiles (Bouras and Tsogkas 2010). However, as some authors have pointed out, automatic indexing and user-based retrieval systems such as Google's are not exempt from bias or subjectivity either (e.g.…”
Section: Media Knowledge Organization In the Context Of Ko Studies (mentioning)
confidence: 99%