Proceedings of the 2013 ACM Symposium on Document Engineering
DOI: 10.1145/2494266.2494296

Incremental hierarchical text clustering with privileged information

Abstract: In many text clustering tasks, there is some valuable knowledge about the problem domain, in addition to the original textual data involved in the clustering process. Traditional text clustering methods are unable to incorporate such additional (privileged) information into data clustering. Recently, a new paradigm called LUPI (Learning Using Privileged Information) was proposed by Vapnik to incorporate privileged information in classification tasks. In this paper, we extend the LUPI paradigm to deal with text…

Cited by 16 publications (18 citation statements)
References 4 publications
“…Clustering methods will be applied in order to distinguish documents which reports practical application of a PSS method or tool from those which have a theoretical perspective. Some approaches for this task are: i) use the BoED as privileged information, being a complement to the textual representation based on the traditional BoW (MARCACINI; REZENDE, 2013; SINOARA et al, 2014); ii) explore the use of other text representation techniques, such as the bag-of-related-words (ROSSI; REZENDE, 2011); iii) employ the interactive textual feature selection (CORRÊA et al, 2015); iv) apply methods to identify descriptors for the clusters found, such as the method proposed by Santos, Rezende and Oliveira (2014).…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…We use the LIHC (LUPI-based incremental hierarchical clustering) [25] for the automatic generation of a hierarchical clustering of texts, and through this, produce a topic hierarchy. This topic hierarchy is then processed so that the most representative topics are used as aspects of the items.…”
Section: Extracting Aspects Through Hierarchy Clustering | Citation type: mentioning
confidence: 99%
“…The technical information used is a traditional bag-of-words representation, containing the frequency of the terms present in the document. Privileged information in text processing domain consists of information besides traditional term frequency (TF) or term frequency-inverse document frequency (TF-IDF) [25]. In this work, we use the part-of-speech tag of words as privileged information, since they represent a linguistic information about the terms located in the documents.…”
Section: Extracting Aspects Through Hierarchy Clustering | Citation type: mentioning
confidence: 99%
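
To make the two views named in the statement above concrete, here is a minimal Python sketch of a bag-of-words TF/TF-IDF representation (the "technical" view) next to a part-of-speech tag count (one possible "privileged" view). It is not code from the cited works: the function names, the toy documents, the hand-written tags, and the `tf * log(N / df)` weighting variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the plain variant tf * log(N / df); other weighting schemes exist.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]           # raw term frequency (TF)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def pos_tag_counts(tagged_tokens):
    """Count tags in a list of (word, tag) pairs.

    The tags are assumed to come from some external tagger; here they are
    supplied by hand.
    """
    return Counter(tag for _word, tag in tagged_tokens)


# Toy, hypothetical data.
docs = [["cluster", "text", "text"], ["cluster", "topic"], ["topic", "hierarchy"]]
weights = tf_idf(docs)
privileged = pos_tag_counts([("cluster", "NOUN"), ("text", "NOUN"), ("grows", "VERB")])
```

In this sketch a term that occurs in every document would receive weight zero, which is the usual effect of the inverse document frequency factor.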
“…Several state-of-the-art approaches to improve the automatic generation of clusters have been proposed in the literature [20,2,16,6,14]. The LUPI (Learning Using Privileged Information) paradigm, proposed by Vapnik and Vashist [21] to incorporate privileged information in the classification task, was applied to the clustering task in [6] and [14].…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…Recently, Vapnik and Vashist [21] proposed a new machine learning paradigm called Learning Using Privileged Information (LUPI), which allows the incorporation of this additional privileged information during a machine learning process. The LUPI paradigm was extended for clustering tasks [14], in which the privileged information is used to obtain a more robust initial clustering model. This initial model is used for incremental clustering of new textual information available, so as to enable its use in dynamic scenarios.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
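
The two-phase idea in the statement above (an initial model built with privileged information, then incremental clustering of new texts) can be illustrated with a deliberately simplified Python sketch. It is not the LIHC algorithm and it is flat rather than hierarchical: the initial clusters are assumed to have been produced elsewhere, and the class name `IncrementalClusters`, the cosine-similarity rule, and the `threshold` parameter are all hypothetical choices made only for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusters:
    """Nearest-centroid incremental assignment (a stand-in, not LIHC)."""

    def __init__(self, initial_clusters, threshold=0.3):
        # initial_clusters: list of clusters, each a list of document vectors.
        # Assumption: this initial grouping was obtained with the help of
        # privileged information in a separate step not shown here.
        self.threshold = threshold
        self.sums = []   # per-cluster sum of member vectors
        self.sizes = []  # per-cluster member count
        for members in initial_clusters:
            total = {}
            for vec in members:
                for t, w in vec.items():
                    total[t] = total.get(t, 0.0) + w
            self.sums.append(total)
            self.sizes.append(len(members))

    def add(self, vec):
        """Assign a new document vector and return its cluster index."""
        best, best_sim = None, -1.0
        for i, total in enumerate(self.sums):
            # Cosine ignores scale, so the sum stands in for the mean.
            sim = cosine(vec, total)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # Nothing is similar enough: open a new cluster.
            self.sums.append(dict(vec))
            self.sizes.append(1)
            return len(self.sums) - 1
        for t, w in vec.items():
            self.sums[best][t] = self.sums[best].get(t, 0.0) + w
        self.sizes[best] += 1
        return best


# Toy usage: new documents arrive with technical (term) features only.
initial = [[{"cluster": 1.0, "text": 2.0}], [{"topic": 1.0, "hierarchy": 1.0}]]
model = IncrementalClusters(initial, threshold=0.3)
first = model.add({"text": 1.0})
second = model.add({"topic": 1.0})
third = model.add({"privacy": 1.0})
```

The design choice worth noting is that new documents are placed using only their term vectors, which mirrors the setting described in the statement where the initial model is reused in a dynamic scenario.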