Text, Speech and Dialogue
DOI: 10.1007/978-3-540-74628-7_16

Bilingual News Clustering Using Named Entities and Fuzzy Similarity

Abstract: This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we deal with the news representation and with the calculation of the similarity between documents. As representative features of the news, we use the cognate named entities they contain. One of our main goals is to determine whether named entities alone are a good source of knowledge for multilingual news clustering. In the vectorial news representation we take into account the category of t…
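The abstract describes representing each news item only by the cognate named entities it contains, with the entity category taken into account. The following is a minimal sketch under stated assumptions, not the authors' method. Entities arrive as hypothetical (text, category) pairs. Two entity strings count as cognates when Python's `difflib.SequenceMatcher` ratio reaches an assumed threshold of 0.8. The category weights in `CATEGORY_WEIGHTS` are made-up placeholders, and documents are compared with plain cosine similarity. The paper's own fuzzy similarity measure is not reproduced here.

```python
import math
from difflib import SequenceMatcher

# Hypothetical category weights (assumption): the abstract says the entity
# category is taken into account, but this page does not give the values.
CATEGORY_WEIGHTS = {"PER": 1.0, "ORG": 1.0, "LOC": 0.5, "MISC": 0.5}


def are_cognates(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumed cognate test: case-insensitive character similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def build_vector(entities, vocabulary):
    """Map (text, category) pairs onto shared cross-lingual keys.

    `vocabulary` is a list of (text, category) keys shared by all documents.
    It is extended in place when an entity has no cognate in it yet.
    """
    vector = {}
    for text, category in entities:
        # Reuse the first known key of the same category that is a cognate.
        key = next(
            (k for k in vocabulary if k[1] == category and are_cognates(text, k[0])),
            None,
        )
        if key is None:
            key = (text, category)
            vocabulary.append(key)
        weight = CATEGORY_WEIGHTS.get(category, 0.5)
        vector[key] = vector.get(key, 0.0) + weight
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Usage: an English and a Spanish item sharing two cognate entities.
vocab = []
en = build_vector([("Brussels", "LOC"), ("Obama", "PER")], vocab)
es = build_vector([("Bruselas", "LOC"), ("Obama", "PER")], vocab)
print(round(cosine(en, es), 6))  # 1.0
```

Because "Brussels" and "Bruselas" fall above the assumed threshold, both items map to the same two keys and their vectors coincide. A stricter or looser threshold trades missed cognates against false matches.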

Cited by 17 publications (9 citation statements, all classified as mentioning)
References 12 publications

“…Based on the basic information of the documents and additional information from Wikipedia, the documents are clustered in each language. There are also some studies on multilingual document clustering, such as English-Spanish [14][15][16], English-Japanese [17] and English-Bulgarian [18]. However, there has been little study of bilingual clustering of English and Malay documents [19].…”
Section: Related Work (mentioning)
confidence: 99%
“…Also, for the same reason this dataset contains a large number of unrelated articles. In [15], the authors stated that NEs play an important role in news documents. They wanted to exploit that characteristic by considering them as the only distinguishing features of the documents.…”
Section: Experiments 1: NEs in Speech (mentioning)
confidence: 99%
“…Friburger et al. (2002) and Montalvo et al. (2007) point out the effect of named entity recognition in improving monolingual and bilingual document clustering, respectively. We therefore extract named entities with the named entity recognition tools released by Stanford University and then calculate the similarity between the two named entity sets as a feature according to Equation (7).…”
Section: Bilingual Named Entities Similarity (mentioning)
confidence: 99%
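The last citation statement describes turning the overlap between the named entities of two documents into a single similarity feature. Equation (7) of the citing paper is not reproduced on this page, so the sketch below uses Jaccard overlap as a stand-in. The entity sets are taken as already extracted (for example by an NER tagger). The optional `lexicon` argument is a hypothetical bilingual mapping of entity names, included only to show where cross-lingual alignment would plug in.

```python
from typing import Iterable, Mapping, Optional


def entity_set_similarity(
    source: Iterable[str],
    target: Iterable[str],
    lexicon: Optional[Mapping[str, str]] = None,
) -> float:
    """Jaccard overlap between two named-entity sets (stand-in for Eq. (7)).

    `lexicon` is a hypothetical mapping from source-language entity names to
    their target-language forms; unmapped names are compared as they are.
    """
    lexicon = lexicon or {}
    # Translate where possible, then compare case-insensitively.
    mapped = {lexicon.get(name, name).casefold() for name in source}
    tgt = {name.casefold() for name in target}
    union = mapped | tgt
    return len(mapped & tgt) / len(union) if union else 0.0


# Usage: a Spanish and an English document, one name needing the lexicon.
es_entities = {"Obama", "Pekín"}
en_entities = {"Obama", "Beijing"}
print(entity_set_similarity(es_entities, en_entities))  # 0.3333333333333333
print(entity_set_similarity(es_entities, en_entities, {"Pekín": "Beijing"}))  # 1.0
```

Without any cross-lingual mapping, only identical names such as "Obama" overlap. This is why cognate detection or a bilingual lexicon matters for the bilingual case.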