2002
DOI: 10.1007/978-1-4615-0933-2_3

Corpora for Topic Detection and Tracking

Cited by 27 publications (22 citation statements)
References: 0 publications
“…the two-expert settings used in (Chen, 2006;Lu et al, 2007) or even in the highly professionally organised TDT evaluation on topics in news (Cieri et al, 2002). Zeng et al (2004) used three experts, but they investigated sub-topics of general Web query, which are easier to judge than scientific sub-topics.…”
Section: Cluster Quality (mentioning)
confidence: 99%
“…The documents contain 11,334 distinct words; average document length is 460. The second dataset contains 200 documents from the TDT-1 corpus [24]. TDT documents are slightly longer, average length is 540 words, but the number of distinct words is somewhat smaller: 9,379.…”
Section: Y I (mentioning)
confidence: 99%
“…Because it is so small, the Hong-Kong parallel corpus has a significant word coverage problem. In order to alleviate the problem, we augmented the corpus with the TDT2 and TDT3 [24] pseudo-parallel datasets. These corpora contain 46,692 Chinese news stories along with their SYSTRAN translations into English.…”
Section: Chinese Resources (mentioning)
confidence: 99%