Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics on - EACL '09 2009
DOI: 10.3115/1609067.1609104

Person identification from text and speech genre samples

Abstract: In this paper, we describe experiments conducted on identifying a person using a novel unique correlated corpus of text and audio samples of the person's communication in six genres. The text samples include essays, emails, blogs, and chat. Audio samples were collected from individual interviews and group discussions and then transcribed to text. For each genre, samples were collected for six topics. We show that we can identify the communicant with an accuracy of 71% for six fold cross validation using an ave…

Citations: cited by 17 publications (20 citation statements)
References: 17 publications
“…The unmasking method for author verification of long documents based on very frequent word frequencies was successfully tested in cross-topic conditions (Koppel et al, 2007) but Kestemont et al (2012) found that its reliability was significantly lower in cross-genre conditions. Function words have been found to be effective when topics of the test corpus are excluded from the training corpus (Baayen et al, 2002; Goldstein-Stewart et al, 2009; Menon and Choi, 2011). However, Mikros and Argiri (2007) demonstrated that function word features actually correlate with topic.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, Mikros and Argiri (2007) demonstrated that function word features actually correlate with topic. Other types of features found effective in cross-topic and cross-genre authorship attribution are punctuation mark frequencies (Baayen et al, 2002), LIWC features (Goldstein-Stewart et al, 2009), and character n-grams (Stamatatos, 2013). To enhance the performance of attribution models based on character n-gram features, Sapkota et al (2015) define several n-gram categories and then they combine n-grams that correspond to word affixes and punctuation marks.…”
Section: Related Work (mentioning)
confidence: 99%
“…In all these cases, a topic is mainly characterized by coarse‐grained thematic areas. Finally, some corpora that control topic and genre have been built (by hiring people for writing under controlled conditions) to explore cross‐domain attribution (Baayen, van Halteren, Neijt, & Goldstein‐Stewart et al, ; Tweedie, ). The latter approach is certainly the most reliable, providing a fine‐grained range of topics and genres.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Madigan et al () demonstrated that part‐of‐speech features are more effective than word unigrams in cross‐topic conditions. Function words have been found to be effective when topics of the test corpus are excluded from the training corpus (Baayen, et al, ; Goldstein‐Stewart et al, ; Menon & Choi, ). However, Mikros and Argiri () demonstrated that function word features actually correlate with topic.…”
Section: Previous Work (mentioning)
confidence: 99%