We briefly report on the four shared tasks organized as part of the PAN 2019 evaluation lab on digital text forensics and authorship analysis. Each task is introduced and motivated, and the results obtained are presented. Altogether, the four tasks attracted 373 registrations, yielding 72 successful submissions. This, and the fact that we continue to invite the submission of software rather than its run output via the TIRA experimentation platform, marks a good start to the second decade of PAN evaluation labs.
The PAN 2017 shared tasks on digital text forensics were held in conjunction with the annual CLEF conference. This paper gives a high-level overview of each of the three shared tasks organized this year, namely author identification, author profiling, and author obfuscation. For each task, we give a brief summary of the evaluation data, performance measures, and results obtained. Altogether, 29 participants submitted a total of 33 pieces of software for evaluation, with 4 participants submitting to more than one task. All submitted software has been deployed to the TIRA evaluation platform, where it remains hosted for reproducibility purposes.
The aim of modern authorship attribution approaches is to analyze known authors and to assign authorship to previously unseen, unlabeled text documents based on various features. In this paper we present a novel feature that enhances current attribution methods by analyzing the grammar of authors. To extract the feature, a syntax tree is computed for each sentence of a document and then split into length-independent patterns using pq-grams. The most frequently used pq-grams compose sample profiles of authors, which are compared with the profile of the unlabeled document using various distance metrics and similarity scores. An evaluation on three different and independent data sets reveals promising results and indicates that the grammar of authors is a significant feature for enhancing modern authorship attribution methods.
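The pq-gram extraction described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes syntax trees are given as simple `(label, children)` tuples (a real system would obtain them from a parser), and it follows the standard pq-gram definition of p ancestor labels plus q consecutive sibling labels, padded with `*`.

```python
from collections import Counter

def pq_grams(node, p=2, q=3, stem=None):
    """Extract pq-grams from a (label, children) tree: each gram is a
    tuple of p ancestor labels followed by q consecutive sibling labels,
    with '*' used as padding at the tree's borders."""
    if stem is None:
        stem = ["*"] * p
    label, children = node
    stem = (stem + [label])[-p:]  # shift this node's label into the stem
    if children:
        base = ["*"] * (q - 1) + [c[0] for c in children] + ["*"] * (q - 1)
    else:
        base = ["*"] * q          # leaf node: a single all-padding base
    grams = [tuple(stem + base[i:i + q]) for i in range(len(base) - q + 1)]
    for child in children:
        grams.extend(pq_grams(child, p, q, stem))
    return grams

# Toy syntax tree for a short sentence; labels are POS/phrase tags.
tree = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VBZ", [])])])
profile = Counter(pq_grams(tree))  # pq-gram frequency profile of the tree
```

An author profile is then the aggregate `Counter` over all sentences of that author's sample texts, and profiles are compared with a distance or similarity measure as the abstract describes.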
PAN 2018 explores several authorship analysis tasks enabling a systematic comparison of competitive approaches and advancing research in digital text forensics. More specifically, this edition of PAN introduces a shared task in cross-domain authorship attribution, where texts of known and unknown authorship belong to distinct domains, and another task in style change detection that distinguishes between single-author and multi-author texts. In addition, a shared task in multimodal author profiling examines, for the first time, a combination of information from both texts and images posted by social media users to estimate their gender. Finally, the author obfuscation task studies how a text by a certain author can be paraphrased so that existing author identification tools are confused and cannot recognize the similarity with other texts of the same author. New corpora have been built to support these shared tasks. A relatively large number of software submissions (41 in total) were received and evaluated. The best paradigms are highlighted, while baselines indicate the pros and cons of the submitted approaches.

(stylistic fingerprint) but she also shares some properties with other people of similar background (age, gender, education, etc.). It is quite challenging to define or measure both personal style (for each individual author) and collective style (males, females, young people, old people, etc.). In addition, it remains unclear what one should modify in her texts in order to attempt to hide her identity or to mimic the style of another author. This edition of PAN deals with these challenging issues. Author identification puts emphasis on the personal style of individual authors. The most common task is authorship attribution, where there is a set of candidate authors (suspects), with samples of their texts, and one of them is selected as the most likely author of a text of disputed authorship [31].
This can be a closed-set (one of the suspects is surely the true author) or an open-set (the true author may not be among the suspects) attribution case. This edition of PAN focuses on closed-set cross-domain authorship attribution, that is, when the texts unquestionably written by the suspects and the texts of disputed authorship belong to different domains. This is a realistic scenario suitable for several applications. For example, imagine the case of a crime novel published anonymously when all candidate authors have only published fantasy novels [13], or a disputed tweet when the available texts written by the suspects are newspaper articles. To be able to control the domain of texts, we turned to so-called fanfiction [11]. This term refers to the large body of contemporary fiction that is nowadays created by non-professional authors ('fans'), who write in the tradition of a well-known source work, such as the Harry Potter series by J.K. Rowling, which is sometimes called the 'canon'. These writings or 'fics' within such a 'fandom' heavily borrow characters, motifs, settings, etc. from the source fandom. Fanfiction prov...
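The closed-set setting described above can be illustrated with a standard character n-gram baseline. This is a hedged sketch of a common approach in the attribution literature, not any particular PAN submission: candidate names and texts below are invented for illustration, and a real system would use larger samples and tuned features.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts, a widely used stylistic feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(disputed, candidates):
    """Closed-set attribution: always return the candidate whose known
    texts are most similar to the disputed text."""
    d = char_ngrams(disputed)
    return max(candidates,
               key=lambda name: cosine(d, char_ngrams(candidates[name])))

# Invented toy data: two suspects with known writing samples.
candidates = {"A": "the dark tower loomed over the misty vale",
              "B": "lol gonna tweet this later, so funny tbh"}
verdict = attribute("the tower stood dark against the vale", candidates)
```

Note that in the closed-set case the function always names a suspect; open-set attribution would additionally require a rejection threshold on the similarity score.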
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum for digital text forensics research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre perspective; and (iii) author obfuscation, addressing author masking and obfuscation evaluation. In total, 35 teams participated in the three shared tasks of PAN 2016 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.