Texterra: A Framework for Text Analysis

Turdakov, Denis; Astrakhantsev, N. A.; Nedumov, Yaroslav; Sysoev, A. N.; Andrianov, Ivan; Maiorov, V. D.; Fedorenko, D. G.; Korshunov, Anton; Kuznetsov, Sergey

doi:10.15514/ispras-2014-26(1)-18

Cited by 6 publications

(1 citation statement)

References 9 publications

“…For example, TerMine is based on CValue/NC-Value methods (and academic usage only); FlexiTerm contains C-Value and "a simple term variant normalisation method" [41]; TOPIA lists only one method without algorithm description and it is not updated since 2009; TermRider utilizes TF-IDF only; TermSuite [11] ranks candidates by Weirdness method, but focuses on recognizing term variants based on syntactic and morphological patterns. Some tools are limited by searching for mentions of (named) entities (for example, OpenCalais) or named entities and Wikipedia concepts (Texterra [42]). Another tool supports only supervised recognition of 1-word and 2-words terms.…”

Section: ATR Software Tools (mentioning)

confidence: 99%

ATR4S: toolkit with state-of-the-art automatic terms recognition methods in Scala

Astrakhantsev

2017

Lang Resources & Evaluation

Self Cite

Automatically recognized terminology is widely used for various domain-specific text processing tasks, such as machine translation, information retrieval or ontology construction. However, there is still no agreement on which methods are best suited for particular settings and, moreover, there is no reliable comparison of already developed methods. We believe that one of the main reasons is the lack of implementations of state-of-the-art methods, which are usually non-trivial to recreate. In order to address these issues, we present ATR4S, an open-source software written in Scala that comprises more than 15 methods for automatic terminology recognition (ATR) and implements the whole pipeline from text document preprocessing, to term candidates collection, term candidates scoring, and finally, term candidates ranking. It is a highly scalable, modular and configurable tool with support for automatic caching. We also compare 13 state-of-the-art methods on 7 open datasets by average precision and processing time. Experimental comparison reveals that no single method demonstrates best average precision for all datasets and that other available tools for ATR do not contain the best methods.
