2018
DOI: 10.1108/el-06-2017-0128

Semi-automatic extraction of multiword terms from domain-specific corpora

Abstract: Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts. Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure…
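As a rough illustration of the kind of hybrid pipeline the abstract describes, the sketch below keeps only token sequences whose part-of-speech tags match a predefined syntactic structure (the linguistic step) and then ranks them by corpus frequency (a simple stand-in for the statistical step). The tag set, the three patterns, the use of lemmas, the toy English input and the function name `extract_candidates` are all assumptions made for illustration; the abstract does not list the paper's actual structures or measures, and this is not the authors' implementation.

```python
from collections import Counter

# Hypothetical syntactic structures (POS-tag sequences) for multiword terms.
PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "ADJ", "NOUN")}


def extract_candidates(tagged_sentences, patterns=PATTERNS, min_freq=2):
    """Return (candidate, frequency) pairs for sequences matching a pattern.

    tagged_sentences: list of sentences, each a list of (lemma, pos_tag) pairs.
    """
    counts = Counter()
    lengths = {len(p) for p in patterns}
    for sentence in tagged_sentences:
        for n in lengths:
            for i in range(len(sentence) - n + 1):
                window = sentence[i:i + n]
                tags = tuple(tag for _, tag in window)
                if tags in patterns:
                    # Count by lemma so inflected forms are grouped together
                    # (an assumption suited to morphologically rich languages).
                    counts[" ".join(lemma for lemma, _ in window)] += 1
    # Keep only candidates seen at least min_freq times, most frequent first.
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


# Toy, already lemmatized and tagged input (hypothetical data).
corpus = [
    [("soil", "NOUN"), ("moisture", "NOUN"), ("affect", "VERB"),
     ("crop", "NOUN"), ("yield", "NOUN")],
    [("high", "ADJ"), ("soil", "NOUN"), ("moisture", "NOUN"),
     ("reduce", "VERB"), ("crop", "NOUN"), ("yield", "NOUN")],
]
candidates = extract_candidates(corpus)
```

In a semi-automatic setting, a ranked list such as `candidates` would then be passed to a human expert for validation rather than accepted as is.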

Cited by 9 publications (7 citation statements)
References 19 publications
“…Research in Serbian language with agricultural engineering domain conducted by [28] provides a hybrid approach by combining linguistic and statistical information. The Candidate terms are obtained using the frequency of occurrence of text sequences in the corpus.…”
Section: Literature Review (mentioning)
confidence: 99%
“…In the case of the Serbian language, a lot of work has been done in the fields of IE and NER, from general analysis of necessary resources and appropriate methods (Vitas and Pavlović-Lažetić, 2008), over collections of handcrafted rules for specific types of named entities (Gucul-Milojević, 2010; Krstev et al, 2011; Pajić et al, 2011), to the extraction of multiword expressions and named entities in specific domains, for example, multiword expressions in the agricultural engineering domain, mining and geology, named entities in the culinary domain, weather information from meteorological texts or keyword-based search of bilingual digital libraries (Pajić et al, 2012, 2018; Stanković et al, 2016; Stanković-Vujičić et al, 2014). Some of the initiatives for spoken archives that use NLP for IE include the CHoral project (Heeren et al, 2009), aimed at building technology for spoken document retrieval for heritage collections and New South Voices (Atkins Library, University of North Carolina at Charlotte), that provides access to transcripts of interviews, narratives and conversations documenting life in the 20th century.…”
Section: El 385/6 (mentioning)
confidence: 99%
“…Taking into consideration the quick development of NLP technologies and the importance of Machine translation (MT) in the dissemination of knowledge and in building new resources, it is important to extend the studies and cover new areas of research, such as Botany. The automatic identification of terms in this field helps in improving the quality of NLP applications, computer assisted translation tools and automatic translation tools (Temmerman and Knops, 2004) as well as lexicon creation, acquisition of novel terms, text classification, text indexing, machine-assisted translation and other NLP tasks (Pajić et al, 2018). For this reason, in this paper, we focus on the automatic extraction of flower and plant names, and we intend to address the shortcomings in this domain with the help of AI.…”
Section: Introduction (mentioning)
confidence: 99%
“…Multiword Terms (MWTs) are domain-specific Multiword Expressions (MWE) (Pajić et al, 2018) where two or more lexemes converge to form a new unit of meaning (León Araúz and Cabezas García, 2020). The task of processing MWTs is crucial in many Natural Language Processing (NLP) applications, including Machine Translation (MT) and terminology extraction.…”
mentioning
confidence: 99%