2020
DOI: 10.1075/ivitra.24.05wah

Computational extraction of formulaic sequences from corpora

Abstract: We describe a new algorithm for the extraction of formulaic language from corpora. Entitled MERGE (Multi-word Expressions from the Recursive Grouping of Elements), it iteratively combines adjacent bigrams into progressively longer sequences based on lexical association strengths. We then provide empirical evidence for this approach via two case studies. First, we compare the performance of ME…
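The abstract's core loop (repeatedly merge the most strongly associated adjacent pair into one longer unit, then recount) can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes log-likelihood (G²) as the association measure, considers only contiguous bigrams, and uses the hypothetical parameters `iterations` and `min_freq`; the published MERGE algorithm's measure, thresholds, and handling of discontinuous sequences may differ.

```python
import math
from collections import Counter


def log_likelihood(o11, r1, c1, n):
    """G^2 for a 2x2 table: joint freq o11, row total r1, column total c1, n bigrams."""
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    row_totals = [r1, r1, n - r1, n - r1]
    col_totals = [c1, n - c1, c1, n - c1]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        if o > 0:  # empty cells contribute 0 to the sum
            g2 += o * math.log(o / (r * c / n))
    return 2 * g2


def merge_step(units, min_freq=2):
    """Merge the single most strongly associated adjacent pair (one iteration)."""
    pairs = Counter(zip(units, units[1:]))
    firsts = Counter(units[:-1])   # how often each unit fills the first bigram slot
    seconds = Counter(units[1:])   # how often each unit fills the second bigram slot
    n = len(units) - 1
    best, best_score = None, 0.0
    for (a, b), f in pairs.items():
        if f < min_freq:
            continue
        if f <= firsts[a] * seconds[b] / n:  # skip pairs that are not positively associated
            continue
        score = log_likelihood(f, firsts[a], seconds[b], n)
        if score > best_score:
            best, best_score = (a, b), score
    if best is None:
        return units, None
    merged = best[0] + best[1]  # units are tuples of words, so + concatenates them
    out, i = [], 0
    while i < len(units):  # replace occurrences left to right, without overlaps
        if i + 1 < len(units) and (units[i], units[i + 1]) == best:
            out.append(merged)
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out, merged


def merge(tokens, iterations=10, min_freq=2):
    """Return the merged multi-word units in the order they were created."""
    units = [(t,) for t in tokens]
    found = []
    for _ in range(iterations):
        units, merged = merge_step(units, min_freq)
        if merged is None:  # no remaining pair meets the thresholds
            break
        found.append(merged)
    return found
```

On the toy input `"a b c a b c a b d".split()`, the first iteration merges `a b`, and the second grows it into `c a b`, illustrating how merged units can themselves take part in later, longer merges.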

Cited by 10 publications (3 citation statements)
References 21 publications
“…At the same time, string matching and regular expressions can also be used to identify formulaic language ( Durrant & Mathews-Aydınlı, 2011 ). Through multiple linear regression analysis, Wahl (2019) found that the number of rules is positively correlated with the types of recognized formulaic language. The more rules are made, the more types of formulaic language are recognized.…”
Section: Related Work (mentioning; confidence: 99%)
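A rule-based approach of the kind this statement describes can be sketched with Python's `re` module. The three patterns and the function name `find_formulas` are hypothetical examples, not rules taken from Durrant & Mathews-Aydınlı (2011) or Wahl (2019). In this setup, adding a rule can never reduce the number of recognized types, which is one intuition behind the reported positive correlation.

```python
import re
from collections import Counter

# Hypothetical rule set: each label maps to a case-insensitive regular expression.
RULES = {
    "on the other hand": r"\bon the other hand\b",
    "as a result (of)": r"\bas a result(?: of)?\b",
    "it is ADJ that": r"\bit is \w+ that\b",  # single-word slot, e.g. "it is clear that"
}


def find_formulas(text, rules=RULES):
    """Count matches of each rule in text; rules with no match are omitted."""
    counts = Counter()
    for label, pattern in rules.items():
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[label] = len(hits)
    return counts


text = ("It is clear that prices rose. On the other hand, it is likely "
        "that they will fall as a result of demand.")
counts = find_formulas(text)
print(len(counts), "formula types recognized")
```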
“…However, not much work was done on the automatic recognition of longer routine expressions. Wahl and Gries [19] is an exception, but they still focus on the phrase level and units shorter than complete sentences. Finding reused text passages and sentences is often important for the analysis of documents.…”
Section: Related Work (mentioning; confidence: 99%)
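One simple way to find sentences reused across documents is exact matching after normalization. The sketch below rests on stated assumptions: sentences are split at `.`, `!` or `?` followed by whitespace, and compared lowercased with punctuation removed. The function name `reused_sentences` and the `min_docs` threshold are hypothetical; this is not the method of Wahl and Gries or of the citing paper, neither of which the snippet describes in detail.

```python
import re
from collections import defaultdict


def reused_sentences(docs, min_docs=2):
    """Map each normalized sentence found in >= min_docs documents to their ids."""
    seen = defaultdict(set)
    for doc_id, text in docs.items():
        for sent in re.split(r"(?<=[.!?])\s+", text):
            # Normalize: lowercase and keep only word characters.
            key = " ".join(re.findall(r"\w+", sent.lower()))
            if key:
                seen[key].add(doc_id)
    return {s: ids for s, ids in seen.items() if len(ids) >= min_docs}


docs = {
    "d1": "Thank you for your letter. We will reply soon.",
    "d2": "We will reply soon! Best regards.",
}
print(reused_sentences(docs))
```

Exact matching misses lightly edited reuse; near-duplicate methods such as n-gram shingling would be needed for that.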
“…With the advent of the technology, the computers are nowadays used to retrieve the linguistics information from textual data which is known as Computational Linguistics (CL) [8]- [9]. CL is classified into many categories but among them context clues, semantic, and syntactic [9]- [11] matching is widely used in the domain of linguistics. CL helps in identifying and matching of related words from input datasets with the data dictionary which is known as domain knowledge [12].…”
Section: Introduction (mentioning; confidence: 99%)