In the melting pot of web‐crawled texts: The challenges of extracting English words from Croatian corpora

Čolakovac, Jasmina Jelčić; Borucinsky, Mirjana

doi:10.1111/ijal.12485

Int J App Linguistics

2023

Abstract: This paper focuses on English words and phrases used in Croatian which, unlike loanwords, have not undergone major adaptation at the orthographic, phonetic, or other levels, apart from being influenced by the inflectional system of the recipient language. A list of English words in Croatian corpora was compiled using automatic algorithm extraction, corpus query language in Sketch Engine, and manual word list evaluation, with the end goal of publishing the first comprehensive online database of English wor…

Cited by 1 publication

References 33 publications

Shedding new light on English loanwords in Croatian: computational-linguistic and corpus linguistic perspectives

Bogunović

2024

Poznan Studies in Contemporary Linguistics

English loanwords in Croatian have been thoroughly investigated in terms of the degree of their adaptation, use across different styles and domains, and speakers’ attitudes. Most studies rely on selectively chosen examples or specialized corpora. To provide systematic data on a subgroup of English loanwords, those that occur in orthographically unadapted forms, new resources have been recently developed. They provide the data on the identification of unadapted English loanwords in Croatian, their meaning, native equivalents, and frequency. The aim of this paper is to bring together recent findings on unadapted English loanwords in Croatian, obtained from a set of studies in which computational-linguistic and corpus linguistic methods were used, with the purpose of providing a new insight into the phenomenon.
