Discourse Analysis

Dick, Hilary Parsons; Nightlinger, Jake

doi:10.1002/9781118786093.iela0103

Cited by 1 publication

(1 citation statement)

References 17 publications

“…This approach uses the linguistics features of words, sentences, and documents. The linguistics approach includes lexical analysis [27], syntactic analysis [28], discourse analysis [29]. To identify sentence sets, We use LingPipe [35].…”

Section: Linguistic Annotation (mentioning)

confidence: 99%

HULTIG-C: NLP Corpus and Services in the Cloud

Pais; Cordeiro; Jamil

2023

Preprint

Nowadays, the use of language corpora for many purposes has increased significantly. General corpora exist for numerous languages, but research often needs more specialized corpora. The Web’s rapid growth has significantly improved access to thousands of online documents, highly specialized texts and comparable texts on the same subject covering several languages in electronic form. However, research has continued to concentrate on corpus annotation instead of corpus creation tools. Consequently, many researchers create their corpora, independently solve problems, and generate project-specific systems. The corpus construction is used for many NLP applications, including machine translation, information retrieval, and question-answering. This paper presents a new NLP Corpus and Services in the Cloud called HULTIG-C. HULTIG-C is characterized by various languages that include unique annotations such as keywords set, sentences set, named entity recognition set, and multiword set. Moreover, a framework incorporates the main components for license detection, language identification, boilerplate removal and document deduplication to process the HULTIG-C. Furthermore, this paper presents some potential issues related to constructing multilingual corpora from the Web.
