Neues vom heutigen Deutsch 2019
DOI: 10.1515/9783110622591-021
Neues von KorAP

Abstract: Since May 2017, KorAP has been running at the IDS as a beta version in parallel operation with COSMAS II, providing comprehensive access to a part of DeReKo (Kupietz et al. 2010). In the meantime, many new search functions have been added so that users can test and evaluate them. While KorAP is still far from the stability with which COSMAS II can offer its services, the beta-version status allows a high degree of flexibil…

Cited by 1 publication (1 citation statement, published 2022); references 0 publications.
“…DeReKo currently contains more than 50 billion [henceforth, b] tokens and comprises a multitude of genres, such as (a large number of) newspaper texts, fiction, or specialized texts, with a current growth rate of ∼3b words per year (Kupietz et al, 2018). Tokenization was carried out using the KorAP tokenizer (Kupietz et al, 2021), the deterministic finite automaton scanning rules of which are based on those of the Apache Lucene tokenizer. Part‐of‐speech tagging and lemmatization is based on TreeTagger (Schmid, 1994).…”
Section: Data and Preprocessing
Confidence: 99%
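The quoted passage describes tokenization via deterministic finite-automaton scanning rules. As a rough illustration of what rule-based scanning tokenization looks like, here is a minimal Python sketch; it is not the actual KorAP or Lucene tokenizer, and the token rules below are hypothetical simplifications:

```python
import re

# Illustrative only: a tiny rule-based tokenizer in the spirit of
# deterministic left-to-right scanning. The rules are simplified
# stand-ins, not the KorAP tokenizer's actual rule set.
TOKEN_RE = re.compile(
    r"""
    \d+(?:[.,]\d+)*      # numbers, incl. German decimal/thousands marks
    | \w+(?:-\w+)*       # words, incl. hyphenated compounds
    | [^\w\s]            # any single punctuation character
    """,
    re.VERBOSE | re.UNICODE,
)

def tokenize(text: str) -> list[str]:
    """Scan the text left to right, emitting non-overlapping tokens."""
    return TOKEN_RE.findall(text)

print(tokenize("DeReKo umfasst über 50 Mrd. Tokens."))
# → ['DeReKo', 'umfasst', 'über', '50', 'Mrd', '.', 'Tokens', '.']
```

A real scanner-based tokenizer compiles such rules into a single finite automaton so that each character is examined only once, which is what makes the approach fast enough for corpora in the multi-billion-token range.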