2021
DOI: 10.2478/jazcas-2021-0049

Sharing Data Through Specialized Corpus-Based Tools: The Case of GramatiKat

Abstract: This paper presents a specialized corpus tool, GramatiKat, in the context of Open Science principles, namely data sharing, which offers opportunities for original research and facilitates the verification of research and building on previous research. The tool is designed primarily for examining grammatical categories from a quantitative point of view. It offers grammatical profiles of particular lemmas (currently 14,000 Czech nouns) and the proportion of individual grammatical categories within a part of sp…

Cited by 1 publication (1 citation statement); references 2 publications.
“…Many of the nouns and some of the verbs searched had very low frequencies and were from an informal register, making TenTen's size and composition suitable for this task, but its tagging and lemmatization meant much manual clean-up and verification was needed; in the case of higher-frequency items this quickly became unwieldy. GramatiKat, a corpus tool that allows searching of nouns by percentage of case and number forms (Kováříková & Kovářík 2021; Kováříková 2021), made it easy to identify medium- to high-frequency nouns in the Czech National Corpus where the relevant cells were suspiciously underweighted; we looked at all such items for inclusion. All items were then checked manually in the 100m-token SYN2015 corpus.…”
Section: Corpus and Handbook Data (Selection/Identification of Lexemes); citation type: mentioning (confidence: 99%)
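The search described in the citing passage (finding medium- to high-frequency nouns whose relevant case/number cells are suspiciously underweighted) might be sketched as a simple filter over per-lemma profiles. The data layout, the chosen cell, and the thresholds `max_share` and `min_freq` are hypothetical assumptions; this does not reproduce GramatiKat's query interface, and the resulting candidates would still need the manual corpus check the passage describes.

```python
def underweighted_nouns(profiles: dict[str, dict[tuple[str, str], float]],
                        frequencies: dict[str, int],
                        cell: tuple[str, str],
                        max_share: float,
                        min_freq: int) -> list[str]:
    """Return lemmas whose share of `cell` is at most `max_share`.

    profiles maps lemma -> {(case, number): share of total frequency};
    frequencies maps lemma -> total corpus frequency. Only lemmas with
    frequency >= min_freq are considered, so that a low share reflects
    real underuse rather than sparse data. A cell missing from a profile
    counts as a share of 0.0.
    """
    hits = [
        lemma
        for lemma, profile in profiles.items()
        if frequencies.get(lemma, 0) >= min_freq
        and profile.get(cell, 0.0) <= max_share
    ]
    # Most frequent lemmas first: they carry the most corpus evidence.
    return sorted(hits, key=lambda lemma: frequencies[lemma], reverse=True)
```

The frequency floor is the design choice that matters here: without it, rare nouns would dominate the results simply because most of their cells are empty.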