The CLARIN-NL Data Curation Service: Bringing Data to the Foreground

Oostdijk, N.H.J.; Heuvel, H. van den; Treurniet, Maaske

doi:10.2218/ijdc.v8i2.278

Cited by 3 publications

(3 citation statements)

References 3 publications


“…Regardless of the theoretical framework, the actual model is a mixture of both. Table 1 shows an example of the fields and steps involved in a representative research data lifecycle [22][23][24][25][26][27][28][29][30].…”

Section: Literature Review (mentioning)

confidence: 99%

A study on formalizing the knowledge of data curation activities across different fields

Minamiyama, Takeda, Hayashi et al. 2024

PLoS ONE


In recent years, with the trend of open science, there have been many efforts to share research data on the internet. To promote research data sharing, data curation is essential to make the data interpretable and reusable. In research fields such as life sciences, earth sciences, and social sciences, tasks and procedures have been already developed to implement efficient data curation to meet the needs and customs of individual research fields. However, not only data sharing within research fields but also interdisciplinary data sharing is required to promote open science. For this purpose, knowledge of data curation across the research fields is surveyed, analyzed, and organized as an ontology in this paper. As the survey, existing vocabularies and procedures are collected and compared as well as interviews with the data curators in research institutes in different fields are conducted to clarify commonalities and differences in data curation across the research fields. It turned out that the granularity of tasks and procedures that constitute the building blocks of data curation is not formalized. Without a method to overcome this gap, it will be challenging to promote interdisciplinary reuse of research data. Based on the analysis above, the ontology for the data curation process is proposed to describe data curation processes in different fields universally. It is described by OWL and shown as valid and consistent from the logical viewpoint. The ontology successfully represents data curation activities as the processes in the different fields acquired by the interviews. It is also helpful to identify the functions of the systems to support the data curation process. This study contributes to building a knowledge framework for an interdisciplinary understanding of data curation activities in different fields.


“…T-Scan offers two options for the underlying corpus: - SoNaR totaal (Oostdijk et al 2013; this corpus is accessible to researchers at https://portal.clarin.inl.nl/opensonar_whitelab); - Subtlex (Keuleers et al 2010). SoNaR mainly covers written language use, in which informal genres are in the minority in terms of volume.…”

Section: unclassified

Tekstgenres analyseren op lexicale complexiteit met T‑Scan

Maat, Dekker 2016

Tijdschrift Voor Taalbeheersing


Using T-Scan to analyse the lexical complexity of text genres. T-Scan is a tool for the automatic analysis of Dutch text. This paper presents the first large-scale corpus analysis with T-Scan, focusing on lexical complexity. A collection of nearly 1000 text specimens was assembled, containing ten genres: travel blogs, celebrity news features, novels, textbooks for vocational secondary schools, textbooks for general secondary schools, news reports, opinion pieces, political programs, medical advice texts and research articles. The lexical complexity features in the analysis include morphology, word frequency, various word concreteness indices, personal pronouns, names and verb tense. Systematic genre differences are found, such that a genre detection model comprising 18 T-Scan features correctly identifies 83 percent of the corpus texts. Most lexical features differentiating genres intuitively relate to text topic complexity. A closer analysis is offered of the contrast between the two textbook samples in the corpus, which differ only in the educational levels they cater for. Again, topic variation seems a more important factor than stylistic variation. We demonstrate a new method to examine stylistic variation, which consists of within-genre comparisons using the genre prediction; more specifically, ‘deviant’ texts are compared to ‘typical’ members of their genre.


“…Candidates for curation were identified and for each it was assessed as to (1) whether it would be desirable to have the resource curated and (2) whether successful curation would be feasible. A more elaborate description of how these criteria can be operationalized is given in Oostdijk et al (2013).…”

Section: Data Curation (mentioning)

confidence: 99%

Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands

Odijk 2015

Linköping Electronic Conference Proceedings


We hope that this volume will be the first in a series of publications where members of the CLARIN community share their experiences and their results with their colleagues and with the humanities and social sciences research communities at large. CLARIN, the Common Language Resources and Technology Infrastructure, is a European Research Infrastructure for the humanities and social sciences, with a specific focus on language in all its forms, and in the many roles it plays in our society and in research, be it as carrier of information, record of the past, means of human expression or object of study. CLARIN provides a broad range of services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The CLARIN community comprises a variety of groups of people, such as those who build and maintain the infrastructure, those who provide data and tools, and most importantly: those who make use or intend to make use of the CLARIN infrastructure to facilitate and innovate their research. In order to ensure convergence and cross-fertilization between and amongst these groups it is important that they get together and exchange problems and solutions, successes and failures, things yet to be done, and inspiring examples of the capabilities of the infrastructure. The annual CLARIN conference is one of the places where members of the CLARIN community meet. These proceedings present a selection of the highlights of the 2014 annual conference and we hope that they will not only serve to keep people inside CLARIN informed of what is happening, but that they will also reach a much broader circle of researchers who could benefit from what CLARIN has to offer, or who could contribute to the further development of the CLARIN infrastructure.
