2022
DOI: 10.18637/jss.v103.i02

stringi: Fast and Portable Character String Processing in R

Abstract: Effective processing of character strings is required at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation. Pattern searching, string collation and sorting, normalization, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package for fast and portable handling of string data based on ICU (Internatio…
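Normalization, one of the operations listed in the abstract, matters because the same visible text can be encoded by different code point sequences. stringi delegates this to ICU (for example via `stri_trans_nfc`). The sketch below is not stringi's API: it uses Python's standard `unicodedata` module as a stand-in to show the idea, and the helper names `canonical_caseless` and `canonical_caseless_equal` are hypothetical. It follows the Unicode definition of a canonical caseless match, NFD(casefold(NFD(x))).

```python
import unicodedata


def canonical_caseless(s: str) -> str:
    """Map a string to a form in which canonically equivalent, case-variant strings coincide."""
    # Decompose first so that precomposed letters (e.g. U+00C9) and
    # base letter + combining mark sequences are treated alike, then case-fold,
    # then decompose again because case folding can produce new composable sequences.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b match canonically and caselessly."""
    return canonical_caseless(a) == canonical_caseless(b)


if __name__ == "__main__":
    decomposed = "Cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
    precomposed = "CAF\u00c9"  # single code point LATIN CAPITAL LETTER E WITH ACUTE
    print(decomposed == precomposed)                        # naive comparison
    print(canonical_caseless_equal(decomposed, precomposed))
    print(canonical_caseless_equal("Straße", "STRASSE"))    # full case folding: ß -> ss
```

Naive code point comparison treats the two spellings of "Café" as different strings; normalizing and case-folding both sides first makes them compare equal. Locale-sensitive collation, also mentioned in the abstract, is a separate problem that this sketch does not address.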

Cited by 62 publications (41 citation statements); references 13 publications.
“…and the packages dplyr (Wickham et al, 2018), ggplot2 (Wickham, 2016), lmerTest (Kuznetsova et al, 2017), psych (Revelle, 2018), lme4 (Bates et al, 2015), effects (Fox & Weisberg), simr (Green & MacLeod, 2016), WebPower (Zhang & Mai, 2022), lsr (Navarro, 2015), and stringi (Gagolewski, 2022) to create letter matrices and finally analyze the data. The entire analysis script as well as the script for generating the letter matrices was preregistered in advance (https://osf.io/yuh78).…”
Section: Methods (mentioning); confidence: 99%
“…The Lens is an open-access platform archiving hundreds of millions of scholarly articles. Search results were exported to RIS format and combined using custom code (RStudio, v2021.09.0, build 351, RStudio PBC) drawing on the following packages (Synthesisr, Data.table, Tidyverse, Expss, Revtools [17], dplyr, stringi [18]), to filter by publication year (>2019); remove exact and fuzzy duplicate matches of title and DOI; remove systematic reviews and meta-analyses; and then identify relevant articles by abstract keyword search (Supplementary Material B - Combine Screening Code). The screened results were imported into Rayyan [19] for manual screening (Figure 1).…”
Section: Methods (mentioning); confidence: 99%
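The screening steps described in the excerpt above (filter by publication year > 2019, then drop exact and fuzzy duplicates by title and DOI) can be sketched as follows. This is not the cited authors' code, which was written in R and is in their Supplementary Material B. It is a Python sketch under stated assumptions: the record fields `title`, `doi`, and `year`, the helper names, and the 0.9 similarity threshold are all hypothetical, and `difflib.SequenceMatcher` is swapped in as a generic fuzzy matcher.

```python
import re
from difflib import SequenceMatcher


def norm_title(title: str) -> str:
    """Hypothetical helper: case-fold and collapse punctuation/whitespace runs to one space."""
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


def screen(records, after_year=2019, threshold=0.9):
    """Keep records published after `after_year`, dropping exact and fuzzy duplicates.

    A record is a duplicate if its (case-insensitive) DOI was already kept, or if
    its normalized title is at least `threshold` similar to an already kept title
    (an exact title match has similarity 1.0, so it is caught by the same test).
    """
    kept = []
    seen_dois = set()
    for rec in records:
        if rec["year"] <= after_year:  # publication year filter (>2019 in the excerpt)
            continue
        doi = (rec.get("doi") or "").strip().lower()
        if doi and doi in seen_dois:  # exact DOI duplicate
            continue
        title = norm_title(rec["title"])
        if any(
            SequenceMatcher(None, title, norm_title(k["title"])).ratio() >= threshold
            for k in kept
        ):  # exact or fuzzy title duplicate
            continue
        kept.append(rec)
        if doi:
            seen_dois.add(doi)
    return kept
```

The pairwise title comparison is quadratic in the number of kept records, which is acceptable for a few thousand search hits but would call for blocking or indexing on larger exports.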
“…Then, passing the nrows argument we can indicate the number of rows to fetch. In [29] it is noted that effective processing of character strings is needed at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation; compare, e.g., [81] and [16]. Pattern searching, string collation and sorting, normalisation, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics.…”
Section: File Connections (*) (mentioning); confidence: 99%
“…In the following subsections, we review the most essential elements of the regex syntax as we did in [29]. One general introduction to regexes is [25].…”
Section: Matching Individual Characters (mentioning); confidence: 99%