2018
DOI: 10.21105/joss.00655

Fast, Consistent Tokenization of Natural Language Text

Citations: cited by 91 publications (54 citation statements)
References: 8 publications
“…All CDR3s-amino acid (a.a.) from each patient's TCR repertoire were deconstructed into overlapping subsequences ("k-mers") of length 3 (k = 3) using the tokenizers R package (29). Then, all k-mers were condensed into a k-mer frequency distribution containing each k-mer's frequency across all CDR3s of a given repertoire.…”
Section: K-mer-based Cdr3 Subsequence Analysis
Citation type: mentioning
confidence: 99%
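
The k-mer step described in the statement above can be illustrated with a minimal sketch. It uses plain Python with the standard library rather than the tokenizers R package that the citing study used, and it is not that study's code. The function names `kmers` and `kmer_frequencies` and the two example sequences are hypothetical, and reading "frequency" as relative frequency (counts divided by the total number of k-mers) is an assumption.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return all overlapping substrings of length k, in order of position."""
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequencies(cdr3s, k=3):
    """Pool the k-mers of every sequence and return each k-mer's relative frequency."""
    counts = Counter()
    for seq in cdr3s:
        counts.update(kmers(seq, k))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical amino-acid sequences, for illustration only.
repertoire = ["CASSLGT", "CASSPGT"]
freqs = kmer_frequencies(repertoire)
```

Here each seven-residue sequence yields five overlapping 3-mers, so `freqs` describes ten k-mers pooled across the whole repertoire, one distribution per repertoire as in the quoted description.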
“…Consecutively, we analyzed the data using text mining methods (Silge and Robinson, 2016) to estimate the frequency and the association of body patterns, such as radar charts and correlations with Bonferroni correction for multiple comparisons. All the analyses were conducted in (RStudio Team, 2015) v1.2.1335® (RStudio Team, 2015) using the packages FactoMineR v1.42 (Lê et al, 2008), tidyverse v1.2.1 (Wickham et al, 2019a), tidytext v0.2.2 (Silge and Robinson, 2016), dplyr v0.8.3 (Wickham et al, 2019b), widyr v0.1.2 (Robinson, 2019), tokenizers v0.2.1 (Mullen et al, 2018), quanteda v1.5.1 (Benoit et al, 2018) and igraph v1.2.4.1 (Csardi and Nepusz, 2006).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Using the combination of rentrez [8], easyPubMed [9], pubmed.mineR [10], tokenizers [11] and blogdown [12] R packages, and Hugo [13] it…”
Section: Methods
Citation type: mentioning
confidence: 99%