- r-base-core (>= 4.2.2.20221110-1)
- r-api-4.0
- r-cran-stringi (>= 1.0.1)
- r-cran-rcpp (>= 0.12.3)
- r-cran-snowballc (>= 0.5.1)
- libc6 (>= 2.17)
- libgcc-s1 (>= 3.0)
- libstdc++6 (>= 11)
Convert natural language text into tokens. Includes tokenizers for
shingled n-grams, skip n-grams, words, word stems, sentences,
paragraphs, characters, shingled characters, lines, tweets, Penn
Treebank, and regular expressions, as well as functions for counting
characters, words, and sentences, and a function for splitting longer
texts into separate documents, each with the same number of words.
The tokenizers have a consistent interface, and the package is built
on the 'stringi' and 'Rcpp' packages for fast yet correct
tokenization in 'UTF-8'.
Installed Size: 881.7 kB
Architectures: arm64 amd64
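
For readers unfamiliar with the terms in the description, the following is a minimal Python sketch of what "shingled n-grams" and "splitting longer texts into separate documents" mean. It is an illustration only, not the package's implementation or API: the function names `words`, `shingled_ngrams`, and `chunk_words` are hypothetical, the word splitter is a simple regular expression rather than the 'stringi'-based tokenization the package uses, and the last chunk here may be shorter than the others.

```python
import re


def words(text):
    """Lowercase word tokens (simplified stand-in for a real word tokenizer)."""
    return re.findall(r"\w+", text.lower())


def shingled_ngrams(tokens, n):
    """Every contiguous run of n tokens, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chunk_words(tokens, size):
    """Consecutive groups of `size` tokens; the final group may be shorter."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    toks = words("The quick brown fox jumps")
    print(shingled_ngrams(toks, 2))
    print(chunk_words(toks, 2))
```

Shingling slides a window of n tokens across the text one token at a time, so adjacent n-grams overlap, whereas chunking cuts the text into non-overlapping pieces.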