r-cran-tokenizers - 0.3.0-1 main

Convert natural language text into tokens. Includes tokenizers for
shingled n-grams, skip n-grams, words, word stems, sentences,
paragraphs, characters, shingled characters, lines, tweets, Penn
Treebank tokens, and regular-expression matches, as well as functions
for counting characters, words, and sentences, and a function for
splitting longer texts into separate documents, each with the same
number of words. The tokenizers have a consistent interface, and the
package is built on the 'stringi' and 'Rcpp' packages for fast yet
correct tokenization in 'UTF-8'.
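
A minimal sketch of the consistent interface described above, using
function names from the tokenizers package (exact argument defaults may
vary between versions; each tokenizer returns a list with one element
per input document):

```r
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked twice."

# Word and sentence tokenizers share the same calling convention
tokenize_words(text)
tokenize_sentences(text)

# Shingled n-grams: all n-grams from n_min up to n words long
tokenize_ngrams(text, n = 3, n_min = 2)

# Counting helper
count_words(text)

# Split a longer text into documents of roughly chunk_size words each
chunk_text(text, chunk_size = 5)
```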

Priority: optional
Section: gnu-r
Suites: crimson dawn landing 
Maintainer: Debian R Packages Maintainers <r-pkg-team@alioth-lists.debian.net>
Installed Size: 881.7 kB
Architectures: arm64, amd64

Versions: 0.3.0-1 (arm64), 0.3.0-1 (amd64)