nltk.tokenize.punkt Tokenizer

From GM-RKB


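An nltk.tokenize.punkt Tokenizer is a sentence tokenizer included in NLTK (in the nltk.tokenize.punkt module) that splits a text into sentences using an unsupervised, trainable model.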

See: NLTK Stemmer.

References

2014

http://www.nltk.org/_modules/nltk/tokenize/punkt.html
- QUOTE: This tokenizer divides a text into a list of sentences, by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences. It must be trained on a large collection of plaintext in the target language before it can be used.
  The NLTK data package includes a pre-trained Punkt tokenizer for English.
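
The behavior described in the quote can be sketched as follows. This is a minimal illustration, assuming NLTK is installed; the sample strings and variable names are made up for the example. It uses the `PunktSentenceTokenizer` class from `nltk.tokenize.punkt`, first with default (untrained) parameters and then trained on a toy text passed to the constructor. The toy text is far too small to learn a useful abbreviation model, so the trained result is not shown as an expected output.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Untrained tokenizer: default parameters, no learned abbreviations,
# collocations, or sentence starters.
untrained = PunktSentenceTokenizer()
sentences = untrained.tokenize("Hello world. This is a test.")
print(sentences)

# Passing plaintext to the constructor trains the unsupervised model on it.
# Hypothetical toy corpus: a real model needs a large collection of text
# in the target language.
training_text = (
    "Dr. Smith visited the lab on Monday. Dr. Jones joined him later. "
    "They met Dr. Brown and Mr. Green. Mr. Green left early. "
    "Dr. Smith and Dr. Jones stayed until the evening."
)
trained = PunktSentenceTokenizer(training_text)
trained_sentences = trained.tokenize("Dr. Smith arrived. He sat down.")
print(trained_sentences)
```

Loading the pre-trained English model mentioned in the quote additionally requires downloading the NLTK data package, and the resource name depends on the NLTK version, so it is left out of this sketch.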
