NLTK Python Toolkit
An NLTK Python Toolkit is a broad-coverage Python-based NLP Toolkit.
- Context:
- It can be used to create an NLTK-based NLP System.
- It can (typically) contain (see the usage sketch after this list):
- an NLTK Tokenizer, e.g. NLTK punkt tokenizer.
- a Stemmer.
- a PoS Tagger.
- a Chunker.
- a Parser.
- a Classifier.
- a Clusterer.
- an NLTK Data Package.
- …
- Example(s):
- NLTK v3.0: NLTK 3.0.2 (2014-03-13).
- NLTK v2.0: NLTK 2.0.3 (2012-09-24).
- …
- Counter-Example(s):
- an OpenNLP System.
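The following sketch illustrates how the components listed under Context are typically combined in an NLTK-based NLP System. It is a minimal, illustrative pipeline, not the only possible configuration: the data-package names passed to nltk.download() (e.g. 'punkt', 'averaged_perceptron_tagger') are those used by recent NLTK 3.x releases and may differ in other versions.

```python
# Minimal sketch of an NLTK pipeline: tokenizer -> stemmer -> PoS tagger -> chunker.
# Assumes a recent NLTK 3.x release; data-package names may vary by version.
import nltk
from nltk.stem import PorterStemmer

# Fetch the required NLTK Data Packages (models and corpora).
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

text = "NLTK is a leading platform for building Python programs in New York."

sentences = nltk.sent_tokenize(text)               # Punkt sentence tokenizer
tokens = nltk.word_tokenize(sentences[0])          # word tokenizer
stems = [PorterStemmer().stem(t) for t in tokens]  # Porter stemmer
tagged = nltk.pos_tag(tokens)                      # PoS tagger
tree = nltk.ne_chunk(tagged)                       # named-entity chunker

print(tokens)
print(tagged)
print(tree)
```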
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Natural_Language_Toolkit Retrieved:2015-7-12.
- The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook.
NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.
NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems.
2014
- http://www.nltk.org/
- QUOTE: NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum.
Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open source, community-driven project.
NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics using Python,” and “an amazing library to play with natural language.”
Natural Language Processing with Python provides a practical introduction to programming for language processing. Written by the creators of NLTK, it guides the reader through the fundamentals of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure, and more. The book is being updated for Python 3 and NLTK 3. (The original Python 2 version is still available at http://nltk.org/book_1ed.)
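As a brief illustration of the corpus and lexical-resource interfaces described in the quote above, the sketch below looks up WordNet synsets and reads one bundled Project Gutenberg text; it assumes the 'wordnet' and 'gutenberg' data packages can be fetched with nltk.download().

```python
# Sketch of NLTK's corpus-reader and WordNet interfaces (assumes NLTK 3.x).
import nltk
from nltk.corpus import gutenberg, wordnet

nltk.download("wordnet", quiet=True)
nltk.download("gutenberg", quiet=True)

# Lexical resource: the first few WordNet synsets for "bank".
for synset in wordnet.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Corpus reader: token count for one bundled Project Gutenberg text.
print(len(gutenberg.words("austen-emma.txt")), "tokens in austen-emma.txt")
```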
- https://github.com/nltk/nltk
- NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.
2011
- (NLTK, 2011) ⇒ http://code.google.com/p/nltk/
- QUOTE: Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
2009
- (Bird et al., 2009) ⇒ Steven Bird, Ewan Klein, and Edward Loper. (2009). “Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.” O'Reilly Media.
- QUOTE: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.
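For instance, the "richly annotated datasets" the book works with can be accessed through corpus readers such as the bundled Penn Treebank sample; the sketch below assumes the 'treebank' data package is installed.

```python
# Sketch: reading the annotated Penn Treebank sample shipped with NLTK.
import nltk
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

print(treebank.words("wsj_0001.mrg")[:10])        # plain tokens
print(treebank.tagged_words("wsj_0001.mrg")[:5])  # (word, PoS-tag) pairs
print(treebank.parsed_sents("wsj_0001.mrg")[0])   # constituency parse tree
```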
- http://www.nltk.org/code
- NLTK includes the following software modules (~120k lines of Python code):
- Corpus readers: interfaces to many corpora
- Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
- Stemmers: Porter, Lancaster, regexp
- Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
- Chunkers: regexp, n-gram, named-entity
- Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, …
- Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
- WordNet: WordNet interface, lexical relations, similarity, interactive browser
- Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
- Clusterers: expectation maximization, agglomerative, k-means
- Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
- Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
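The short sketch below exercises a few of the modules listed above (stemmers, n-gram/backoff taggers, and metrics). It assumes a recent NLTK 3.x release with the 'treebank' data package installed, and the train/test split on the Treebank sample is an arbitrary choice for illustration.

```python
# Sketch exercising a few listed modules: stemmers, backoff taggers, metrics.
# Assumes NLTK 3.x with the 'treebank' data package installed.
import nltk
from nltk.corpus import treebank
from nltk.metrics import edit_distance
from nltk.stem import LancasterStemmer, PorterStemmer
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

nltk.download("treebank", quiet=True)

# Stemmers: Porter vs. Lancaster.
print(PorterStemmer().stem("running"), LancasterStemmer().stem("running"))

# Taggers: a bigram tagger backing off to a unigram tagger, then to 'NN'.
sents = treebank.tagged_sents()
train, test = sents[:3000], sents[3000:3100]
tagger = BigramTagger(train, backoff=UnigramTagger(train, backoff=DefaultTagger("NN")))
print("tagger accuracy:", tagger.evaluate(test))  # .accuracy() in newer releases

# Metrics: Levenshtein edit distance.
print(edit_distance("tokenize", "tokenise"))
```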