spaCy NLP System
A spaCy NLP System is a Python/Cython-based natural language processing library.
- Context:
- It is designed for production usage, offering high-speed performance and state-of-the-art accuracy.
- It can be used for various NLP tasks such as tokenization, named-entity recognition, part-of-speech tagging, and syntactic parsing.
- It is known for its non-destructive tokenization, meaning the original text can be fully reconstructed from the tokenized output.
- It supports over 25 languages, provides statistical models for 8 languages, and includes pre-trained word vectors (as of 2018).
- It integrates with deep learning frameworks, allowing for the use of convolutional neural network models for tagging, parsing, and named entity recognition.
- It provides built-in visualizers for syntax and named entities, aiding in the analysis and interpretation of text data.
- It is designed with a focus on efficiency, scalability, and integration into existing Python-based software stacks.
- It is released under the MIT license, making it freely available for commercial and non-commercial use.
- ...
- Example(s):
- v3.x (2021-present): Enhanced models with transformer support, improved pipeline customization, and added features for machine learning workflows.
- v2.x (2017-2020): Introduction of convolutional neural network models for NLP tasks and improvements in API for model training and updating.
- v2.0.11 (2018-04-04).
- v1.x (~2015-2017): Initial releases focusing on providing a solid foundation for NLP tasks with efficiency and ease of use.
- These versions offer different sets of features.
- …
- Counter-Example(s):
- See: NER System, MIT License, Syntactic Parsing System, Natural Language Toolkit, Deep Learning, Convolutional Neural Network, Tokenization, Part-of-Speech Tagging, Named-Entity Recognition.
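The non-destructive tokenization property noted in the Context above can be illustrated with a minimal sketch. This is plain Python and is not spaCy's implementation: the `Token` class, the regular expression, and the `tokenize` and `reconstruct` helpers are hypothetical names introduced only for illustration. The idea shown is that each token stores the whitespace that followed it, so concatenating the tokens reproduces the input exactly. The `text_with_ws` property name loosely mirrors the attribute of the same name on spaCy tokens.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token type for this sketch (not spaCy's Token class)."""
    text: str        # the token's characters, without trailing whitespace
    whitespace: str  # whitespace that followed the token in the original text

    @property
    def text_with_ws(self) -> str:
        return self.text + self.whitespace


# A run of word characters or a single punctuation mark,
# followed by whatever whitespace comes next (possibly none).
_TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def tokenize(text: str) -> List[Token]:
    """Split text into tokens while keeping every original character."""
    tokens = []
    # Leading whitespace has no preceding token, so it is kept
    # as a token with empty text.
    leading = re.match(r"\s*", text).group(0)
    if leading:
        tokens.append(Token("", leading))
    for match in _TOKEN_RE.finditer(text, len(leading)):
        tokens.append(Token(match.group(1), match.group(2)))
    return tokens


def reconstruct(tokens: List[Token]) -> str:
    """Rebuild the original text from the tokens."""
    return "".join(token.text_with_ws for token in tokens)


if __name__ == "__main__":
    sample = "Hello,  world! It's\tspaCy."
    tokens = tokenize(sample)
    print([token.text for token in tokens])
    print(reconstruct(tokens) == sample)
```

A destructive tokenizer, by contrast, would discard the double space and the tab in the sample, so the original string could not be recovered from its output.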
References
2018a
- https://github.com/explosion/spaCy
- QUOTE: spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license.
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/SpaCy Retrieved:2018-5-23.
- spaCy (/speɪˈsiː/) is an open-source software library for advanced Natural Language Processing, written in the programming languages Python and Cython. It offers the fastest syntactic parser in the world.[1][2][3] The library is published under the MIT license and currently offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian, Dutch and multi-language NER, as well as tokenization for various other languages.[4]
Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production usage. As of version 1.0, spaCy also supports deep learning workflows that allow connecting statistical models trained by popular machine learning libraries like TensorFlow, Keras, Scikit-learn or PyTorch. spaCy's machine learning library, Thinc, is also available as a separate open-source Python library. On November 7, 2017, version 2.0 was released. It features convolutional neural network models for part-of-speech tagging, dependency parsing and named entity recognition, as well as API improvements around training and updating models, and constructing custom processing pipelines.
- ↑ Choi et al. (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool.
- ↑ "Google’s new artificial intelligence can’t understand these sentences. Can you?". https://www.washingtonpost.com/news/wonk/wp/2016/05/18/googles-new-artificial-intelligence-cant-understand-these-sentences-can-you/. Retrieved 2016-12-18.
- ↑ "Facts & Figures | spaCy Usage Documentation". https://spacy.io/usage/facts-figures. Retrieved 2017-11-08.
- ↑ "Models & Languages | spaCy Usage Documentation". https://spacy.io/usage/models#languages. Retrieved 2017-11-08.
2018b
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/SpaCy#Main_features Retrieved:2018-5-23.
- Non-destructive tokenization.
- Named entity recognition.
- Support for over 25 languages.
- Statistical models for 8 languages.
- Pre-trained word vectors.
- Part-of-speech tagging.
- Labelled dependency parsing.
- Syntax-driven sentence segmentation.
- Text classification.
- Built-in visualizers for syntax and named entities.
- Deep learning integration.