spaCy Lemmatizer
A spaCy Lemmatizer is a word lemmatisation system within spaCy that assigns base forms (lemmas) to tokens, using either rules keyed on part-of-speech tags or lookup tables.
- Context:
- It can be implemented for different languages via language-specific factories.
- It can be used as a standalone pipeline component that is added to a spaCy NLP pipeline.
- It can determine the lemma of a word from its part of speech within a sentence (in "rule" mode), so the same surface form can receive different lemmas in different contexts.
- It can operate in different modes, such as "rule" or "lookup," depending on the configuration and available language-specific lemmatizer.
- It can be configured to overwrite existing lemmas or to use specific modes for lemmatization.
- It can utilize the spacy-lookups-data extension package for its default data.
- It can be customized through its configuration settings: mode (e.g., "lookup" or "rule") and overwrite (whether existing lemmas are replaced).
- It can be part of a customizable pipeline where it is positioned after the components that assign coarse-grained POS tags, since "rule" mode reads those tags.
- ...
- Example(s):
- Counter-Example(s):
- See: spaCy, Natural Language Processing, Lemma, Part-of-Speech Tagging.
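
The difference between the "lookup" and "rule" modes described in the Context items can be sketched in plain Python. This is a toy illustration, not spaCy's implementation: the tables, the helper names (lookup_lemmatize, rule_lemmatize), and the suffix rules are all hypothetical and are not taken from spacy-lookups-data. It only shows why a mode that reads the part-of-speech tag can return different lemmas for the same word, while a single lookup table cannot.

```python
# Toy illustration only: every table below is made up for this sketch.
# None of it is spaCy's code or the data shipped in spacy-lookups-data.

# "lookup" mode idea: one word -> lemma table, no part-of-speech information.
LOOKUP_TABLE = {"mice": "mouse", "was": "be", "running": "run"}

# "rule" mode idea: per-POS exceptions, suffix rules, and known base forms.
EXCEPTIONS = {"NOUN": {"mice": "mouse"}, "VERB": {"was": "be"}}
SUFFIX_RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}
KNOWN_LEMMAS = {
    "NOUN": {"meeting", "cat", "story"},
    "VERB": {"meet", "walk"},
}


def lookup_lemmatize(word):
    """Return the table entry for the word, or the word itself if absent."""
    return LOOKUP_TABLE.get(word.lower(), word)


def rule_lemmatize(word, pos):
    """Pick a lemma using the exceptions and suffix rules for the given POS."""
    form = word.lower()
    # Irregular forms are resolved first.
    exceptions = EXCEPTIONS.get(pos, {})
    if form in exceptions:
        return exceptions[form]
    # A form that is already a known base form is returned unchanged.
    known = KNOWN_LEMMAS.get(pos, set())
    if form in known:
        return form
    # Otherwise try each suffix rule and accept the first known candidate.
    for old, new in SUFFIX_RULES.get(pos, []):
        if form.endswith(old):
            candidate = form[: len(form) - len(old)] + new
            if candidate in known:
                return candidate
    # Fall back to the lowercased form when no rule applies.
    return form


if __name__ == "__main__":
    print(lookup_lemmatize("mice"))
    print(rule_lemmatize("meeting", "NOUN"))
    print(rule_lemmatize("meeting", "VERB"))
```

In this sketch, "meeting" keeps its form when tagged NOUN but is reduced to "meet" when tagged VERB, which is why the component is placed after the part of the pipeline that assigns coarse-grained POS tags. In spaCy itself the component is typically added with nlp.add_pipe("lemmatizer", config={"mode": "rule", "overwrite": False}), with the tables supplied by the language data rather than defined inline.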