JamSpell
A JamSpell is a Context-Sensitive Spelling Error Correction System developed by Filipp Ozinov.
- AKA: JamSpell Spelling Error Correction System, JamSpell Spellchecker, JamSpell Spell Checking System.
- Context:
- It is available at https://github.com/bakwc/JamSpell
- It can solve a JamSpell Spelling Error Correction Task by implementing a JamSpell Spelling Error Correction Algorithm.
- It uses a Synthetic Spelling Error Generation System.
- It has been evaluated by a JamSpell Benchmark Task.
- Example(s):
- jamspell 0.0.12: a Python implementation by Filipp Ozinov (2020); see the usage sketch below.
- …
- Counter-Example(s):
- See: Spelling Error Correction (SEC) System, Grammatical Error Correction (GEC) System, WikiText Error Correction (WTEC) System, Natural Language Processing (NLP) System.
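The jamspell Python package exposes a small API for context-sensitive correction. The sketch below follows the usage shown in the project README (TSpellCorrector, LoadLangModel, FixFragment, GetCandidates); the model file en.bin is a placeholder that must be downloaded or trained separately, so treat this as a minimal sketch rather than a verified script.

```python
import jamspell

# Create a corrector and load a pretrained language model.
# 'en.bin' is a placeholder; language models are downloaded or
# trained separately (see the project README).
corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

# Fix a whole fragment: surrounding words (context) inform each correction.
print(corrector.FixFragment('I am the begt spell cherken!'))
# expected: 'I am the best spell checker!'

# Rank candidate corrections for the word at index 3 ('begt').
print(corrector.GetCandidates(['i', 'am', 'the', 'begt'], 3))
```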
References
2020a
- (Ozinov, 2020) ⇒ GitHub Project: https://github.com/bakwc/JamSpell Retrieved: 2020-02-06
- QUOTE: JamSpell is a spell checking library with the following features:
- accurate - it considers a word's surroundings (context) for better correction;
- fast - near 5K words per second;
- multi-language - it's written in C++ and available for many languages with SWIG bindings.
- ...
| Model | Errors | Top 7 Errors | Fix Rate | Top 7 Fix Rate | Broken | Speed (words/second) |
| --- | --- | --- | --- | --- | --- | --- |
| JamSpell | 3.25% | 1.27% | 79.53% | 84.10% | 0.64% | 4854 |
| Norvig | 7.62% | 5.00% | 46.58% | 66.51% | 0.69% | 395 |
| Hunspell | 13.10% | 10.33% | 47.52% | 68.56% | 7.14% | 163 |
| Dummy | 13.14% | 13.14% | 0.00% | 0.00% | 0.00% | - |
- The model was trained on 300K Wikipedia sentences + 300K news sentences (English); 95% was used for training and 5% for evaluation. An error model was used to generate errored text from the original, and the JamSpell corrector was compared with Norvig's corrector, Hunspell, and a dummy corrector (no corrections).
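The benchmark above depends on an error model that corrupts clean sentences (cf. the Synthetic Spelling Error Generation System noted in the Context section). The following is a minimal, hypothetical sketch of that idea using random character edits; it is not JamSpell's actual error model.

```python
import random
import string

def add_noise(word: str, p: float = 0.1) -> str:
    """Corrupt a word with one random character edit with probability p.

    A hypothetical stand-in for the error model used to generate
    errored evaluation text; not JamSpell's actual implementation.
    """
    if random.random() > p or not word:
        return word
    i = random.randrange(len(word))
    op = random.choice(['insert', 'delete', 'swap', 'replace'])
    c = random.choice(string.ascii_lowercase)
    if op == 'insert':
        return word[:i] + c + word[i:]
    if op == 'delete':
        return word[:i] + word[i + 1:]
    if op == 'swap' and i < len(word) - 1:
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + c + word[i + 1:]  # replace (or swap at last index)

sentence = 'the quick brown fox jumps over the lazy dog'
errored = ' '.join(add_noise(w, p=0.3) for w in sentence.split())
print(errored)  # e.g. 'the quikc brown fxo jumps over teh lazy dog'
```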
2020b
- (Melli et al., 2020) ⇒ Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, and Olga Moreira. (2020). “GM-RKB WikiText Error Correction Task and Baselines.” In: Proceedings of LREC 2020 (LREC-2020).
- QUOTE: Although the task of correcting natural language human-written text is different from that of correcting Wiki pages, we tested and compared spelling correction tools for performance evaluation purposes. We tested JamSpell, a Python library that checks and corrects spelling in text, and Pyenchant, a similar spelling tool. The JamSpell library takes the full sentence as input and considers the context. ...
Tables 1 and 2 summarize the task results. They show the number of TPs, FPs, and the Eq. 1 performance scores for all the WikiText repairing tools described in Sec. 3.1, trained and tested on the GM-RKB and Wikipedia datasets described in Sec. 4.
Table 1: results on the GM-RKB test dataset.
| Model | TP | FP | Score |
| --- | --- | --- | --- |
| JamSpell | 18,324 | 460,916 | -2,286,256 |
| Pyenchant | 18,630 | 4,717,170 | -23,567,220 |
| WikiFixer MLE | 9,838 | 449 | 7,593 |
| WikiFixer NNet GM-RKB | 16,061 | 696 | 12,581 |
| WikiFixer NNet Wikipedia | 8,678 | 524 | 6,058 |
| WikiFixer NNet Wikipedia pretrained + GM-RKB | 13,841 | 490 | 11,391 |
| WikiFixer NNet Wikipedia 7,000 pages + GM-RKB | 16,003 | 652 | 12,743 |
Table 2: results on the Wikipedia test dataset.
| Model | TP | FP | Score |
| --- | --- | --- | --- |
| JamSpell | 11,479 | 312,809 | -1,552,566 |
| Pyenchant | 9,656 | 8,351,825 | -41,749,469 |
| WikiFixer MLE | 252 | 166 | -578 |
| WikiFixer NNet GM-RKB | 3,954 | 287 | 2,519 |
| WikiFixer NNet Wikipedia | 6,385 | 211 | 5,330 |
| WikiFixer NNet Wikipedia pretrained + GM-RKB | 3,284 | 160 | 2,484 |
| WikiFixer NNet Wikipedia 7,000 pages + GM-RKB | 6,056 | 277 | 4,671 |
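The paper's Eq. 1 score is not restated in the quote, but every row of both tables matches the penalty-weighted score TP - 5 × FP; this is an inference from the table values, checked below on a few sample rows.

```python
# Check that Score = TP - 5*FP reproduces sample rows from Tables 1 and 2.
rows = [
    ('JamSpell, GM-RKB', 18_324, 460_916, -2_286_256),
    ('WikiFixer MLE, GM-RKB', 9_838, 449, 7_593),
    ('WikiFixer NNet Wikipedia, Wikipedia', 6_385, 211, 5_330),
]
for name, tp, fp, score in rows:
    assert tp - 5 * fp == score, name
print('All rows are consistent with Score = TP - 5*FP.')
```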
2019
- (Kantor et al., 2019) ⇒ Yoav Kantor, Yoav Katz, Leshem Choshen, Edo Cohen-Karlik, Naftali Liberman, Assaf Toledo, Amir Menczel, and Noam Slonim (2019). "Learning to Combine Grammatical Error Corrections". In: Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. ACL 2019. DOI: 10.18653/v1/W19-4414.
- QUOTE: We tested Enchant, JamSpell and Norvig spellcheckers, finding our spellchecker outperforms those in terms of spelling correction ...
| All Categories | P | R | F0.5 |
| --- | --- | --- | --- |
| Norvig | 0.5217 | 0.0355 | 0.1396 |
| Enchant | 0.2269 | 0.0411 | 0.1192 |
| JamSpell | 0.4385 | 0.0449 | 0.1593 |
| our | 0.5116 | 0.0295 | 0.1198 |
| R:SPELL | P | R | F0.5 |
| --- | --- | --- | --- |
| Norvig | 0.5775 | 0.6357 | 0.5882 |
| Enchant | 0.316 | 0.6899 | 0.3544 |
| JamSpell | 0.5336 | 0.6977 | 0.5599 |
| our | 0.6721 | 0.5297 | 0.6378 |
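The F0.5 values in both tables are consistent with the standard F-beta measure at beta = 0.5, which weights precision twice as heavily as recall. A quick recomputation of the JamSpell rows:

```python
def f_beta(p: float, r: float, beta: float = 0.5) -> float:
    """Standard F-beta score: beta < 1 weights precision more than recall."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# JamSpell rows from the two tables above.
print(round(f_beta(0.4385, 0.0449), 4))  # 0.1593 (All Categories)
print(round(f_beta(0.5336, 0.6977), 4))  # 0.5599 (R:SPELL)
```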