eosTokFeat.pl Program
An eosTokFeat.pl Program is a heuristic NLP pre-processing program developed by Gabor Melli.
- Context:
- It can (typically) make use of the EOSTOKFEAT.pm Module.
- It can (typically) include an End-of-Sentence Detector (with the --eos flag).
- It can (typically) include a Text Tokenizer (with the --tok flag).
- It can (typically) include a Text Token Featurizer (with the --feat flag).
- Example(s):
$ echo "Mr. Toro \"announced\" Sony's new Vaio (VPC-F223FX/S) today. It compares well to the iPad 2." | ./eosTokFeat.pl --eos
Mr. Toro "announced" Sony's new Vaio (VPC-F223FX/S) today. </s> It compares well to the iPad 2. </s>
$ echo "Mr. Toro \"announced\" Sony's new Vaio (VPC-F223FX/S) today. It compares well to the iPad 2." | ./eosTokFeat.pl --eos --tok
Mr. Toro " announced " Sony's new Vaio (VPC-F223FX/S ) today . </s> It compares well to the iPad 2 . </s>
$ echo "I saw E. coli under the microscope with Dr. Smith. They were moving." | ./eosTokFeat.pl --eos --tok --feat
- Counter-Example(s):
- NLTK.
- See: datgen.c.
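To make the three stages concrete, here is a minimal Python sketch of a heuristic end-of-sentence marker, tokenizer, and token featurizer. It is not the code in EOSTOKFEAT.pm. The abbreviation list, the function names, and the feature set are all assumptions chosen for illustration, and the tokenizer is not meant to reproduce the program's output exactly.

```python
import re

# Hypothetical abbreviation list; the real module's heuristics may differ.
ABBREVIATIONS = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}


def is_abbreviation(chunk):
    """Treat listed abbreviations and single-letter initials (e.g. 'E.') as non-terminal."""
    return chunk in ABBREVIATIONS or re.fullmatch(r"[A-Z]\.", chunk) is not None


def mark_sentence_ends(text):
    """Insert a '</s>' marker after each whitespace-delimited chunk that ends a sentence."""
    out = []
    for chunk in text.split():
        out.append(chunk)
        if chunk[-1] in ".!?" and not is_abbreviation(chunk):
            out.append("</s>")
    return " ".join(out)


def tokenize(text):
    """Split punctuation from words, keeping '</s>' markers and abbreviations whole."""
    tokens = []
    for chunk in text.split():
        if chunk == "</s>" or is_abbreviation(chunk):
            tokens.append(chunk)
        else:
            # Words may contain internal hyphens, apostrophes, or slashes;
            # any other non-space character becomes its own token.
            tokens.extend(re.findall(r"\w+(?:[-'/]\w+)*|[^\w\s]", chunk))
    return tokens


def featurize(token):
    """Return a few simple surface features for one token (illustrative only)."""
    return {
        "token": token,
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "is_punct": not any(c.isalnum() for c in token),
    }


if __name__ == "__main__":
    sentence = "I saw E. coli under the microscope with Dr. Smith. They were moving."
    marked = mark_sentence_ends(sentence)
    print(marked)
    for tok in tokenize(marked):
        print(featurize(tok))
```

The stages are chained in the same order as the flags: sentence marking first, then tokenization of the marked text, then per-token features. Keeping abbreviations such as "Dr." whole in both stages is the design choice that prevents a false sentence break before "Smith".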
References
2014
$ git clone http://github.com/gmelli/text-featurizer
Cloning into 'text-featurizer'...
$ ls -l text-featurizer/src/
-rw-rw-r-- 1 Gabor Gabor 3.0K Apr 21 13:56 eosTokFeat.pl
-rw-rw-r-- 1 Gabor Gabor 59K Apr 21 13:56 EOSTOKFEAT.pm
$ cpan -i Lingua::Stem::Snowball