n-Gram Generation System
Revision as of 22:15, 7 November 2015
An n-Gram Generation System is a tuple generation system that can solve an n-Gram generation task (to produce an n-gram set for a sequence record).
- See: N-gram Model, Skip Gram.
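As an illustration of the task in the definition above, the following is a minimal sketch that produces the n-gram set for one sequence record. The function name `ngram_set` and the choice to represent each n-gram as a tuple are assumptions made for this example, not part of any referenced system.

```python
from typing import Hashable, Sequence, Set, Tuple


def ngram_set(sequence: Sequence[Hashable], n: int) -> Set[Tuple[Hashable, ...]]:
    """Return the set of distinct n-grams (as tuples) found in `sequence`.

    Hypothetical helper for illustration: an n-gram here is any run of
    n consecutive items of the sequence.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty range, hence an empty set.
    return {tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)}


tokens = "to be or not to be".split()
bigrams = ngram_set(tokens, 2)
```

Because the result is a set, the bi-gram ("to", "be") appears once even though it occurs twice in the token sequence; a system that needs frequencies would count occurrences instead.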
References
2008
- http://code.prashanthellina.com/code/generate_ngrams.py
- QUOTE: The “generate_ngrams.py” script creates uni, bi and tri-grams of whatever text is piped into it. The following command pipes all the txt files through both the scripts to create the ngrams file.
for i in `find gutenberg_txt/ -name "*.txt"`; do cat $i | python remove_gutenberg_text.py | grep -i -v "project gutenberg" | python generate_ngrams.py >> gutenberg_ngrams; done
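The quoted pipeline relies on a filter that reads text and emits uni-, bi- and tri-grams. The sketch below shows one way such a filter could be written; it is not the referenced generate_ngrams.py script, whose tokenization and output format are not described in the quote. Whitespace tokenization, one n-gram per output line, n-grams that do not span line breaks, and the names `iter_ngrams` and `write_ngrams` are all assumptions.

```python
import sys
from typing import Iterable, Iterator, List, TextIO


def iter_ngrams(tokens: List[str], max_n: int = 3) -> Iterator[str]:
    """Yield every 1-gram up to max_n-gram of `tokens`, joined by spaces."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def write_ngrams(source: Iterable[str], sink: TextIO, max_n: int = 3) -> None:
    """Write the n-grams of each input line to `sink`, one per line.

    Assumption: tokens are separated by whitespace, and n-grams are
    built within a single line only.
    """
    for line in source:
        tokens = line.split()
        for gram in iter_ngrams(tokens, max_n):
            sink.write(gram + "\n")
```

Calling `write_ngrams(sys.stdin, sys.stdout)` at the end of a script would make it usable as the last stage of a shell pipeline like the one quoted above, with its output appended to a file.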