gensim.corpora.WikiCorpus
A gensim.corpora.WikiCorpus is a gensim corpus class that parses a MediaWiki XML dump file (a MediaWiki XML snapshot) and extracts the plain text of its articles.
- Context:
- …
- Example(s):
python -m gensim.scripts.segment_wiki -i -f mediawiki-180324.xml.gz -o mediawiki-180324.json
####
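import json
from smart_open import open  # older smart_open releases exposed this as smart_open.smart_open
# Each line of the output file holds one article as a JSON object.
for line in open('mediawiki-180324.json'):
    article = json.loads(line)
    print("Article title: %s" % article['title'])
- …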
- Counter-Example(s):
- See: MediaWiki Markup Parser.
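The JSON-lines file produced in the example above can also be read section by section. The following is a minimal sketch using only the standard library. It assumes the output format described in the gensim segment_wiki documentation, where each article object carries `title`, `section_titles`, and `section_texts` fields. The helper name `iter_articles` and the sample record are hypothetical and only illustrate the shape of the data.

```python
import io
import json


def iter_articles(lines):
    """Yield (title, sections) pairs from JSON-lines input, one article per line.

    `sections` is a list of (heading, text) tuples.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        article = json.loads(line)
        # Assumed field names: 'title', 'section_titles', 'section_texts'.
        headings = article.get('section_titles', [])
        texts = article.get('section_texts', [])
        yield article['title'], list(zip(headings, texts))


# Hypothetical one-article sample standing in for mediawiki-180324.json.
sample = io.StringIO(
    json.dumps({
        "title": "Example article",
        "section_titles": ["Introduction", "History"],
        "section_texts": ["Intro text.", "History text."],
    }) + "\n"
)

for title, sections in iter_articles(sample):
    print("Article title: %s" % title)
    for heading, text in sections:
        print("  Section: %s (%d characters)" % (heading, len(text)))
```

To read a real file, the same function can be passed an open file handle in place of the in-memory sample.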