MediaWiki XML Snapshot File Parser

Context:
- It can be a Text-focused MediaWiki XML Snapshot File Parser, such as gensim.corpora.WikiCorpus [1].
- It can be a Raw Content-focused MediaWiki XML Snapshot File Parser, such as (Heaton, 2017).
Example(s):
- GMRKB.ReadMWDump;
- gensim.corpora.WikiCorpus;
- one based on xml.etree.ElementTree, such as [2];
- …
Counter-Example(s):
- a WikiText Markup Parser.
- a DOM Parser.
See: MediaWiki Markup Parser.

References

(Heaton, 2017) ⇒ Jeff Heaton. (2017). “Reading Wikipedia XML Dumps with Python.” Blog post.
- QUOTE: … The code below shows you the beginning of this file. As you can see the file is made up of page tags that contain revision tags. … To read this file it is important that the XML is streamed and not read directly into memory as a DOM parser might do. The xml.etree.ElementTree class can be used to do this. The following imports are needed for this example. For the complete source code see the following GitHub link. ...
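
The streaming approach described in the quote can be sketched roughly as follows. This is a minimal illustration and not Heaton's actual code: the helper names `local_name` and `iter_pages`, the inline `SAMPLE` document, and its namespace URI are assumptions made for the example. It uses `xml.etree.ElementTree.iterparse` to visit page elements one at a time instead of loading the whole snapshot as a DOM parser might.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix: '{http://...}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) per <page> element, reading the XML incrementally.

    `source` may be a filename or a binary file object.
    If a page holds several revisions, the text of the last one is kept.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Discard the finished page's children to limit memory use.
            elem.clear()


# Hypothetical two-page snapshot, only for demonstration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First page body.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second page body.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, "|", page_text)
```

A real snapshot file would typically be passed as a path or as an opened (possibly decompressed) binary stream in place of the in-memory sample. The yielded text is raw wiki markup; turning it into plain text is the job of a separate WikiText markup parser.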