MediaWiki XML Snapshot File Parser

From GM-RKB
(Redirected from MediaWiki XML Dump Parser)

A MediaWiki XML Snapshot File Parser is an XML parser that can read a MediaWiki XML snapshot file (a MediaWiki XML dump).



References

2017

  • (Heaton, 2017) ⇒ Jeff Heaton. (2017). “Reading Wikipedia XML Dumps with Python.” Blog post.
    • QUOTE: … The code below shows you the beginning of this file. As you can see the file is made up of page tags that contain revision tags. … To read this file it is important that the XML is streamed and not read directly into memory as a DOM parser might do. The xml.etree.ElementTree class can be used to do this. The following imports are needed for this example. For the complete source code see the following GitHub link. ...
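
The streaming approach described in the quote can be illustrated with a minimal sketch. It is not the code from the cited blog post: the SAMPLE_DUMP data, the namespace URI version, and the helper names local_name and iter_pages are assumptions made for illustration. It relies only on xml.etree.ElementTree.iterparse from the Python standard library, which emits elements as they are completed instead of building the whole tree first.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature dump: page tags that contain revision tags.
# Real dumps carry many more fields and are far larger.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>10</id>
      <text>First page text.</text>
    </revision>
  </page>
  <page>
    <title>Beta</title>
    <ns>0</ns>
    <id>2</id>
    <revision>
      <id>20</id>
      <text>Second page text.</text>
    </revision>
  </page>
</mediawiki>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, page_id, text) for each page, one page at a time.

    `source` is a filename or a binary file-like object. If a page holds
    several revisions, the text of the last one in the file is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue  # wait until a whole <page> element has been read
        title = page_id = text = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = int(child.text)
            elif name == "revision":
                for rev_child in child:
                    if local_name(rev_child.tag) == "text":
                        text = rev_child.text or ""
        yield title, page_id, text
        # Drop the finished page's children so memory use stays bounded.
        # (The emptied <page> element itself stays attached to the root.)
        elem.clear()


if __name__ == "__main__":
    for title, page_id, text in iter_pages(io.BytesIO(SAMPLE_DUMP)):
        print(page_id, title, len(text))
```

Because iter_pages accepts any binary file-like object, a compressed dump could presumably be streamed by passing the object returned by bz2.open(path, "rb"), where path is a placeholder for the dump file's location.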