MediaWiki XML Snapshot File Parser


Latest revision as of 21:30, 17 September 2021

A MediaWiki XML Snapshot File Parser is an XML parser for a MediaWiki XML snapshot file.



References

2017

  • (Heaton, 2017) ⇒ Jeff Heaton. (2017). “Reading Wikipedia XML Dumps with Python.” Blog post.
    • QUOTE: … The code below shows you the beginning of this file. As you can see the file is made up of page tags that contain revision tags. … To read this file it is important that the XML is streamed and not read directly into memory as a DOM parser might do. The xml.etree.ElementTree class can be used to do this. The following imports are needed for this example. For the complete source code see the following GitHub link. ...
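
The streaming approach described in the quote can be sketched with xml.etree.ElementTree.iterparse. This is a minimal illustration, not the code from the cited post: the helper names local_name and iter_pages and the inline SAMPLE document are hypothetical, and the sketch assumes each page element holds one title and that only the last revision's text is wanted.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) pairs from a MediaWiki XML dump, one page at a time.

    `source` is a filename or a binary file object. Elements are handled as
    their closing tags arrive, so no full DOM is built for the page contents.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            # If a page carries several revisions, the last one seen wins.
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the finished page's children to keep memory use low.
            elem.clear()


# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First page body</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second page body</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, len(page_text))
```

For a real dump, a path or an opened (possibly decompressed) binary stream would be passed in place of the in-memory sample. The cleared page elements remain attached to the root as empty shells, so a very large dump may need the root's children removed as well.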