MediaWiki XML Data Snapshot File

From GM-RKB

A MediaWiki XML Data Snapshot File is an export file written in a MediaWiki Wiki Export File Format.



References

2017

  • http://heatonresearch.com/2017/03/03/python-basic-wikipedia-parsing.html
    • QUOTE: … Do not try to open the enwiki-latest-pages-articles.xml file directly with a XML or text editor, as it is very large. The code below shows you the beginning of this file. As you can see the file is made up of page tags that contain revision tags. … To read this file it is important that the XML is streamed and not read directly into memory as a DOM parser might do. The xml.etree.ElementTree class can be used to do this. The following imports are needed for this example. For the complete source code see the following GitHub link. ...
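The streaming approach described in the quote can be sketched with `xml.etree.ElementTree.iterparse`. This is a minimal illustration, not the code from the cited article: the helper names `local_name` and `iter_pages` are hypothetical, the inline `SAMPLE` stands in for a real dump, and its namespace URI is an assumption modeled on the export-0.6 schema.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file. The namespace URI is an assumption
# modeled on the export-0.6 schema; real dumps may declare another version.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Alpha</title>
    <revision><text>First ''article''.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, one page at a time."""
    title, text = None, None
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, "->", text)
```

For a real dump, `iter_pages` would be given the path to the XML file instead of the in-memory sample. If a page carries several revisions, this sketch keeps only the text of the last one it reads.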

2013

2013b

  • http://en.wikipedia.org/wiki/Help:Export#Export_format
    • The format of the XML file you receive is the same in all ways. This format is codified in XML Schema at http://www.mediawiki.org/xml/export-0.6.xsd. This format is not intended for viewing in a web browser, though some browsers show you pretty-printed XML with "+" and "-" links to view or hide selected parts. Alternatively the XML-source can be viewed using the "view source" feature of the browser, or after saving the XML file locally, with a program of choice. If you directly read the XML source it won't be difficult to find the actual wikitext. If you don't use a special XML editor "<" and ">" appear as &lt; and &gt;, to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as "&amp;".

      In the current version the export format does not contain an XML replacement of wiki markup (see Wikipedia DTD for an older proposal, or Wiki Markup Language). You only get the wikitext as you get when editing the article. (After export you can use alternative parsers to convert wikitext to other format)
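The entity coding mentioned above can be illustrated with Python's standard library. This is a small sketch under assumptions: the `wikitext` string and the bare `<text>` wrapper are invented for illustration and are not taken from an actual export.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical wikitext containing characters that clash with XML markup.
wikitext = 'A <ref name="x">note</ref> & more'

# escape() codes "&", "<" and ">" as entities, as they appear in the raw file.
escaped = escape(wikitext)
print(escaped)

# An XML parser decodes the entities again, so the element text is plain wikitext.
elem = ET.fromstring("<text>" + escaped + "</text>")
print(elem.text)

# unescape() performs the same decoding on a raw string read without a parser.
print(unescape(escaped))
```

Reading the file through an XML parser therefore yields the wikitext as it appears when editing the article, while a plain text read shows the entity-coded form.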