MediaWiki XML Data Snapshot File
A MediaWiki XML Data Snapshot File is an export file written in a MediaWiki Wiki Export File Format.
- Context:
- It can be imported using a MediaWiki XML Dump Import Tool [1].
- It can be parsed by a MediaWiki XML Data Snapshot File Parser.
- It can be utilized for data analysis, migration, or backup of a MediaWiki Server.
- It can be a Large File.
- ...
- Example(s):
- Counter-Example(s):
- Non-XML Export File.
- XML files not adhering to the MediaWiki Wiki Export File Format.
- See: MediaWiki Server, MediaWiki XML Dump Import Tool, XML Schema.
References
2017
- http://heatonresearch.com/2017/03/03/python-basic-wikipedia-parsing.html
- QUOTE: … Do not try to open the enwiki-latest-pages-articles.xml file directly with a XML or text editor, as it is very large. The code below shows you the beginning of this file. As you can see the file is made up of page tags that contain revision tags. … To read this file it is important that the XML is streamed and not read directly into memory as a DOM parser might do. The xml.etree.ElementTree class can be used to do this. The following imports are needed for this example. For the complete source code see the following GitHub link. ...
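The streaming approach described in the quote can be sketched with `xml.etree.ElementTree.iterparse`. This is a minimal illustration, not the code from the linked article: the `iter_pages` and `local_name` helpers and the tiny in-memory `SAMPLE` document are hypothetical, and the sample only assumes the page/revision/text nesting that the quote mentions.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, reading the file incrementally."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            # If a page holds several revisions, the last one seen wins.
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the finished page's children so they can be freed.
            # The root still keeps the emptied <page> elements themselves.
            elem.clear()


# Hypothetical two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Alpha</title>
    <revision><text>First '''page'''.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second page.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

For a real dump, an open file object (or a path) would be passed to `iter_pages` in place of the `io.BytesIO` wrapper, and pages would be consumed one at a time rather than collected into a list.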
2013
- http://en.wikipedia.org/wiki/Help:Export
- Wiki pages can be exported in a special XML format to import into another MediaWiki installation or use it elsewise for instance for analysing the content. See also m:Syndication feeds for exporting other information but pages and Help:Import on importing pages.
2013b
- http://en.wikipedia.org/wiki/Help:Export#Export_format
- The format of the XML file you receive is the same in all ways. This format is codified in XML Schema at http://www.mediawiki.org/xml/export-0.6.xsd. This format is not intended for viewing in a web browser, though some browsers show you pretty-printed XML with "+" and "-" links to view or hide selected parts. Alternatively the XML-source can be viewed using the "view source" feature of the browser, or after saving the XML file locally, with a program of choice. If you directly read the XML source it won't be difficult to find the actual wikitext. If you don't use a special XML editor "<" and ">" appear as &lt; and &gt;, to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as "&amp;".
In the current version the export format does not contain an XML replacement of wiki markup (see Wikipedia DTD for an older proposal, or Wiki Markup Language). You only get the wikitext as you get when editing the article. (After export you can use alternative parsers to convert wikitext to other format)
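The character escaping mentioned in the quote above can be illustrated with Python's standard library. This is a small sketch rather than anything MediaWiki-specific: the `wikitext` string is a made-up example, and `xml.sax.saxutils.escape` is used only to show how "<", ">", and "&" look in the raw XML source.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical wikitext containing characters that clash with XML markup.
wikitext = 'A <ref name="x">note</ref> & more'

# What the text looks like when read as raw XML source.
escaped = escape(wikitext)
print(escaped)

# Reversing the escaping by hand recovers the original wikitext.
restored = unescape(escaped)

# An XML parser does the same decoding automatically.
parsed = ET.fromstring("<text>" + escaped + "</text>").text
print(parsed == wikitext)
```

In practice an XML parser handles this decoding, so manual unescaping should only be needed when the file is processed as plain text.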