Wikipedia Data Snapshot
A Wikipedia Data Snapshot is a database snapshot of a Wikipedia.
- Context:
- It can (typically) be a snapshot of a Language-specific Wikipedia, such as an English Wikipedia snapshot.
- It can be used to populate a Wikipedia Mirror Site.
- It can be used as a Large Text Dataset.
- ...
- Example(s):
- Counter-Example(s):
- See: Wikidata Knowledge Base, Wikipedia-based Word Mention Normalization, WikiPrep.
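To illustrate the Large Text Dataset use noted in the context above, here is a minimal sketch of streaming article text out of an XML snapshot. It assumes a MediaWiki-style export layout (`page` elements containing a `title` and a `revision`/`text`); the helper names `local_name` and `iter_pages`, the namespace URI, and the inline sample are illustrative assumptions, not part of any snapshot's documented tooling.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki-style XML export stream.

    Hypothetical helper: assumes each <page> holds one <title> and a
    <revision> with one <text> element.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the finished page's children to limit memory use


# Tiny in-memory stand-in for a snapshot file (assumed layout, made-up content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First article body.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article body.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

Because the parser is incremental, the same function could in principle be fed a file object opened on a downloaded snapshot (for a bz2-compressed file, something like `bz2.open(path, "rb")`, where `path` is a placeholder) without loading the whole file at once.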
References
2014
- http://en.wikipedia.org/wiki/Wikipedia:Database_download
- Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
2010
- http://en.wikipedia.org/wiki/Wikipedia_database
- Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
- WikiXML Wikipedia Snapshot.
- http://ilps.science.uva.nl/WikiXML/
- http://zookma.science.uva.nl/wikixml/data/wikixml-en-2008
- This document describes a set of XML collections based on Wikipedia. Each collection is the result of conversion of Wikipedia in one language to an XML format that combines information from the original wikitext of articles with the result of the rendering of articles as XHTML. Different XML conversions of Wikipedia are available from a number of other projects: Ludovic Denoyer's dump used at INEX 2006 and CLEF WiQA 2006.
- Created by the University of Amsterdam.
- INEX Wikipedia Snapshot
2006
- (Denoyer & Gallinari, 2006) ⇒ Ludovic Denoyer, and Patrick Gallinari. (2006). “The Wikipedia XML Corpus.” SIGIR Forum.
- http://www-connex.lip6.fr/~denoyer/wikipediaXML/
- Used at INEX 2006 (and INEX 2007).
- CLEF WiQA Wikipedia Snapshot.
- XML Snapshot.
- http://ilps.science.uva.nl/WiQA/
- Welcome to WiQA, the Question Answering using Wikipedia pilot that will be launched at CLEF 2006. At this time, December 2005, the definition of the pilot task has been frozen. Please sign up for the mailing list or head to the WiQA wiki to contribute.