2011 DesignandImplementationoftheSwe
- (Dohrn & Riehle, 2011) ⇒ Hannes Dohrn, and Dirk Riehle. (2011). “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ISBN:978-1-4503-0909-7 doi:10.1145/2038558.2038571
Subject Headings: Wikitext Markup Parser; Sweble Wikitext Parser.
Notes
Cited By
- Google Scholar: ~ 40 Citations.
- ACM DL: ~ 7 Citations.
- Semantic Scholar: ~ 22 Citations.
Quotes
Author Keywords
- Wiki; Wikipedia; Wiki Parser; Parsing Expression Grammar; PEG; Abstract Syntax Tree; AST; WYSIWYG; Sweble.
Copyright Information
- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
- WikiSym’11, October 3-5, 2011, Mountain View, CA, USA.
- Copyright: 2011 ACM 978-1-4503-0909-7/11/10 ...$10.00.
Abstract
The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki's content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines.
This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia.
The parser is available as open source from http://sweble.org
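To make the abstract's idea of grammar-based parsing concrete, here is a minimal sketch of a PEG-style recursive-descent parser in Python. It is an illustration only and is not taken from the paper or from the Sweble code base. The node classes (`Text`, `Link`, `Bold`, `Italic`), the function names, and the tiny Wikitext subset it accepts (internal links and apostrophe-quoted bold and italic) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


# Hypothetical AST node types for a tiny Wikitext subset (not Sweble's AST).
@dataclass
class Text:
    value: str


@dataclass
class Link:
    target: str
    label: str


@dataclass
class Bold:
    children: list


@dataclass
class Italic:
    children: list


# A rule returns (node, next_position) on success, or None on failure.
Result = Optional[Tuple[object, int]]


def parse_link(src: str, pos: int) -> Result:
    # Link <- "[[" target ("|" label)? "]]"
    if not src.startswith("[[", pos):
        return None
    end = src.find("]]", pos + 2)
    if end == -1:
        return None
    target, _, label = src[pos + 2:end].partition("|")
    return Link(target, label or target), end + 2


def parse_quoted(src: str, pos: int, marker: str, make) -> Result:
    # Quoted <- marker Inline+ marker
    if not src.startswith(marker, pos):
        return None
    children, cur = parse_inlines(src, pos + len(marker), stop=marker)
    if not children or not src.startswith(marker, cur):
        return None
    return make(children), cur + len(marker)


def parse_inlines(src: str, pos: int, stop: Optional[str] = None):
    # Inline <- Link / Bold / Italic / AnyChar   (ordered choice)
    nodes: List[object] = []
    while pos < len(src) and not (stop and src.startswith(stop, pos)):
        result = (parse_link(src, pos)
                  or parse_quoted(src, pos, "'''", Bold)
                  or parse_quoted(src, pos, "''", Italic))
        if result is None:
            # No rule matched: consume one character as plain text.
            if nodes and isinstance(nodes[-1], Text):
                nodes[-1].value += src[pos]
            else:
                nodes.append(Text(src[pos]))
            pos += 1
        else:
            node, pos = result
            nodes.append(node)
    return nodes, pos


def parse(src: str) -> list:
    nodes, _ = parse_inlines(src, 0)
    return nodes


def link_targets(nodes: list) -> Iterator[str]:
    # Walk the AST and yield every link target, a simple "query" on the tree.
    for node in nodes:
        if isinstance(node, Link):
            yield node.target
        elif isinstance(node, (Bold, Italic)):
            yield from link_targets(node.children)


ast = parse("See '''[[Wikipedia|the wiki]]''' and ''more''.")
print(ast)
print(list(link_targets(ast)))
```

Because the alternatives are tried in a fixed order, the bold rule is attempted before the italic rule, which is the ordered-choice behaviour that distinguishes a PEG from an ambiguous context-free grammar. The sketch does not try to disambiguate nested or unbalanced apostrophe runs, and markup that fails to match any rule simply falls through to plain text.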
1. Introduction
2. Prior And Related Work
2.1 Related And Prior Work
2.2 Prior Parser Attempts
3. Wikitext And MediaWiki
3.1 How The MediaWiki Parser Works
3.2 Challenges To Parsing Wikitext
4. The Sweble Wikitext Parser
4.1 Requirements For The Parser
4.2 Parser Design
4.3 AST Design
4.4 Parser Implementation
5. Limitations
6. Conclusions
7. Acknowledgements
We would like to thank Carsten Kolassa, Michel Salim and Ronald Veldema for their help and support.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011 DesignandImplementationoftheSwe | Hannes Dohrn, Dirk Riehle | | | Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia | | | | 10.1145/2038558.2038571 | | 2011 |