Submissions/Parsoid: Dealing with Wikitext so you don't have to™

From Wikimania 2014 • London, United Kingdom

This is an accepted submission for Wikimania 2014.

Submission no. 5036
Title of the submission
Parsoid: Dealing with Wikitext so you don't have to™
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
Author of the submission
The Parsoid team: Subramanya Sastry, Gabriel Wicke, C. Scott Ananian, Marc Ordinas i Llopis, Arlo Breault
E-mail address
Country of origin
Affiliation, if any (organisation, company etc.)
Wikimedia Foundation
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)

Parsoid is changing the way we can work with wiki content by representing it as equivalent and editable semantic HTML+RDFa markup. It powers the VisualEditor, but is also used by a growing number of innovative projects including the Flow discussion system, the Kiwix offline reader, and the new ContentTranslation and PDF rendering systems. In the longer term, it is on track to provide the default content representation and Wikitext user interface for MediaWiki.

In this presentation, we will illustrate some of the problems we faced while building the bi-directional conversion between Wikitext and HTML. We will show how we addressed some of them, and which limitations remain. Addressing the remaining limitations will mostly involve cleaning up broken wikitext. We will show some examples, and point out the few cases where limitations actually impact non-broken wikitext. We will also describe how we systematically test the quality of the conversion to catch issues like 'dirty diffs' (unintended changes to parts of a page that the editor did not touch) early, before they break pages in production, and where this testing has failed in the past.
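The round-trip check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Parsoid test suite: `toy_to_html`, `toy_to_wikitext`, and `lossy_to_wikitext` are hypothetical stand-ins that understand nothing but bold markup, and `round_trip_diff` is a name invented for this example.

```python
import difflib
import re
from typing import Callable, List


def toy_to_html(wikitext: str) -> str:
    """Hypothetical converter: only understands '''bold''' markup."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)


def toy_to_wikitext(html: str) -> str:
    """Hypothetical serializer: the exact inverse of toy_to_html."""
    return re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)


def lossy_to_wikitext(html: str) -> str:
    """Deliberately buggy serializer: it also collapses runs of spaces,
    changing text that nobody edited."""
    return re.sub(r" {2,}", " ", toy_to_wikitext(html))


def round_trip_diff(
    wikitext: str,
    to_html: Callable[[str], str],
    to_wikitext: Callable[[str], str],
) -> List[str]:
    """Convert wikitext to HTML and back, then diff against the original.

    An empty list means the round trip was clean; any diff lines are
    changes introduced purely by the conversion (a 'dirty diff').
    """
    result = to_wikitext(to_html(wikitext))
    return list(
        difflib.unified_diff(
            wikitext.splitlines(),
            result.splitlines(),
            "original",
            "round-tripped",
            lineterm="",
        )
    )


if __name__ == "__main__":
    source = "A  '''bold'''  word.\nSecond line."
    print(round_trip_diff(source, toy_to_html, toy_to_wikitext))
    print("\n".join(round_trip_diff(source, toy_to_html, lossy_to_wikitext)))
```

Running the same comparison over a large sample of real pages, rather than one hand-written string, is what would turn a sketch like this into a regression test.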

The second part of the presentation will focus on how the HTML+RDFa format and the Parsoid API can help you write more powerful gadgets, bots, editing tools, and data extraction tools. We will illustrate this using examples from existing projects (see list of current users). Semi-automated content translation, including template adaptation, is a good example of a problem that was very hard to solve at the wikitext level but becomes tractable in HTML. It is also an example of users taking an API and building innovative tools around it. As a more hands-on example, we will demonstrate how easy it is to build a small editing gadget for micro-contributions.
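To give a flavour of why data extraction is easier on HTML+RDFa than on wikitext, here is a minimal Python sketch that lists the templates used in a fragment of Parsoid-style HTML. It assumes that transclusions are marked with `typeof="mw:Transclusion"` and carry their source in a JSON `data-mw` attribute, as in the Parsoid DOM specification; the sample fragment, the `Birth date` template call, and the helper names are made up for this example, so check the attribute layout against the current specification before relying on it.

```python
import html
import json
from html.parser import HTMLParser
from typing import Dict, List

# Illustrative data-mw payload for a hypothetical {{Birth date|1912|6|23}} call.
DATA_MW = {
    "parts": [
        {
            "template": {
                "target": {"wt": "Birth date"},
                "params": {"1": {"wt": "1912"}, "2": {"wt": "6"}, "3": {"wt": "23"}},
            }
        }
    ]
}

# Invented sample markup; real Parsoid output carries more attributes.
SAMPLE_HTML = (
    '<p>Born on <span about="#mwt1" typeof="mw:Transclusion" data-mw="%s">'
    "23 June 1912</span>.</p>" % html.escape(json.dumps(DATA_MW), quote=True)
)


class TransclusionFinder(HTMLParser):
    """Collects template names and parameters from transclusion markers."""

    def __init__(self) -> None:
        super().__init__()
        self.templates: List[Dict] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # typeof may hold several space-separated values.
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        data = json.loads(attr.get("data-mw") or "{}")
        for part in data.get("parts", []):
            # Parts may also be plain strings (literal wikitext between templates).
            if isinstance(part, dict) and "template" in part:
                template = part["template"]
                params = {
                    name: value["wt"]
                    for name, value in template.get("params", {}).items()
                }
                self.templates.append(
                    {"name": template["target"]["wt"].strip(), "params": params}
                )


def extract_templates(markup: str) -> List[Dict]:
    """Return every template call found in the given HTML fragment."""
    finder = TransclusionFinder()
    finder.feed(markup)
    finder.close()
    return finder.templates


if __name__ == "__main__":
    for template in extract_templates(SAMPLE_HTML):
        print(template["name"], template["params"])
```

No wikitext parsing is involved: the tool reads attributes from a DOM that any HTML library can handle, which is the property that makes bots and gadgets simpler to write.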

Finally, we will close the presentation by talking about future plans for Parsoid and for MediaWiki's content representation in general. These include storing HTML+RDFa directly for pages to speed up the site for editors, and visual diffing for a more intuitive comparison of article versions. We will also show prototypes of new ways to structure the content itself using HTML-based templating and data-driven widgets for data tables and other common page elements.

  • Technology, Interface & Infrastructure
Length of session (if other than 30 minutes, specify how long)
60 minutes
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)
Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).

  1. Tpt (talk) 20:00, 31 March 2014 (UTC)
  2. JackHerrick (talk) 21:35, 31 March 2014 (UTC)
  3. EpochFail (talk) 14:29, 4 April 2014 (UTC)
  4. Hardik95 (talk) 16:54, 4 April 2014 (UTC)
  5. GorillaWarfare (talk) 22:10, 4 April 2014 (UTC)
  6. KartikMistry (talk) 06:55, 5 April 2014 (UTC)
  7. Santhosh.thottingal (talk) 12:01, 5 April 2014 (UTC)
  8. Ocaasi (talk) 01:57, 8 April 2014 (UTC)
  9. Quiddity (talk) 20:13, 12 April 2014 (UTC)
  10. the wub "?!" 23:41, 13 April 2014 (UTC)
  11. --Elitre (talk) 12:11, 20 April 2014 (UTC)
  12. Mr. Stradivarius ♪ talk ♪ 07:44, 27 April 2014 (UTC)
  13. Valhallasw (talk) 19:17, 1 June 2014 (UTC)
  14. Santhosh.thottingal (talk) 11:41, 25 July 2014 (UTC)
  15. Chitetskoy (talk) 00:06, 9 August 2014 (UTC) or my Kagebunshin
  16. Add your username here.