Submissions/Fixing grammar errors semi-automatically
This is an accepted submission for Wikimania 2014.
- Submission no. 5022
- Title of the submission
- Fixing grammar errors semi-automatically
- Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
- Author of the submission
- Daniel Naber, Marcin Miłkowski
- E-mail address
- daniel.naber@languagetool.org, Marcin.Milkowski@ifispan.waw.pl
- Country of origin
- Germany, Poland
- Affiliation, if any (organisation, company etc.)
- Abstract (at least 300 words to describe your proposal)
To improve the quality of text on Wikipedia, we developed a system that scans all Wikipedia edits for style and grammar errors. Anyone can correct the errors it finds, often with just a few clicks and no manual editing. The software fetches the Atom feed of changes at least once a minute and runs LanguageTool on the edited paragraphs to find errors introduced by each edit. LanguageTool is our open-source style and grammar checker, which supports many languages, including English, French, German, and Polish.
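The idea of reporting only errors introduced by an edit can be sketched as follows. This is a minimal illustration, not the actual implementation: `toy_checker` is a hypothetical stand-in for LanguageTool, and `introduced_errors` is a name chosen here for the comparison step. It assumes that the old and new versions of an edited paragraph are already available.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical checker type: takes a paragraph and returns
# (rule_id, matched_text) pairs. A real system would call LanguageTool here.
Checker = Callable[[str], List[Tuple[str, str]]]


def toy_checker(paragraph: str) -> List[Tuple[str, str]]:
    """Stand-in checker: flags 'a' directly before a word starting with a vowel letter."""
    return [("A_VS_AN", m.group(0))
            for m in re.finditer(r"\b[Aa] [aeiouAEIOU]\w*", paragraph)]


def introduced_errors(old: str, new: str, check: Checker) -> List[Tuple[str, str]]:
    """Return errors found in the new paragraph that were not already in the old one."""
    before = set(check(old))
    return [error for error in check(new) if error not in before]


old_text = "It is a Indian film."
new_text = "It is a Indian film about a old temple."

# Only the error added by the edit is reported; the pre-existing one is skipped.
print(introduced_errors(old_text, new_text, toy_checker))
```

Comparing by rule and matched text, rather than by position, keeps an old error from being reported again when the edit merely shifts it within the paragraph.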
LanguageTool detects problems that a common spell checker won't detect. Typical errors it detects include:
- missing possessive apostrophes: "Download software from the teachers computer" instead of "Download software from the teacher's computer"
- agreement errors: "He has two brother" instead of "He has two brothers"
- confusion of "a" and "an": "a Indian film" instead of "an Indian film"
- missing space after a sentence period
The basic approach for finding errors is to search the text for patterns of known errors. Many of the patterns are quite simple, and all patterns are independent of each other. LanguageTool can therefore easily be extended to detect new kinds of potential problems, including ones specific to Wikipedia. For example, the German rules of LanguageTool detect weasel words like "many people say", which are not wrong but are usually not appropriate for Wikipedia. The presentation will give a brief introduction to writing new error detection rules. It will also explain the reasons for false alarms, some of which are due to bugs and some of which are due to the way we extract text from Wikipedia.
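The pattern-based approach described above can be illustrated with a small sketch. It is not LanguageTool's rule format or engine: the `Rule` and `Match` classes, the rule IDs, and the regular expressions are all assumptions made for this example. It only shows how independent patterns let a new, Wikipedia-specific rule be added without touching the existing ones.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    rule_id: str   # hypothetical identifier, not a real LanguageTool rule ID
    pattern: str   # regular expression describing a known error
    message: str   # explanation shown to the user


@dataclass(frozen=True)
class Match:
    rule_id: str
    start: int
    end: int
    message: str


RULES: List[Rule] = [
    # "a" followed by a word starting with a vowel letter
    Rule("A_VS_AN", r"\b[Aa] (?=[aeiouAEIOU])", 'Use "an" before a vowel sound.'),
    # sentence period directly followed by an uppercase letter
    Rule("MISSING_SPACE", r"(?<=[a-z])\.(?=[A-Z])", "Add a space after the period."),
]


def check(text: str, rules: List[Rule]) -> List[Match]:
    """Apply every rule independently and return all matches in text order."""
    matches = []
    for rule in rules:
        for m in re.finditer(rule.pattern, text):
            matches.append(Match(rule.rule_id, m.start(), m.end(), rule.message))
    return sorted(matches, key=lambda match: match.start)


# A Wikipedia-specific rule is just one more entry in the list.
WEASEL = Rule("WEASEL_WORDS", r"(?i)\bmany people say\b",
              "Weasel words are usually not appropriate for Wikipedia.")

text = "Many people say it is a Indian film.It won awards."
for match in check(text, RULES + [WEASEL]):
    print(match.rule_id, text[match.start:match.end], match.message)
```

Patterns this simple also show where false alarms come from: the `A_VS_AN` pattern above looks only at letters, so it would wrongly flag "a university".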
Our wish list for the future includes more Wikipedia-specific error detection rules and closer integration with MediaWiki, for example with the VisualEditor. The presentation will offer some ideas on how this could be achieved.
- Technology, Interface & Infrastructure
- Length of session (if other than 30 minutes, specify how long)
- 30 minutes
- Will you attend Wikimania if your submission is not accepted?
- yes (Daniel), not yet decided (Marcin)
- Slides or further information (optional)
- Special requests