Submissions/Fatg Persian-Tajik article translator

From Wikimania 2014 • London, United Kingdom

After careful consideration, the Programme Committee has decided not to accept the submission below at this time. Thank you to the author(s) for taking part in the Wikimania 2014 programme submission process; we hope to still see you at Wikimania this August.

Submission no. 5020
Title of the submission
Fatg Persian-Tajik article translator
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
Author of the submission
E-mail address
Country of origin
Personal homepage or blog

Project Page

Abstract (at least 300 words to describe your proposal)

During Wikimania 2012 in Washington, D.C., a conversation with Amir gave me the idea of translating articles from the Persian Wikipedia into Tajik. Persian is spoken mainly in Iran and Afghanistan and is written in the Arabic alphabet. Tajik is essentially the same language, spoken in Tajikistan and parts of Afghanistan, the main difference being that it is written in Cyrillic. The Persian Wikipedia is currently the largest Wikipedia in the Middle East and has far more articles than the Tajik edition. This tool aims to make translation between the two possible.

Several methods have been tested: fuzzy logic, bulk data collection from both Wikipedias and the wider Internet, character conversion, machine learning and, finally, crowdsourcing to reduce the remaining errors. Several problems arose during development, and some are still open; these problems and possible solutions will be discussed. As this kind of work is new for both languages, the results should be of particular interest. The talk will show the accuracy of each method, the time needed for corrections and the level of user involvement in the project.

The tool itself will also be presented. It is hosted on the WMFLABS platform and uses several programming (scripting) languages, including Python, PHP and JavaScript. Finally, the unresolved problems mentioned above will be described, in the hope of gathering new ideas and feedback from the audience.
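The abstract lists character conversion, bulk-collected word data, fuzzy matching and crowdsourcing as ingredients but does not say how they fit together. The Python sketch below shows one plausible per-word pipeline under loudly stated assumptions: the letter table `CHAR_MAP` is a simplified, hypothetical consonant-level mapping written for illustration; the `dictionary` and `tajik_words` arguments stand in for data that would come from bulk collection; and string similarity from the standard `difflib` module is swapped in for the unspecified "fuzzy logic" step. It is not the tool's actual code.

```python
import difflib

# Hypothetical, simplified letter-by-letter mapping from Persian (Arabic script)
# to Tajik Cyrillic. Real correspondences depend on context: و can be в, у, ӯ or о,
# ی can be и, ӣ, й or е, and a word-initial ا is ambiguous. Arabic-form kaf (ك)
# and yeh (ي), common in web text, are mapped as well.
CHAR_MAP = {
    "آ": "о", "ا": "о", "ب": "б", "پ": "п", "ت": "т", "ث": "с",
    "ج": "ҷ", "چ": "ч", "ح": "ҳ", "خ": "х", "د": "д", "ذ": "з",
    "ر": "р", "ز": "з", "ژ": "ж", "س": "с", "ش": "ш", "ص": "с",
    "ض": "з", "ط": "т", "ظ": "з", "ع": "ъ", "غ": "ғ", "ف": "ф",
    "ق": "қ", "ک": "к", "ك": "к", "گ": "г", "ل": "л", "م": "м",
    "ن": "н", "و": "в", "ه": "ҳ", "ی": "и", "ي": "и",
}


def naive_convert(word: str) -> str:
    """Convert letter by letter; characters not in CHAR_MAP pass through unchanged."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in word)


def translate_word(
    word: str,
    dictionary: dict[str, str],
    tajik_words: list[str],
    cutoff: float = 0.8,
) -> tuple[str, str]:
    """Return (tajik_form, source), where source tells how the form was obtained."""
    # 1. Exact lookup in a (hypothetical) Persian-to-Tajik word dictionary.
    if word in dictionary:
        return dictionary[word], "dictionary"
    # 2. Letter conversion, then snap to the most similar known Tajik word, if any.
    guess = naive_convert(word)
    close = difflib.get_close_matches(guess, tajik_words, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    # 3. Nothing reliable found: keep the raw guess and flag it for human review.
    return guess, "needs_review"


if __name__ == "__main__":
    dictionary = {"ایران": "Эрон"}   # illustrative entry
    tajik_words = ["китоб", "мактаб"]  # illustrative word list
    for w in ["ایران", "کتاب", "دریا"]:
        print(w, translate_word(w, dictionary, tajik_words))
```

Because Persian script usually omits short vowels, letter-by-letter conversion produces skeletons such as `ктоб` for کتاب (Tajik китоб). That is why the sketch tries a dictionary first, then similarity to known Tajik words, and otherwise flags the word for review, which is the point where a crowdsourcing interface would take over.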

  • Technology, Interface & Infrastructure
Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)

The tool is currently incomplete, but the source code will be available in April. The delay is mainly due to work on the crowdsourcing user interface.

Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).

  1. Add your username here.