Submissions/Panel: Wikisource, from digitization to data

From Wikimania 2014 • London, United Kingdom

This is an accepted submission for Wikimania 2014.

Submission no. 6031
Title of the submission

Wikisource, from digitization to data

Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)


Author of the submission

User:Charles Matthews (real name)

E-mail address

Special:EmailUser/Charles Matthews


User:Charles Matthews

Country of origin

United Kingdom

Affiliation, if any (organisation, company etc.)

Cambridge University Wikipedia Society

Personal homepage or blog


Abstract (at least 300 words to describe your proposal)

Wikisource is one of the major Wikimedia projects, but it is not well understood, even by experienced Wikimedians, partly because it combines aspects of a library, an archive and "something else". The subject for discussion is that "nous ne savons quoi" (Wikisource is, of course, multilingual), seen in the light of recent developments. What is the project doing and where is it going?

A panel of Wikisourcers and an outside voice or two will try to explain. The group will answer questions, discuss points and possibly just argue about the "free library that anyone can improve". Topics will be chosen by the audience, either proposed on the day or submitted before the event, but some can be predicted.

Wikisource's texts are available in a few formats, but it may be preferable to publish them in more formats compatible with e-readers. Even the formats that do exist, such as the output of the buggy PDF generator, are often limited by the source material. This could lead to a discussion of the merits of proofread scans versus unsourced texts. Should the latter be deleted entirely across the board, as some Wikisources do already? Is MediaWiki software fully compatible with the former? How do "born digital" documents fit into this arrangement? Derivative "original" works (annotation, translation, etc.) can be quite contentious on Wikisource and add another angle. Is even so much as an added wikilink a violation of the purity of the text? What are the minimum and maximum tolerances of "added value"?

Access to the Wikidata knowledge base was enabled earlier in the year. What will Wikisource do with this and how can the projects be integrated? Can texts be supported by Wikidata? With at least two items, work and edition, for every text, can other Wikimedians handle the way this is done? Branching out from one project, how can Wikisource be better coordinated with its other sister projects, like Wikipedia?
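
To make the "work and edition" point concrete, here is a minimal sketch of the two-item structure. It is an illustration under stated assumptions, not how any Wikisource tool actually does it: the `Item` class and `link_edition` helper are hypothetical, the item IDs are placeholders, and the property IDs are the ones Wikidata uses today for "edition or translation of" (P629) and "has edition or translation" (P747).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Property IDs as currently used on Wikidata.
EDITION_OF = "P629"   # "edition or translation of"
HAS_EDITION = "P747"  # "has edition or translation"


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata item: an ID, a label and claims."""
    qid: str
    label: str
    claims: Dict[str, List[str]] = field(default_factory=dict)

    def add_claim(self, prop: str, target_qid: str) -> None:
        # Store the target item ID under the given property ID.
        self.claims.setdefault(prop, []).append(target_qid)


def link_edition(work: Item, edition: Item) -> None:
    """Record the relationship in both directions, one claim on each item."""
    edition.add_claim(EDITION_OF, work.qid)
    work.add_claim(HAS_EDITION, edition.qid)


# Placeholder IDs and labels, invented for illustration only.
work = Item("Q_WORK", "Example text (the abstract work)")
edition = Item("Q_EDITION", "Example text (the edition transcribed on Wikisource)")
link_edition(work, edition)
```

In this picture the Wikisource page would attach to the edition item, while the Wikipedia article about the text would attach to the work item, which is the split that other Wikimedians may find unfamiliar.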

Should Wikisource be prepared to host dumps of databases, for example from the GLAM sector? The current content policy and practice might be considered to rule that out, but, going forward, catalogues/metadata of collections may be worth thinking about. Going further in that general direction, there is no sister project that hosts tabulated data.

What about OERs (Open Educational Resources)? Given Wikiversity, they might seem to be spoken for by a sister project, but published educational materials that would not be editable are in a rather different position. It should also be noted that metadata is not a strength of OER repositories generally, but Wikisource gives quite good support (and its inclusion as a Wikidata substrate will improve that). Classifying reference and more general non-fiction content on Wikisource by topic, rather than by "type of text" (e.g. by the person rather than the type or origin of a biography), could probably also benefit from Wikidata indexing to help with "internal catalogue" classification. A start has been made on a tool to match external tables of contents to Wikidata.


Open Data

Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)

Dominic McDevitt-Parks

Magnus Manske:

  • Mix'n'match tool for matching Wikidata items to external catalog IDs.
  • BEACON for getting lists of external IDs in Wikidata.
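
For readers who have not met BEACON files, the sketch below reads a simplified version of the format: meta lines starting with `#` and link lines with `|`-separated tokens. It is an assumption-laden illustration, not the tool's own code. The `parse_beacon` function is hypothetical, the sample data is invented, and the tokens are deliberately left uninterpreted, since the full format has more rules than are shown here.

```python
def parse_beacon(text):
    """Split BEACON-style text into meta fields and raw link tokens."""
    meta = {}
    links = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Meta line of the form "#NAME: value"; split on the first colon only.
            name, sep, value = line[1:].partition(":")
            if sep:
                meta[name.strip().upper()] = value.strip()
            continue
        # Link line: keep the |-separated tokens as they are.
        links.append([token.strip() for token in line.split("|")])
    return meta, links


# Invented sample; the IDs and URLs are placeholders.
sample = """#FORMAT: BEACON
#PREFIX: http://example.org/id/
#TARGET: http://example.com/entry/{ID}

12345|Q_PLACEHOLDER_1
67890|Q_PLACEHOLDER_2
"""

meta, links = parse_beacon(sample)
```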

The other participants were Andrew Billinghurst and Charles Matthews.

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).

  1. Scott (talk) 20:37, 23 March 2014 (UTC)
  2. Edward B (talk) 21:24, 23 March 2014 (UTC)
  3. Anthonyhcole (talk) 07:51, 24 March 2014 (UTC)
  4. AdamBMorgan (talk) 12:02, 24 March 2014 (UTC)
  5. Tpt (talk) 14:22, 4 April 2014 (UTC)
  6. Slowking4 (talk) 20:33, 6 April 2014 (UTC)
  7. Micru (talk) 19:42, 14 April 2014 (UTC)
  8. Xelgen (talk) 19:20, 29 April 2014 (UTC)
  9. Aschroet (talk) 12:50, 12 May 2014 (UTC)
  10. Dyolf77 (talk) 14:17, 12 May 2014 (UTC)
  11. Daniel Mietchen (talk) 08:09, 23 June 2014 (UTC)
  12. Dick Bos (talk) 10:16, 10 July 2014 (UTC)
  13. Frank Hendriks (talk) 12:34, 4 August 2014 (UTC)
  14. VIGNERON (talk) 13:18, 8 August 2014 (UTC)
  15. Add your username here.