Re: [dm-l] Use of wikis for transcription

26 Jul 2009


"Daniel Paul O'Donnell" daniel.odonnell@gmail.com
My main caveat is that this approach is probably only suitable for a
first pass at character capture rather than any serious editorial work,
and even then it might prove problematic.
Surely not. These texts were designed to be read. Most of the stuff is
pretty obvious; the biggest problem is the use of shorthand terms, which
requires some knowledge of the technical terms. This is where 'knowledge of
the crowd' can be useful.
...
...
The main problem is that there is no way of automatically enforcing a
policy on what is to be captured
or how these details are to be recorded: if you have multiple people
working on a project, and you have no control over what they enter, they
will invariably diverge in practice: one scholar might expand
abbreviations, another prefer to leave them unexpanded; one might record
multiple options for reading difficult passages, another just choose the
most likely, and yet others use different symbols for indicating
uncertainty.
Why?  This is an issue about conventions.
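
To make the point about conventions concrete, here is a minimal sketch of how an agreed convention could be checked mechanically on plain-text (wiki-style) transcriptions. Everything in it is an assumption for illustration: the convention that an uncertain reading is written as [reading?], the list of "other" markers, and the sample lines are all invented and do not come from any project discussed in this thread.

```python
import re

# Hypothetical convention (an assumption, not from the thread):
# uncertain readings are written as [reading?]. The markers below are
# examples of other habits a transcriber might fall into.
OTHER_MARKERS = re.compile(r"\(\?\)|\{[^{}]*\}|<[^<>]*\?>")


def find_convention_breaks(lines):
    """Return (line_number, marker) pairs for markers outside the convention."""
    breaks = []
    for number, line in enumerate(lines, start=1):
        for match in OTHER_MARKERS.finditer(line):
            breaks.append((number, match.group(0)))
    return breaks


# Invented sample lines, for illustration only.
sample = [
    "Recd of Jno Smith [ten?] shillings",
    "for carriage of wool (?) to market",
    "and {illegible} more on acct",
]

for number, marker in find_convention_breaks(sample):
    print(f"line {number}: unexpected marker {marker}")
```

A check like this only reports divergence after the fact; it cannot stop anyone from entering a different marker in the first place.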
...
...
The same is true of damage or codicological features: one
scholar might want to record information about the damage in lines
38 or so and following; another might just transcribe around the cut and damage.
Conventions again. Most of the texts I look at don't have any damage.
...
...
XML and XML tools like the OxygenXML editor are designed precisely to
give you this kind of control. There was a time when they could seem
quite intimidating.
I find software pretty intimidating.  How does this work?
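
As a rough illustration of the kind of control meant here: a schema-aware XML editor checks each file against a formal description of which elements are allowed and where. The sketch below is a much simpler stand-in using only Python's standard library, not what OxygenXML itself does. The element names (transcription, line, abbr, unclear, guess) and the sample text are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical project policy (an assumption): only these elements
# may appear in a transcription file.
ALLOWED = {"transcription", "line", "abbr", "unclear"}


def disallowed_elements(xml_text):
    """Return the sorted names of elements not permitted by the policy."""
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter() if el.tag not in ALLOWED})


# Invented sample files, for illustration only.
good = (
    "<transcription><line>Recd <abbr>Jno</abbr> "
    "<unclear>ten</unclear> shillings</line></transcription>"
)
bad = "<transcription><line>Recd <guess>ten</guess> shillings</line></transcription>"

print(disallowed_elements(good))
print(disallowed_elements(bad))
```

The first file passes because every element is on the agreed list; the second is reported because one transcriber invented an element of their own. A real schema can also constrain nesting, attributes and order, which this sketch does not attempt.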
...
...
Nowadays, however, they are becoming ever more
user-friendly. So one slightly more formal way of setting a project up so
that more than one person could transcribe texts might be to ask
everybody to get a copy of something like OxygenXML (although in
principle, it wouldn't matter what editor they used), and then store
their common transcriptions online in a version control repository like
Subversion. People would work on transcriptions on their home computers
and then commit the files back to the common repository when they were
finished. Subversion logs changes and lets more than one person work on
the same file at the same time. And you can show the world what you are
doing by also publishing the repository. This is how Digital Medievalist
works, in fact: we have a Subversion repository with all the XML files.
This repository is copied to a non-public website (so we can check our
work) once every minute or two and then to the public site once a day.
Individual editors download files to edit from the central repository.
OK
...
...
The above assumes that your plan is to have a group of previously
identified editors work on the same project (negotiated collaboration).
Yes.
...
...
If your goal instead is crowdsourcing (i.e. just putting MS images and/or
transcriptions up and letting anybody transcribe or edit them), then
you'll need some other solution.
Also interesting.  My experience is that those who can read these 
manuscripts are limited to less than a crowd.
...
...
Bottom line: a wiki is a good informal way of sharing work (with loss of
policy control) or for crowdsourcing very simple questions (are these
the right letters? what letters does this image contain?). But for
really encoding expert knowledge or doing anything complicated with the
text at all, you are going to want to use XML. There are very robust
ways of doing that in a distributed fashion.
I suspect that crowdsourcing may work for stuff like translation and 
particularly for image recognition.  As mentioned above, reading the 
manuscripts is very easy with a limited amount of practice.
