Re: [dm-l] OCR on Manuscripts

21 Mar 2008


Melissa Terras is the great expert on this, of course, along with our
own Arianna Ciula and Paul Stokes. My understanding from a talk I heard
Terras give last year in London is that it is a holy grail for OCR
companies, but nowhere near practical.
I'm not an expert on the current state of this field, but I believe the
approach currently is at best to assist humans in the interpretation of
characters by looking for patterns that can be highlighted, rather than
to attempt to replace humans in interpreting strings of manuscript text.
In this regard it is something like how we used to use photocopiers to
increase contrast, or different types of light to emphasise different
types of strokes or ink, but now going further by looking for patterns
that can then be approved or interpreted by a human.
So my sense is that if the question is "can I use a computer to avoid
keying the content of this manuscript into a computer?" the answer is
no. But if the question is "can I use a computer to help me recognise,
interpret, or classify aspects or characters in this script?" the answer
is yes, depending on the specific issues you want to analyse.
Convenient starting places on the current state of the field are
the articles in DM by Paul Stokes (newly published in DM 3:
http://www.digitalmedievalist.org/journal/3/stokes/) and Arianna Ciula
(DM 1: http://www.digitalmedievalist.org/journal/1.1/ciula/) [1]. We have
a review forthcoming of Terras's book _Image to Interpretation: An
intelligent system to aid historians in reading the Vindolanda texts_
(Oxford: 2006).
-dan
[1] If you haven't been to the DM site in a while, you'll see it has
been improved and reorganised. Old URLs for journal articles should
still work however.
On Fri, 2008-03-21 at 13:58 -0400, Dot Porter wrote:
...
Dear List,
I've received a query from a medievalist who is interested in applying
 OCR to manuscripts. I'm not really aware of recent work in this area
 and I'm wondering what, if anything, is being done at this time or in
 the recent past. Last time I looked into it, good OCR from handwritten
 texts was a long way off, even for nicely written, straight English text,
 to say nothing of heavily abbreviated medieval Latin or Old English
 writing. But I'd be delighted to be proven wrong.
Thanks!
 Dot
--

Dot Porter, University of Kentucky
 #####
 Program Coordinator
 Collaboratory for Research in Computing for Humanities
 http://www.rch.uky.edu
 Center for Visualization and Virtual Environments
 http://www.vis.uky.edu
 dporter@uky.edu 859-257-1257 x.82115

-- 
Daniel Paul O'Donnell, PhD
Department Chair and Associate Professor of English
Director, Digital Medievalist Project http://www.digitalmedievalist.org/
Chair, Text Encoding Initiative http://www.tei-c.org/

Department of English
University of Lethbridge
Lethbridge AB T1K 3M4
Vox +1 403 329-2377
Fax +1 403 382-7191
Email: daniel.odonnell@uleth.ca
WWW: http://people.uleth.ca/~daniel.odonnell/
