RE: [dm-l] Searching Latin texts and orthographical variants

21 Jun 2005


      Hi Jim!
I don't think the computational linguistics has to get too involved when
you're dealing with scholastic Latin.  If -nt[vowel]- gets converted to
-nc[vowel]- and -ae- gets converted to -e-, for example, then you're
well on the way. You lose some fine granularity in your search, but
that's going to happen anyway when you're dealing with heterogeneous
materials like what you describe. I worked out a few rules like this for
a processing routine in Pascal years ago for work on 14th century Latin
preaching (Bromyard), and I'd be happy to dig out the code and send the
relevant bits. If you do it with hashes and you don't have rules like
this, won't you have to create the association between scientia and
sciencia (and all other pairs) explicitly?
Peter
________________________________
From: dm-l-bounces@uleth.ca [mailto:dm-l-bounces@uleth.ca] On
Behalf Of James R. Ginther
    Sent: Thursday, June 16, 2005 12:43 PM
    To: dm-l@uleth.ca
    Subject: [dm-l] Searching Latin texts and orthographical
variants
    
    
    I am the project director of the Electronic Grosseteste, a
research resource that provides access to electronic medieval Latin
texts and an integrated bibliography.  The textbase is composed of a
variety of Latin texts (most of them under copyright but still
searchable).  Right now the search engine is pretty primitive, and one
enhancement I would like to make is to account for orthographical
variants in the texts.  Some texts were classicized, while other editors
followed either the orthography of a single manuscript or attempted to
follow some sort of convention based generally on Latin texts in later
medieval England (these are the facts, and this post is not about the
joy of debating editorial practice).  Ideally, I would like to allow
searches to include returns for classical and "medieval" spellings.  For
example, if a user queried "scientia" the engine would return matches
for "scientia" and "sciencia". (wildcards are permitted, btw). 
     
    Now I work in Perl5, and so my initial thought was to create a
set of hash tables that would map these variants since hashes would
allow for more than one variant per entity, and the engine would then
perform a lookup for each query element.  Now I suppose coding into the
engine the "orthographical rules" is another option, but I'll be honest
and admit that computational linguistics has never been my thing.  And,
the beauty of hashes in Perl is that they are compiled very quickly, and
don't eat too much memory. 
     
    Now before I go and reinvent the wheel with these hash tables,
does anyone know of an open-source method or resource that addresses
this kind of problem (I know that Brepols--pardon me,
Brepolis...yeesh---has this all figured out but they don't play will
with others, so that's a closed door.).  My limited scouring of the web
has yielded no joy, and so I seek the sage advice of this community.  
     
    Many thanks
    Jim
     
    --------------------
    Dr James R. Ginther, PhD
    Assoc. Professor of Medieval Theology
    & Director of Graduate Studies
    Dept of Theological Studies
    St Louis University
    ginthej@slu.edu
    ---------------------------------
    dept: http://theology.slu.edu/
    research: http://www.grosseteste.com/

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

RE: [dm-l] Searching Latin texts and orthographical variants