I am the project director of the Electronic Grosseteste, a research resource that provides access to electronic medieval Latin texts and an integrated bibliography. The textbase is composed of a variety of Latin texts (most of them under copyright but still searchable). Right now the search engine is pretty primitive, and one enhancement I would like to make is to account for orthographical variants in the texts. Some texts were classicized, while other editors followed either the orthography of a single manuscript or attempted to follow some sort of convention based generally on Latin texts in later medieval England (these are the facts, and this post is not about the joy of debating editorial practice). Ideally, I would like to allow searches to include returns for classical and "medieval" spellings. For example, if a user queried "scientia" the engine would return matches for "scientia" and "sciencia". (wildcards are permitted, btw).
Now I work in Perl5, and so my initial thought was to create a set of hash tables that would map these variants since hashes would allow for more than one variant per entity, and the engine would then perform a lookup for each query element. Now I suppose coding into the engine the "orthographical rules" is another option, but I'll be honest and admit that computational linguistics has never been my thing. And, the beauty of hashes in Perl is that they are compiled very quickly, and don't eat too much memory.
Now before I go and reinvent the wheel with these hash tables, does anyone know of an open-source method or resource that addresses this kind of problem (I know that Brepols--pardon me, Brepolis...yeesh---has this all figured out but they don't play will with others, so that's a closed door.). My limited scouring of the web has yielded no joy, and so I seek the sage advice of this community.
Many thanks Jim
-------------------- Dr James R. Ginther, PhD Assoc. Professor of Medieval Theology & Director of Graduate Studies Dept of Theological Studies St Louis University ginthej@slu.edu --------------------------------- dept: http://theology.slu.edu/ research: http://www.grosseteste.com/
I wonder what the Dictionary of Old English does for their corpus? They used not to have any kind of normalisation routine, and I (sheepish grin) still use a version I got from the Oxford Text Archive some time ago, so I don't know how the on-line interface works. Does anybody? Anybody from the DOE here? If not I can e-mail the query on to them (not that that should shut down discussion.
-d
On Thu, 2005-16-06 at 13:42 -0500, James R. Ginther wrote:
I am the project director of the Electronic Grosseteste, a research resource that provides access to electronic medieval Latin texts and an integrated bibliography. The textbase is composed of a variety of Latin texts (most of them under copyright but still searchable). Right now the search engine is pretty primitive, and one enhancement I would like to make is to account for orthographical variants in the texts. Some texts were classicized, while other editors followed either the orthography of a single manuscript or attempted to follow some sort of convention based generally on Latin texts in later medieval England (these are the facts, and this post is not about the joy of debating editorial practice). Ideally, I would like to allow searches to include returns for classical and "medieval" spellings. For example, if a user queried "scientia" the engine would return matches for "scientia" and "sciencia". (wildcards are permitted, btw).
Now I work in Perl5, and so my initial thought was to create a set of hash tables that would map these variants since hashes would allow for more than one variant per entity, and the engine would then perform a lookup for each query element. Now I suppose coding into the engine the "orthographical rules" is another option, but I'll be honest and admit that computational linguistics has never been my thing. And, the beauty of hashes in Perl is that they are compiled very quickly, and don't eat too much memory.
Now before I go and reinvent the wheel with these hash tables, does anyone know of an open-source method or resource that addresses this kind of problem (I know that Brepols--pardon me, Brepolis...yeesh---has this all figured out but they don't play will with others, so that's a closed door.). My limited scouring of the web has yielded no joy, and so I seek the sage advice of this community.
Many thanks Jim
Dr James R. Ginther, PhD Assoc. Professor of Medieval Theology & Director of Graduate Studies Dept of Theological Studies St Louis University ginthej@slu.edu
dept: http://theology.slu.edu/ research: http://www.grosseteste.com/ _______________________________________________ Digital Medievalist Project Homepage: http://www.digitalmedievalist.org Journal (Spring 2005-): http://www.digitalmedievalist.org/journal.cfm RSS (announcements) server: http://www.digitalmedievalist.org/rss/rss2.cfm Wiki: http://sql.uleth.ca/dmorgwiki/index.php Change membership options: http://listserv.uleth.ca/mailman/listinfo/dm-l Submit RSS announcement: http://www.digitalmedievalist.org/newitem.cfm Contact editorial Board: digitalmedievalist@uleth.ca dm-l mailing list dm-l@uleth.ca http://listserv.uleth.ca/mailman/listinfo/dm-l