Apologies if this sounds a bit like carping about what is indeed a great step forward in enhancing access to these resources, but I'm rather disappointed to see that UM still appears to think that HTML is the only way of delivering them.
According to the front page, "All texts in the archive are valid SGML documents, tagged in conformance with the TEI Guidelines, and converted to the TEI Lite DTD for wider use."
However, I'm blessed if I can see any way in which the texts can be obtained from the site *in* that SGML format. Which rather devalues that nice promise about "wider use", doesn't it?
Some of us would like to be able to do other kinds of research, beyond what the web interface offers. I'd like to do corpus linguistics kinds of things. I want to crunch these texts up and look for linguistic patterns in them. I don't want to depend on the web for access and I don't want to be restricted to what UM has decided is the way these texts should be presented.
Why do I have to resort to "screen scraping"? Do I really have to download each file individually and convert the HTML back into something more expressive? Actually that's not too hard -- as a quick "view source" shows -- but what a waste of time! That's not what I signed up for when I joined the digital revolution!
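For what it's worth, the "convert the HTML back" step might look something like the sketch below. It is only an illustration, using nothing beyond the Python standard library. The names `TextExtractor`, `html_to_text` and `word_frequencies` are made up for this example, the sample string is invented, and nothing here reflects the actual markup of the CME pages, which would need inspecting first. It also only recovers plain text for counting words, not the TEI structure.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, dropping all markup."""

    # Elements whose content is not part of the readable text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep character data only when outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def word_frequencies(text):
    """Count lower-cased, whitespace-separated tokens (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    # Invented sample; a real page would be read from a saved file instead.
    sample = "<html><body><p>Whan that <i>Aprille</i> with his shoures soote</p></body></html>"
    text = html_to_text(sample)
    print(text)
    print(word_frequencies(text).most_common(3))
```

Anything like this throws away exactly the structural information the TEI encoding was supposed to carry, which is rather the point of the complaint.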
still crazy
Lou
Al Magary wrote:
[Announcement by Paul Schaffner forwarded from another medieval list. This is a wonderful expansion of a digital library of primary sources. Cheers, Al Magary]
With apologies for cross-posting.
The University of Michigan's freely accessible 'Corpus of Middle English prose and verse' more than doubled in size over the weekend. The additions are all in full text, transcribed from modern editions. Most are also linked page-by-page to page images of the editions from which they were taken, so you can always go back to check the transcription against the actual page of the print. Most front and back matter is omitted from the transcription (in order to maximize the amount of actual Middle English that we could produce), but the entire book was scanned in most cases, and can be read online in page-image form.
A brief list of the 146 books in the CME can be found here: http://www.hti.umich.edu/c/cme/browse.html
And here is the blurb from the 'what's new' page:
Corpus of Middle English Prose and Verse http://www.hti.umich.edu/c/cme/
To the 62 searchable texts of the original CME have now been added 85 additional texts, many of them among the largest and most significant monuments of Middle English, including the earlier and later versions of the Wycliffite Bible, Trevisa's and the anonymous translations of Higden's Polychronicon, Cursor Mundi, both versions of Guy of Warwick, the chronicles of Robert Mannyng and Robert of Gloucester, two versions of Mandeville's travels, Hoccleve's Regiment of Princes, the A, B, and C texts of Piers Plowman in Skeat's edition, the Pricke of Conscience, the Ormulum, and numerous saints' legends, including the Laud MS of the South English Legendary. The new texts also include the complete Chaucer Society '8-text' single-MS transcriptions of the Canterbury Tales. The bulk of these additional texts, transcribed from modern editions, were produced during 2000 thanks to a generous grant from the Gladys Krieble Delmas Foundation.
One text of the original CME (Marion Glasscoe's edition of Julian of Norwich, obtained from the Oxford Text Archive) has been removed at the request of its present publisher. Our apologies to those who have linked to this text; they should remove their link.
These texts have been in the queue to go online for five years, and their appearance coincides with some major changes to our retrieval and display system: we would appreciate hearing about any problems you may encounter using them.
pfs
Paul Schaffner | pfs@umich.edu | http://www-personal.umich.edu/~pfs/
316 Hatcher Library N, Univ. of Michigan, Ann Arbor MI 48109-1205
Digital Medievalist Project
Homepage: http://www.digitalmedievalist.org
Journal (Spring 2005-): http://www.digitalmedievalist.org/journal.cfm
RSS (announcements) server: http://www.digitalmedievalist.org/rss/rss2.cfm
Wiki: http://sql.uleth.ca/dmorgwiki/index.php
Change membership options: http://listserv.uleth.ca/mailman/listinfo/dm-l
Submit RSS announcement: http://www.digitalmedievalist.org/newitem.cfm
Contact editorial Board: digitalmedievalist@uleth.ca
dm-l mailing list
dm-l@uleth.ca
http://listserv.uleth.ca/mailman/listinfo/dm-l