The discussion of Junicode and fonts raises a very important issue in making medieval resources available on the web.
Using Junicode or another font that contains the right Unicode glyphs is great, provided you can be sure that the people viewing your pages actually have that font installed and are using it to display them.
I've seen all sorts of solutions, from the bad to the not-so-bad, but I've never seen one that doesn't have some limitations somewhere.
Of course, the inevitable onward march of computing means that once everyone is using browsers that support CSS3 "web fonts", this won't be as much of a problem. (Web fonts are a way to have the browser download the font on the fly if it does not already have a copy of it.)
At the OTA we are primarily concerned with freely archiving electronic texts for long-term preservation, so we are able to store Unicode character entities with fewer worries about how to display them to users. (Side note: if you have electronic medieval editions of texts and want free archiving of them for posterity, please see http://www.ahds.ac.uk/litlangling/depositing/index.htm for more information.)
So what is best? Obviously, encoding your webpages as (say) UTF-8 is a good start. Force users to download a font with the appropriate glyphs for your site? Use images of the glyphs instead of actual characters (*shudder*)? Transliterate into ASCII characters/editorial marks? Use markup that allows easy replacement of different solutions on the fly?
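As a minimal sketch of the UTF-8 option (mine, not James's; the sample word, code points and file names are chosen only for illustration), the same short text can be written out either as raw UTF-8 with the charset declared in the page, or as pure-ASCII markup with numeric character references:

    # Thorn (U+00FE), yogh (U+021D) and long s (U+017F) are ordinary
    # Unicode code points; the sample word is invented.
    text = "\u00fee \u021dere"   # "þe ȝere"

    header = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'

    # Option 1: raw UTF-8, with the encoding declared in the page itself
    page_utf8 = "{}\n<p>{}</p>".format(header, text).encode("utf-8")

    # Option 2: ASCII-only markup using numeric character references,
    # which any encoding can carry but which bloats the file
    page_refs = "<p>{}</p>".format(
        "".join(c if ord(c) < 128 else "&#x{:X};".format(ord(c)) for c in text)
    )

    with open("sample-utf8.html", "wb") as f:
        f.write(page_utf8)
    with open("sample-refs.html", "w", encoding="ascii") as f:
        f.write(page_refs)

Either file carries the same characters; the difference lies in whether the reader's browser and fonts can render them, which is exactly where the options above start to diverge.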
I have my own preferences but am interested in what other people have done.
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at ota dot ahds dot ac dot uk
James Cummings wrote:
So what is best? Obviously, encoding your webpages as (say) UTF-8 is a good start. Force users to download a font with the appropriate glyphs for your site? Use images of the glyphs instead of actual characters (*shudder*)? Transliterate into ASCII characters/editorial marks? Use markup that allows easy replacement of different solutions on the fly?
I have my own preferences but am interested in what other people have done.
Many factors may come into play in this decision, including not just current functionality but also (I would hope, for anyone putting much time and energy into a scholarly project) a plan for future accessibility of the data: humanities projects have, or should have, a long time horizon of usability, whatever the publication mode.
From the latter point of view as much as or more than the former, and because we are recording a variety of glyphs in our manuscript that are not (currently?) in Unicode, I have taken the tack, in the transcription part of my current project, of recording every manuscript glyph using an XML entity (here I simplify slightly). It's easy (here I simplify a bit more) to replace these entities on the fly with Unicode characters, transliterations into ASCII with editorial markup, or whatever, and they're reasonably convenient to deal with when writing things like search routines. It makes the raw file look like doggy doo-doo, of course: http://www.gawain-ms.ca
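A rough sketch of that kind of on-the-fly replacement, in Python rather than in an XML toolchain, might look like the following; the entity names and the little glyph table are invented for illustration and are not the ones actually used in the project:

    import re

    # Invented entity names mapped to a Unicode rendering and to an
    # ASCII-with-editorial-markup rendering.
    GLYPHS = {
        "thorn": {"unicode": "\u00fe", "ascii": "th"},
        "yogh":  {"unicode": "\u021d", "ascii": "[3]"},
        "slong": {"unicode": "\u017f", "ascii": "s"},
    }

    ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

    def render(transcription, mode="unicode"):
        """Replace &name; references with the chosen rendering; leave unknown names alone."""
        def repl(match):
            name = match.group(1)
            return GLYPHS.get(name, {}).get(mode, match.group(0))
        return ENTITY.sub(repl, transcription)

    raw = "&thorn;e &yogh;ere of &slong;orwe"
    print(render(raw, "unicode"))   # þe ȝere of ſorwe
    print(render(raw, "ascii"))     # the [3]ere of sorwe

The same table can drive search routines, since a query typed in plain ASCII can be expanded to the entities (or Unicode characters) it might correspond to.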
Murray McGillivray
I absolutely hate workarounds like using images or substituting "3" for yogh. Like James, I like UTF-8 for web pages: it makes for smaller files and faster downloads than entities, and IE (always the problem child) seems to handle UTF-8 better than it does entities, for some reason. But for the "base" (presumably XML) file the considerations are different. I don't want to get into a long discussion, but perhaps the most important consideration is not to require particular fonts or applications. It's hard to say what's best for the long haul, but Murray's solution seems very good to me. I'll just add that the Medieval Unicode Font Initiative (http://helmer.aksis.uib.no/mufi/) suggests entity names for all the Unicode characters it recognizes as useful for medievalists, and also for all the code points it claims in the Private Use Area.
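The file-size point is easy to see in a quick sketch (mine, not Peter's): the same character costs two bytes as raw UTF-8 but six or seven as a character reference, and a simple range test shows whether a code point falls in the Private Use Area mentioned above:

    thorn = "\u00fe"                      # þ
    print(len(thorn.encode("utf-8")))     # 2 bytes as raw UTF-8
    print(len("&#xFE;"))                  # 6 bytes as a numeric character reference
    print(len("&thorn;"))                 # 7 bytes as a named entity

    def in_private_use_area(ch):
        """True if ch lies in the Basic Multilingual Plane's Private Use Area."""
        return 0xE000 <= ord(ch) <= 0xF8FF

    print(in_private_use_area("\ue670"))  # an arbitrary PUA code point -> True
    print(in_private_use_area(thorn))     # thorn is a standard character -> False

For the actual entity names and PUA assignments, the MUFI recommendations themselves are the place to look.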
And Murray--if you have questions about whether particular characters are present in the Unicode standard I may be able to help.
Peter
Hi all,
My preference would be for alternative XSL renderings: one would produce a pure Unicode text for those with adequate fonts, while an alternative transformation would identify characters not covered by any of the fonts commonly distributed with most operating systems and substitute either something at least usable (such as a plain s for historical long s, preserving readability) or a sign such as [s], with an explanation of its signification at the beginning of the text.
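That fallback pass would naturally live in XSLT, but a sketch of the idea in Python (with an invented table of "problem" characters) shows its shape: characters assumed not to be in commonly distributed fonts are replaced with something readable, and a key explaining the substitutions is prepended to the text:

    # Invented, illustrative table: characters assumed to be missing from
    # commonly distributed fonts, with readable substitutes.
    FALLBACKS = {
        "\u017f": "s",     # historical long s -> plain s
        "\u021d": "[3]",   # yogh -> editorial sign
        "\u00fe": "th",    # thorn -> th
    }

    def degrade(text):
        """Return the text with fallback substitutions, preceded by a key."""
        used = {ch: sub for ch, sub in FALLBACKS.items() if ch in text}
        key = "".join(
            '"{}" stands for U+{:04X}\n'.format(sub, ord(ch))
            for ch, sub in used.items()
        )
        body = "".join(FALLBACKS.get(ch, ch) for ch in text)
        return key + body

    print(degrade("\u00fee \u021dere of \u017forwe"))

The pure Unicode rendering would simply skip this pass, and the two outputs could then be offered side by side.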
Cheers, Martin
At 02:49 AM 24/06/2004, you wrote:
So what is best? Obviously, encoding your webpages as (say) UTF-8 is a good start. Force users to download a font with the appropriate glyphs for your site? Use images of the glyphs instead of actual characters (*shudder*)? Transliterate into ASCII characters/editorial marks? Use markup that allows easy replacement of different solutions on the fly?
I have my own preferences but am interested in what other people have done.
______________________________________
Martin Holmes
University of Victoria Humanities Computing and Media Centre
mholmes@uvic.ca  martin@mholmes.com  mholmes@halfbakedsoftware.com
http://www.mholmes.com  http://web.uvic.ca/hcmc/  http://www.halfbakedsoftware.com