Hello Dan,
You may use my Open XML Editor ("http://www.philo.de/xmledit/") to transcode XML documents from UCS-4 to UTF-8. The editor currently accepts a couple of input encodings, but writes only UTF-8. Loading the document and using the "save as ..." function should do the job. Be careful: plain "save" will overwrite the original file with the UTF-8 version.
Note also that this is not a simple character-by-character transformation. Instead, a model of the XML document is built internally, and this model is then serialized. This implies that
- the transformation might not work if the input XML document is not well-formed,
- some additional normalization of the XML document takes place (in particular: whitespace between markup in the XML prolog, i.e. before the first tag, is normalized to line breaks; empty elements are expanded, e.g. <foo/> becomes <foo></foo>; and character references are expanded),
- external entities are not transcoded.
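The difference between parsing-then-serializing and a character-by-character conversion can be illustrated with a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree, not the Open XML Editor's own code, so it only approximates the behaviour described above (for instance, it does not reproduce the prolog handling). The function name reserialize_as_utf8 and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def reserialize_as_utf8(xml_text: str) -> bytes:
    """Parse an XML string into a tree, then serialize the tree as UTF-8 bytes."""
    # Parsing builds an in-memory model; input that is not well-formed
    # raises ET.ParseError at this point.
    root = ET.fromstring(xml_text)
    # short_empty_elements=False writes <foo></foo> instead of <foo/>.
    return ET.tostring(root, encoding="utf-8", short_empty_elements=False)


# Illustrative input: one numeric character reference and one empty element.
sample = "<doc><p>&#x1090;</p><foo/></doc>"
result = reserialize_as_utf8(sample)
# The reference has been replaced by the UTF-8 bytes E1 82 90,
# and the empty element has been written with separate start and end tags.
print(result)

# Input that is not well-formed cannot be converted this way.
try:
    reserialize_as_utf8("<doc><p>unclosed</doc>")
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print("not well-formed:", err)
```

Because the output is produced from the parsed model, anything the parser does not keep in that model is absent from the result, which is why such a round trip is not byte-for-byte faithful to the input.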
A step-by-step explanation, in German, of how the transformation from UCS-4 into UTF-8 works is included in my draft on "Zahlen- und Zeichencodierung" (number and character encoding), available at "http://philo.de/lv/multimedia/codierung.html". Non-German speakers may also find some useful links there to English-language web pages on this topic.
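For readers who prefer code, the UCS-4 to UTF-8 step can be sketched in a few lines of Python. This is a generic illustration of the standard UTF-8 bit layout, not a transcription of the draft mentioned above and not the editor's implementation. It assumes big-endian UCS-4 input without a byte order mark; the function names utf8_from_code_point and ucs4_be_to_utf8 are hypothetical.

```python
import struct


def utf8_from_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as a UTF-8 byte sequence."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def ucs4_be_to_utf8(data: bytes) -> bytes:
    """Convert big-endian UCS-4 bytes (4 bytes per character) to UTF-8."""
    if len(data) % 4 != 0:
        raise ValueError("UCS-4 input length must be a multiple of 4")
    # Read each group of 4 bytes as one unsigned 32-bit code point.
    code_points = struct.unpack(f">{len(data) // 4}I", data)
    return b"".join(utf8_from_code_point(cp) for cp in code_points)


# Cross-check against Python's built-in codecs.
text = "Köhler \u1090"
converted = ucs4_be_to_utf8(text.encode("utf-32-be"))
print(converted == text.encode("utf-8"))
```

The built-in codecs do the same job in one line (data.decode("utf-32-be").encode("utf-8")); the explicit version is only meant to make the bit layout visible.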
Dieter Köhler
At 12:53 05.10.2004 -0600, Daniel O'Donnell wrote:
Digital Medievalist Journal (Inaugural Issue Fall 2004). Call for papers: http://www.digitalmedievalist.org/cfp.htm
Hello all, I need some advice on converting Unicode character references. Currently, I am encoding character references in what I believe is UCS-4 format (Universal Character Set). This means they look like this in my source files:
႐
I want to import XHTML documents into Open Office, which seems to need UTF-8 encoding (I don't know what UTF stands for). Does anybody know of a filter that might do the conversions for me? Or have advice on using Open Office (Windows version) with UCS-4 encoding?
-dan
Daniel Paul O'Donnell, PhD Associate Professor of English University of Lethbridge Lethbridge AB T1K 3M4 Tel. (403) 329-2377 Fax. (403) 382-7191 E-mail daniel.odonnell@uleth.ca Home Page http://people.uleth.ca/~daniel.odonnell/
Project web site: http://www.digitalmedievalist.org/ dm-l mailing list dm-l@uleth.ca http://listserv.uleth.ca/mailman/listinfo/dm-l