Re: [dm-l] Character reference conversions: help?

5 Oct 2004


      According to a Wiki article 
(http://en.wikipedia.org/wiki/Universal_character_set):
"The Universal Character Set is a character encoding that is defined by the 
international standard ISO/IEC 10646. It maps hundreds of thousands of 
abstract characters, each identified by an unambiguous name, to integers, 
called numeric code points.
Since 1991, the Unicode Consortium has been working with ISO to develop the 
Unicode Standard and ISO/IEC 10646 in tandem. The repertoire, character 
names, and code points of Version 2.0 of the Unicode Standard are identical 
to those of ISO/IEC 10646-1:1993 with its first seven published amendments. 
After Unicode 3.0 was published in February 2000, the new and updated 
characters were brought into the UCS via ISO/IEC 10646-1:2000."
It seems to me (as is implied by the second paragraph above) that entity 
code values in UCS should refer to the same character position as in 
Unicode. So, for example, if &#x1090 is the "Z" character in UCS-4 (I know, 
it's not!) then it should also be the "Z" character in UTF-8 (assuming the 
same font is used). If so, no conversion of numerically encoded entities 
should be needed. Of course, I could be wrong!
Since you ask the question, I assume the transfer isn't working as 
expected. What exactly is happening?
David
At 11:53 AM 05/10/2004, you wrote:
...
Hello all,
        I need some advice on converting Unicode character references. 
Currently, am encoding character references in what I believe is UCS-4 
format (Universal Character Set). This means they look like this in my 
source files:
&#x1090;
I want to import xhtml documents into Open Office, which seems to need 
UTF-8 encoding (I don't know what UTF stands for). Does anybody know of a 
filter that might do the conversions for me? Or have advice on using open 
office (Windows version) with UCS-4 encoding?
David Badke
http://badke.ca/david
http://bestiary.ca

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [dm-l] Character reference conversions: help?