On Wed, 30 Jun 2004, Martin Holmes wrote:
Hi James,
Are you actually marking up the markup? For example, are you taking an SGML text and marking it up (SGML code included) in TEI, or are you mainly concerned with converting prior markup formats to current ones? I don't believe I've ever seen one tag-based markup system marked up in another.
Our concern is mainly to migrate to a more useable format rather than record the original format in xml. Since we will always keep a copy of the original deposit, we want to preserve the intent behind the markup, not necessarily the markup choices itself. So yes, we are concerned with conversion to more flexible markup schemes (though just having stuff all in the _same_ format would be nice whatever that format was). So with COCOA converting: <SN Scene 2> <SSD Enter Martin Holmes> <Q Martin>No markup is objective or transparent <Q James>Can't disagree with that! ...
to
<div type="scene"> <head>Scene 2</head> <stage>Enter Martin Holmes</stage> <sp><speaker>Martin</speaker><l>No markup is objective or transparent</l></sp> <sp><speaker>James</speaker><l>Can't disagree with that!</l></sp> ... </div>
This is a fairly neutral change as far as the content goes and I can't seem much data loss in it. Although it is (I think) an accurate reflection of the intention of the original markup, it isn't in any way an XMl version of the COCOA markup.
Part of the problem in conversion comes in identifying the intent behind the original markup in a field that I don't understand. (This is good news for the medieval resources we hold, since I generally understand that field... but I wouldn't trust myself as much with say Buddhist texts.) The only solution I can think of for that is to go ahead and do the conversion, and show the before and after versions to a specialist in that area.
-James
--- Dr James Cummings, Oxford Text Archive, University of Oxford James dot Cummings at ota dot ahds dot ac dot uk