Hello all, I realise that many of us are on the TEI list and hence have seen this; it is an important question, however, and I thought I would distribute it here as well. If you are just beginning in computer applications, the issue is fairly advanced (though it has already come up), and you're not likely to write a position paper on it in the next 30 days; I hope in this one instance, you'll forgive me for not explaining it unless asked. -dan
-------- Original Message --------
Subject: The time has come to make some s
Date: Mon, 28 Jun 2004 19:20:35 -0400
From: Syd Bauman <Syd_Bauman@BROWN.EDU>
Reply-To: editors@tei-c.org
To: TEI-L@LISTSERV.BROWN.EDU
CALL FOR VOLUNTEERS TO INFLUENCE P5!
---- --- ---------- -- --------- ---
Over the last year there has been some public discussion of the "replacement Janus" mechanism for P5. The quick overview is that, in order to avoid putting content (which might need markup) in attribute values (which can't have markup), it has been suggested that we change the so-called Janus tags from things like
<corr sic="adnire">admire</corr>
<orig reg="w">vv</orig>ith <orig reg="w">vv</orig>hat charity the
<abbr expan="Text Encoding Initiative Consortium">tei-c</abbr>
to things like
<choice>
  <sic>adnire</sic>
  <corr>admire</corr>
</choice>
<choice><orig>vv</orig><reg>w</reg></choice>ith
<choice>
  <reg>w</reg>
  <orig>vv</orig>
</choice>hat charity the
<choice src="abbr">
  <abbr>tei-c</abbr>
  <expan>Text Encoding Initiative Consortium</expan>
</choice>
The TEI Council would like to enlist the assistance of interested parties in helping them consider the issues involved. Therefore, this is a call for volunteers to write up a short working paper on the issues surrounding the use of the new mechanism and of the previous attribute mechanism. (Note that some have referred to the new mechanism as the "choice" mechanism, but the name of the element used in the examples above is not carved in stone, and is one issue the authors of this working paper may wish to address.)
Current thinking is that a first draft of the paper should be ready by Sat 07 Aug, and the final version should be ready on Thu 16 Sep.
Interested volunteers should reply to editors@tei-c.org.
The TEI Council would like to enlist the assistance of interested parties in helping them consider the issues involved. Therefore, this is a call for volunteers to write up a short working paper on the issues surrounding the use of the new mechanism and of the previous attribute mechanism.
Is there really that much to say about it? There are two problems with the old practice (the not-Janus-practice) of which one has already been mentioned:
1.) You cannot add any further markup inside attribute values, so you limit your own possibilities for sophisticated (that is: adequate) information (re-)coding.
2.) The point which hasn't been mentioned (on this list - I'm not subscribed to the TEI list): the old practice simply supplants one theory (or notion) of text with another. You had (for example) to decide whether to believe that text is a material object (a document --> <sic>) or an abstract entity (a linguistic event, a communicative intention --> <corr>). Now - with the Janus practice - there is simply a pluralistic democracy of various theories/notions (or an integrative theory) of "text".
Well, there could be a third area of discussion: what is the status of an electronic text, and what does the borderline between "text" and "markup" really mean (in this case)? The problem here could be that, if you believe in the ontological distinction between "text" and "markup", you seem to double the portion of "text" in question. But this is not a question of practical relevance, and it only leads to a philosophical sophistry which should perhaps better be left to an even more specialised debate (and my forthcoming PhD thesis ;-)) ...
cheers, patrick ___________________________________________________________________ Universität zu Köln Historisch-Kulturwissenschaftliche Informationsverarbeitung Albertus-Magnus-Platz 50923 Koeln http://www.hki.uni-koeln.de/
Privat: Häuschensweg 2a 50827 Köln 0049 - (0)221 - 2805695 Sahle@uni-koeln.de http://www.uni-koeln.de/~ahz26/
Hi there,
At 02:41 PM 29/06/2004, you wrote:
Well, there could be a third area of discussion: what is the status of an electronic text, and what does the borderline between "text" and "markup" really mean (in this case)? The problem here could be that, if you believe in the ontological distinction between "text" and "markup", you seem to double the portion of "text" in question. But this is not a question of practical relevance, and it only leads to a philosophical sophistry which should perhaps better be left to an even more specialised debate (and my forthcoming PhD thesis ;-)) ...
This is a fascinating topic. I'd argue that markup and its content are just data; "texts" are generated from markup using specific transformations for specific audiences or purposes. Given this:
<corr>Martin</corr><sic>Marnit</sic>
one "text" might show "Martin" with a mouseover popup indicating the misspelling in the original source, and another might show "Marnit" with a mouseover explaining the assumed correct form. The differences embody editorial approaches and purposes, and it's these that give birth to texts. The markup merely strives after completeness and transparency.
My 2 cents, Martin
______________________________________
Martin Holmes
University of Victoria Humanities Computing and Media Centre
mholmes@uvic.ca
martin@mholmes.com
mholmes@halfbakedsoftware.com
http://www.mholmes.com
http://web.uvic.ca/hcmc/
http://www.halfbakedsoftware.com
On Tue, 29 Jun 2004, Martin Holmes wrote:
[...]
There is a feeling many have that once you strip away the markup, you should be left with a bare version of 'the text'. They argue that all such alternatives should be stored in attributes (even with the aforementioned problems) in order to separate the interpretation from the text. But the problems with this are legion. Aside from the obvious need for markup inside these interpretative readings, the choice of markup itself is, of course, an interpretation of structure that they are imposing on the text. (That way lies the overlapping-hierarchy discussion again...) Moreover, the same process of stripping away the markup to reveal 'the text' is still possible; it is just that the act of 'stripping' in this case doesn't mean 'remove the tags' but 'process them so that the text is the result'. Whether 'the text' has corrections made, abbreviations expanded, spelling regularised, or any of the other possible applications of this <choice> type of encoding is a decision made at the point of processing. But I'm sure you know all this and I'm preaching to the converted. ;-)
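James's sense of 'stripping as processing' can be sketched concretely. This is a hypothetical Python illustration, assuming the <choice>/<sic>/<corr>/<orig>/<reg> element names from the P5 proposal quoted earlier; the `strip_markup` helper and its preference-order design are my own:

```python
import xml.etree.ElementTree as ET

encoded = ('<p>I <choice><sic>adnire</sic><corr>admire</corr></choice> '
           '<choice><orig>vv</orig><reg>w</reg></choice>hat you do.</p>')

def strip_markup(xml_text, preferences):
    """'Strip' the markup by processing it: resolve each <choice>
    to the first available reading in the given preference order."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "choice":
            for tag in preferences:
                picked = child.find(tag)
                if picked is not None:
                    parts.append(picked.text or "")
                    break
        parts.append(child.tail or "")
    return "".join(parts)

print(strip_markup(encoded, ("corr", "reg")))   # -> I admire what you do.
print(strip_markup(encoded, ("sic", "orig")))   # -> I adnire vvhat you do.
```

The preference tuple is exactly the decision 'made at the point of processing': corrected and regularised, or diplomatic, from one and the same encoding.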
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at ota dot ahds dot ac dot uk
I find both positions (James Cummings, Martin Holmes) very close to each other and to my own point of view. Would you agree if I described them as some sort of modern textual materialism, where the 'text' is always something 'medialised' - a (digital) document rather than an abstract idea? On the other hand we would have 'data' containing all the representational (and interpretative) information we have about a 'text' - I would call this a 'transmedial' text, just as I would call the new media revolution a process of 'transmedialisation'.
patrick
[...]
Hi all,
At 02:41 AM 30/06/2004, you wrote:
I find both positions (James Cummings, Martin Holmes) very close to each other and to my own point of view. Would you agree if I described them as some sort of modern textual materialism, where the 'text' is always something 'medialised' - a (digital) document rather than an abstract idea? On the other hand we would have 'data' containing all the representational (and interpretative) information we have about a 'text' - I would call this a 'transmedial' text, just as I would call the new media revolution a process of 'transmedialisation'.
I like this terminology. I think also there will be situations where one person's markup, which they considered to be objective and transparent, in the future becomes a text which is the subject of someone else's markup. We could envisage a time when early markup projects become the subject of scholarly investigation, and both their source markup and their various textual renderings (medialisations?) are themselves marked up. Re-reading my own comment "The markup merely strives after completeness and transparency", I guess that's a pretty big "merely".
Cheers, Martin
On Wed, 30 Jun 2004, Martin Holmes wrote:
Hi all,
I like this terminology. I think also there will be situations where one person's markup, which they considered to be objective and transparent, in the future becomes a text which is the subject of someone else's markup. We could envisage a time when early markup projects become the subject of scholarly investigation, and both their source markup and their various textual renderings (medialisations?) are themselves marked up. Re-reading my own comment "The markup merely strives after completeness and transparency", I guess that's a pretty big "merely".
We don't need to "envisage a time when early markup projects become the subject of scholarly investigation"; there are those who study just this already. The OTA has been collecting electronic texts for nearly 30 years... very few of them are actually in XML. The study of earlier markup forms (like, say, COCOA) is undertaken by those looking at the history of humanities computing. I gave a paper at the ALLC-ACH this year looking at the problems the OTA might have in migrating copies of its resources to XML. (A big undertaking...) A lot of the pilot study for this involved identifying and scoping what form of markup the resources listed as "unknown markup" were really in.
All of this goes to a point (believe it or not): the documentation of your chosen encoding scheme - and of why you've put $ before all the adjectives (or whatnot) - is one of the most important aspects of its preservation.
-James
Hi James,
Are you actually marking up the markup? For example, are you taking an SGML text and marking it up (SGML code included) in TEI, or are you mainly concerned with converting prior markup formats to current ones? I don't believe I've ever seen one tag-based markup system marked up in another.
Cheers, Martin
At 11:56 AM 30/06/2004, you wrote:
[...]
On Wed, 30 Jun 2004, Martin Holmes wrote:
Hi James,
Are you actually marking up the markup? For example, are you taking an SGML text and marking it up (SGML code included) in TEI, or are you mainly concerned with converting prior markup formats to current ones? I don't believe I've ever seen one tag-based markup system marked up in another.
Our concern is mainly to migrate to a more usable format rather than to record the original format in XML. Since we will always keep a copy of the original deposit, we want to preserve the intent behind the markup, not necessarily the markup choices themselves. So yes, we are concerned with conversion to more flexible markup schemes (though just having everything in the _same_ format would be nice, whatever that format was). So with COCOA, converting:

<SN Scene 2>
<SSD Enter Martin Holmes>
<Q Martin>No markup is objective or transparent
<Q James>Can't disagree with that!
...
to
<div type="scene">
  <head>Scene 2</head>
  <stage>Enter Martin Holmes</stage>
  <sp><speaker>Martin</speaker><l>No markup is objective or transparent</l></sp>
  <sp><speaker>James</speaker><l>Can't disagree with that!</l></sp>
  ...
</div>
This is a fairly neutral change as far as the content goes, and I can't see much data loss in it. Although it is (I think) an accurate reflection of the intention of the original markup, it isn't in any way an XML version of the COCOA markup.
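The kind of conversion James describes can be sketched mechanically. This is a hypothetical illustration only, assuming the three COCOA tags shown above (SN, SSD, Q); the real OTA migration is of course far more involved, and `cocoa_to_tei` is my own invented helper:

```python
import re

cocoa = """<SN Scene 2>
<SSD Enter Martin Holmes>
<Q Martin>No markup is objective or transparent
<Q James>Can't disagree with that!"""

def cocoa_to_tei(text):
    # Map each COCOA reference line onto its TEI equivalent.
    out = ['<div type="scene">']
    for line in text.splitlines():
        m = re.match(r"<(\w+) ([^>]*)>(.*)", line)
        if not m:
            continue
        tag, value, rest = m.group(1), m.group(2), m.group(3).strip()
        if tag == "SN":
            out.append("  <head>%s</head>" % value)
        elif tag == "SSD":
            out.append("  <stage>%s</stage>" % value)
        elif tag == "Q":
            out.append("  <sp><speaker>%s</speaker><l>%s</l></sp>" % (value, rest))
    out.append("</div>")
    return "\n".join(out)

print(cocoa_to_tei(cocoa))
```

Even this toy version shows where the interpretation lives: the mapping table (SN means a scene heading, Q means a speech) is precisely the "intent behind the markup" that has to be recovered before any such script can be written.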
Part of the problem in conversion comes in identifying the intent behind the original markup in a field that I don't understand. (This is good news for the medieval resources we hold, since I generally understand that field... but I wouldn't trust myself as much with say Buddhist texts.) The only solution I can think of for that is to go ahead and do the conversion, and show the before and after versions to a specialist in that area.
-James