Dear DMs,
Maybe this would be better posted on the TEI list, but I hope some of you have relevant experience in the field!
I'm encoding an ongoing project in TEI XML, and I'm running into problems representing concurrent hierarchies. I'd like to mark up some rhetorical elements that can either be contained in paragraphs (<p> tags) or contain <p> tags, depending on their length and importance. Therefore, I can't use <seg>, for instance.
From a "logical" point of view, it seems to me that <div> tags (with a suitable type) would be perfect, but in this case I would have a problem of concurrent hierarchies (possible overlapping of <div> and <p> elements).
The solutions proposed in the current TEI Guidelines for handling multiple hierarchies would work, BUT I'm afraid they would considerably complicate the processing of the encoded texts with standard XML tools.
I thought about customizing the DTD to make <p> tags empty, which would be convenient for my purpose, but the markup wouldn't be TEI compliant any more (as character data is not allowed as a direct child of <div> or <body>)... In a way, this seems to be a problem too for, e.g., the quite appealing HORSE approach recently discussed by S. Bauman regarding TEI (http://www.mulberrytech.com/Extreme/Proceedings/html/2005/Bauman01/EML2005Ba...)?
Thanks in advance for your help and experience (especially concerning the efficiency of processing the marked-up texts)!
Marjorie
Marjorie Burghart wrote:
Dear DMs,
Maybe this would be better posted on the TEI list, but I hope some of you have relevant experience in the field!
Well, there is a fair degree of overlap between the TEI list and DM, so I'm sure some more qualified people will wade in...
Perhaps some of the members of the Overlap SIG will; the SIG has a summary of some of the major ways of dealing with overlap on the TEI Wiki:
http://www.tei-c.org/wiki/index.php/SIG:Overlap
I'm encoding an ongoing project in TEI XML, and I'm running into problems representing concurrent hierarchies. I'd like to mark up some rhetorical elements that can either be contained in paragraphs (<p> tags) or contain <p> tags, depending on their length and importance. Therefore, I can't use <seg>, for instance. From a "logical" point of view, it seems to me that <div> tags (with a suitable type) would be perfect, but in this case I would have a problem of concurrent hierarchies (possible overlapping of <div> and <p> elements).
This is the normal kind of problem people have with XML. However, although there are technical solutions to represent both hierarchies, I've always found it easiest to simply compromise and mark up the structure of one of them as dominant. (Usually for me this means marking page/folio breaks as milestones.) This is the milestone method mentioned in chapter 31 of the guidelines.
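To give a rough idea of what processing milestone markup involves, here is a minimal Python sketch using only the standard library. It assumes paragraphs are the dominant hierarchy and that each rhetorical unit starts at an empty <milestone> and runs to the next one. The unit="rhetoric" value, the n values, the sample text and the function names are all invented for illustration and are not taken from the Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <p> is the dominant hierarchy, rhetorical units
# are marked only by empty <milestone> elements at their start.
TEI_MILESTONES = """<body>
  <p>Opening words. <milestone unit="rhetoric" n="exordium"/>Hear me, brothers.</p>
  <p>I shall speak of three things. <milestone unit="rhetoric" n="divisio"/>First, faith;</p>
  <p>second, hope; third, charity.</p>
</body>"""

def events(elem):
    """Yield ('milestone', n) and ('text', s) items in document order."""
    if elem.tag == "milestone":
        yield ("milestone", elem.get("n"))
        return
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)

def rhetorical_units(root):
    """Collect the text that follows each milestone, up to the next one."""
    units = {}
    current = None
    for kind, value in events(root):
        if kind == "milestone":
            current = value
            units[current] = []
        elif current is not None:
            units[current].append(value)
    # Join the pieces and normalise whitespace left over from indentation.
    return {name: " ".join("".join(parts).split()) for name, parts in units.items()}

print(rhetorical_units(ET.fromstring(TEI_MILESTONES)))
```

The point of the sketch is that the second hierarchy is no longer visible to XPath or XSLT as elements: it has to be rebuilt by walking the text in document order, which is the main processing cost of this method.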
The solutions proposed in the current TEI Guidelines for handling multiple hierarchies would work, BUT I'm afraid they would considerably complicate the processing of the encoded texts with standard XML tools.
They do. If the methods that avoid choosing one hierarchy over the other are unacceptably complex, then I don't see many options. Personally I'd just milestone one of them, or use the fragmentation method based on the @part attribute.
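As an illustration of the fragmentation method, the following Python sketch reassembles a unit that has been split at a paragraph boundary. It assumes the fragments are <seg> elements carrying part="I" (initial), "M" (medial) or "F" (final), and that only one fragmented unit is open at a time. The type values, the sample text and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one rhetorical unit crosses a paragraph boundary
# and is split into an initial and a final fragment.
TEI_FRAGMENTS = """<body>
  <p>He began thus. <seg type="exemplum" part="I">A certain monk set out</seg></p>
  <p><seg type="exemplum" part="F">and came home wiser.</seg> So much for travel.
     <seg type="sententia">Patience is a virtue.</seg></p>
</body>"""

def reassemble(root):
    """Return (type, text) pairs, joining I/M/F fragments into one unit."""
    whole = []
    pending = None  # [type, text] of the fragmented unit currently open
    for seg in root.iter("seg"):
        text = " ".join("".join(seg.itertext()).split())
        part = seg.get("part", "N")  # a missing @part is treated as "not fragmented"
        if part == "I":
            pending = [seg.get("type"), text]
        elif part in ("M", "F") and pending is not None:
            pending[1] += " " + text
            if part == "F":
                whole.append(tuple(pending))
                pending = None
        else:
            # Unfragmented segment (or a stray M/F with nothing open).
            whole.append((seg.get("type"), text))
    return whole

print(reassemble(ET.fromstring(TEI_FRAGMENTS)))
```

Each fragment remains an ordinary element, so standard tools can still select it directly; the extra work only arises when the unit is needed as a whole.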
I thought about customizing the DTD to make <p> tags empty, which would be convenient for my purpose, but the markup wouldn't be TEI compliant any more (as character data is not allowed as a direct child of <div> or <body>)... In a way, this seems to be a problem too for, e.g., the quite appealing HORSE approach recently discussed by S. Bauman regarding TEI (http://www.mulberrytech.com/Extreme/Proceedings/html/2005/Bauman01/EML2005Ba...)
I believe a variant of HORSE/CLIX is quite likely to appear in the P5 Guidelines.
-James
We should also add (I don't think I've seen it) that Dot Porter at Kentucky has been doing a lot with multiple hierarchies in TEI P4. I believe the Kentucky group have a couple of articles on the subject out.
-d
Dan O'Donnell wrote:
We should also add (I don't think I've seen it) that Dot Porter at Kentucky has been doing a lot with multiple hierarchies in TEI P4. I believe the Kentucky group have a couple of articles on the subject out.
-d
Hi Dan and all,
Sorry, I was on vacation last week and have only been going through my inbox this morning.
Marjorie, I believe that the articles listed on the TEI Overlapping Markup SIG website provide a good overview of the various approaches to handling overlapping markup/overlapping hierarchies: http://www.tei-c.org/wiki/index.php/SIG:Overlap
For work done at UK specifically, click down to "Kentucky Goddag." These are very technical papers, and mainly touch on methods for querying concurrent hierarchies rather than how to represent the concurrent hierarchies in markup. The Edition Production Technology, electronic editing software developed at UK by Kevin Kiernan through the Electronic Boethius and ARCHway projects, with continuing development through eBo, expresses overlapping markup through the use of empty elements. It looks a bit like HORSE/CLIX, but instead of only "breaking" the elements when there is an overlap, all elements are expressed as empty tags, so
<p>...</p>
becomes
<p __tag_type="START"/>...<p __tag_type="END"/>
This works well within the context of EPT, but if you're working with oXygen (for example), it would be easier to follow James' advice and select one hierarchy to take precedence, with the other expressed in empty tags. I look forward to HORSE/CLIX's inclusion in P5; it would be great to have a canonical method for handling overlaps.
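For anyone curious what reading such all-empty markup involves outside EPT, here is a small Python sketch. It is not EPT code: it only assumes the START/END convention shown above, applies it to a hypothetical <div> as well as <p>, and assumes every marker is a direct child of a single wrapper element. The wrapper, the sample text and the function name are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat document: every element is an empty START or END marker,
# so the <p> and <div> ranges are free to overlap.
FLAT = """<text><p __tag_type="START"/>One <div __tag_type="START"/>two<p __tag_type="END"/><p __tag_type="START"/> three<div __tag_type="END"/> four<p __tag_type="END"/></text>"""

def spans(root):
    """Return (tag, text) for each START/END pair, in closing order."""
    open_spans = []  # [tag, parts] for every range that is currently open
    closed = []

    def feed(text):
        # Text belongs to every range that is open when it occurs.
        if text:
            for span in open_spans:
                span[1].append(text)

    feed(root.text)
    for marker in root:
        kind = marker.get("__tag_type")
        if kind == "START":
            open_spans.append([marker.tag, []])
        elif kind == "END":
            # Close the most recently opened range with the same name.
            for i in range(len(open_spans) - 1, -1, -1):
                if open_spans[i][0] == marker.tag:
                    tag, parts = open_spans.pop(i)
                    closed.append((tag, "".join(parts)))
                    break
        feed(marker.tail)
    return closed

print(spans(ET.fromstring(FLAT)))
```

The file stays well-formed XML, but none of the ranges is an element any more, so ordinary XPath or XSLT cannot select "the content of a paragraph" without a pass like this one first.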
Thanks, Dot
Dear Marjorie
As James has already pointed out, there are a number of varyingly satisfactory ways of dealing with the inescapable fact that a textual artefact can be regarded as having more than one hierarchic structure. The only issue really is deciding on which one to privilege when representing it in XML.
Your use of the phrase "rhetorical elements" (plus my recollections of a pleasant visit to Lyon some years ago!) suggests to me that you are trying to mark up the rhetorical structure of an existing text. This seems a somewhat different kind of structure from the usual structure of prose paragraphs and divisions, in that it can be determined only by some understanding of the text and also of other texts like it, and also in that it rarely if ever has any kind of associated conventional rendering.
The mechanism suggested by the TEI for this purpose is described, rather obscurely, in the chapter on "Simple Analytic Mechanisms", in particular in section 15.3 on "Spans and Interpretations", where there is an example showing how the narrative structure of an extract from the Poetic Edda might be encoded. Essentially it recommends what is now fashionably known as "standoff" annotation, in which a special kind of pointer element (a "span") is used to indicate a stretch of text and to associate a particular interpretive label with it. Such spans can be organized hierarchically in the same way as other XML elements. The stretches of text indicated will often be existing "structural" XML elements such as paragraphs or sections, but need not be: an empty <anchor> (or milestone) element can be inserted at any necessary point to act as a target for a link of this kind.
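A minimal Python sketch of how such a span might be resolved with standard tools follows. It assumes P4-style <span> elements whose from and to attributes hold the ids of two <anchor> elements, and it handles only that case, not spans that point at whole paragraphs or sections. The ids, the label, the sample text and the placement of <spanGrp> are illustrative and have not been validated against the TEI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the text keeps its ordinary paragraph structure;
# the rhetorical unit is described separately by a <span> pointing at anchors.
TEI_STANDOFF = """<text>
  <body>
    <p>Greetings. <anchor id="a1"/>There was once a king</p>
    <p>who had three sons.<anchor id="a2"/> Now to our theme.</p>
  </body>
  <spanGrp type="rhetoric">
    <span value="exemplum" from="a1" to="a2"/>
  </spanGrp>
</text>"""

def walk(elem):
    """Yield ('anchor', id) and ('text', s) items in document order."""
    if elem.tag == "anchor":
        yield ("anchor", elem.get("id"))
        return
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)

def resolve(root):
    """Map each span's label to the text lying between its two anchors."""
    stream = list(walk(root.find("body")))
    result = {}
    for span in root.iter("span"):
        start = stream.index(("anchor", span.get("from")))
        end = stream.index(("anchor", span.get("to")))
        text = "".join(v for k, v in stream[start:end] if k == "text")
        result[span.get("value")] = " ".join(text.split())
    return result

print(resolve(ET.fromstring(TEI_STANDOFF)))
```

The paragraphs stay untouched and fully usable by ordinary tools; the cost is that the interpretive layer has to be dereferenced through the ids before the annotated text can be retrieved.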
I would like to revise this particular section of the Guidelines for P5 in order to make it a bit more accessible: it would be great to include an example from Voragine!
best wishes
Lou