Reposting for Roberto Rosselli Del Turco:
On Wed, 2004-06-23 at 04:19, Daniel O'Donnell wrote:
For those who don't know what I am talking about: XSLT (eXtensible Stylesheet Language-Transformations) is a stylesheet language used (amongst other things) for converting documents written in XML (eXtensible Markup Language) such as used by the TEI (Text Encoding Initiative) into HTML for display on the web.
Once the web site is up, we should put all these acronyms into a FAQ (Frequently Asked Questions) file. Does anybody know of a good existing list of acronyms for beginners?
Apart from good old Acronym Finder (http://www.acronymfinder.com/), here are some similar sites (in the sense that you don't have a listing, you have to look for the term you want explained):
http://www.techweb.com/encyclopedia/
http://foldoc.doc.ic.ac.uk/foldoc/index.html
http://www.pcwebopaedia.com/
This is a long list of acronyms, but not fully up to date and subject to quite restrictive terms of use:
http://www.comadvantage.com/babel.html
Ciao
--
Roberto Rosselli Del Turco
roberto.rossellidelturco at unito.it, rosselli at ling.unipi.it
Dipartimento di Scienze del Linguaggio, Universita' di Torino
Then spoke the thunder / DA / Datta: what have we given? (TSE)
Hige sceal the heardra, heorte the cenre, mod sceal the mare, the ure maegen litlath. (Maldon 312-3)
On Tue, 22 Jun 2004, Daniel O'Donnell wrote:
Does anybody have experience using the recent releases of MSXML? I'm wondering in particular if their xslt is standards compliant. I understand from the XML bible http://www.ibiblio.org/xml/books/bible2/chapters/ch17.html that it is or was not. Microsoft's own pages make it sound like the latest versions might be
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/xmmscXMLOverview.asp,
but it can be difficult to trust them sometimes.
The standard MSXML XSLT FAQ given by people is: http://www.netcrucible.com/xslt/msxml-faq.htm
However, that is not very up to date.
I ask because a neat tool (Textpipe http://www.crystalsoftware.com.au/textpipe.html) I use has now built-in XSLT, but depends on MSXML.
An interesting product. For that price I'd have expected a way to choose what parser it uses, but I'm assuming it is a .NET program, which makes that more difficult. There is no reason not to use it for the types of conversions it handles, and a sensible processor like Saxon ( http://saxon.sf.net ) for real transformations.
For those who don't know what I am talking about: XSLT (eXtensible Stylesheet Language-Transformations) is a stylesheet language used (amongst other things) for converting documents written in XML (eXtensible Markup Language) such as used by the TEI (Text Encoding Initiative) into HTML for display on the web.
Or into other XML structures, or plain text. :-)
Once the web site is up, we should put all these acronyms into a FAQ (Frequently Asked Questions) file. Does anybody know of a good existing list of acronyms for beginners?
http://www.acronymfinder.com/ as already mentioned is good in that it has lots, but doesn't give much explanation.
http://www.dictionary.com/ has fewer acronyms (though all the ones you mentioned) but usually links to better definitions.
Thanks for setting up the list,
-James
Well, let us start off with a little minor controversy.
In the midst of converts to and enthusiasts for XSLT and that family of tools, here is my two pennysworth. I suggest that a serious and full-scale electronic edition of a typical medieval work, with the (now!) standard requirement that it integrate text transcription/edition and images, to a standard satisfactory for a scholarly user, cannot be made by these tools from an XML base.
There are two reasons for this. The first reason is that it seems to me the fundamental requirement of such an edition is that it should present a single page of a manuscript transcription alongside a single manuscript image (or, in variants of this, a single column alongside the image, etc). Given the standard XML architecture of these editions as these have evolved, whereby textual divisions are set in the content of elements but pages are marked with empty anchor elements (eg <pb/>) this is just what XSLT etc find very tricky indeed. If you can do it (and I have not yet seen this done, though I have heard lengthy explanations of how it *might* be done) you can only do it with great difficulty with the standard tools. The problem here is our old bugbear overlapping hierarchies, and XSLT etc just don't have any easy answer to this -- and maybe no reliable answer at all.
The second reason is to do with the nature of the XSLT programming language and the kind of things we want to do with our displays, even in situations where the problem of overlapping hierarchies does not hit us. Take a single word in (for example) a line of transcription of a manuscript of the Miller's Tale. A reader might think: I would like to see what any or all other manuscripts have at this word; I want to know whether there is an editorial comment on the readings at this point; I would like to see how the pattern of readings at this point maps against the overall pattern of relationships among the manuscripts; I would like a lot of this information held within the display so that just passing the mouse over the word will pop up some of it. And I want this for every word in every manuscript, and I want all this generated real fast for each page as I am impatient, and I want quite a few other things too. Typically, this information is scattered right across many different XML source files. It all has to be fetched, amalgamated, sorted, served up for say some five hundred words on a typical manuscript page, all in a microsecond. And also, for the programmer: many things could go wrong in here, with all the conditional tests which need to be made at each point and all the possible branchings the program might have to take to cope with the messiness of manuscript life, so the programmer needs a responsive and transparent programming environment, where it is easy to diagnose what is going wrong, where, as the displays are built. I sure would hate to try to do this in XSLT etc. While XML is fine for many things, it does not look a great environment for programming to me.
The question is germane because it now seems that a lot of effort is going into persuading humanities scholars, like us, that:
A. we put all our data into XML, preferably the TEI variety
B. we use XML programming tools like XSLT to get it to the reader
I think the first proposition is unquestionably right: that battle has been won. But XML's victory in the first does not mean that XML is the right answer for the second. Indeed, I don't think it is.
So, over to you all. I have set people this challenge before but here it is again: someone, try to duplicate a typical single page say of our Hengwrt Digital Facsimile from our XML source. And good luck to you.
All the best Peter Robinson
Hi there,
At 08:49 AM 23/06/2004, you wrote:
If you can do it (and I have not yet seen this done, though I have heard lengthy explanations of how it *might* be done) you can only do it with great difficulty with the standard tools. The problem here is our old bugbear overlapping hierarchies, and XSLT etc just don't have any easy answer to this -- and maybe no reliable answer at all.
I've done a fairly simple transformation of Shakespeare plays which includes both Act / Scene divisions and pagebreaks, and if MS page images had been part of the project, it would have been easy to drop them in next to the appropriate pages.
Cheers, Martin
______________________________________ Martin Holmes University of Victoria Humanities Computing and Media Centre mholmes@uvic.ca martin@mholmes.com mholmes@halfbakedsoftware.com http://www.mholmes.com http://web.uvic.ca/hcmc/ http://www.halfbakedsoftware.com
At 08:49 AM 23/06/2004, Peter Robinson wrote:
If you can do it (and I have not yet seen this done, though I have heard lengthy explanations of how it *might* be done) you can only do it with great difficulty with the standard tools. The problem here is our old bugbear overlapping hierarchies, and XSLT etc just don't have any easy answer to this -- and maybe no reliable answer at all.
I have not worked with XSLT, so this might seem like a naive question, but is it possible to write a bridge program to copy text from a specific starting tag to an ending tag, create a new file for just that text, and then display that file next to the MS image?
~ Martin Foys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin K. Foys Assistant Professor Department of English Hood College Frederick, MD 21701
vox: 301~696~3740 fax: 301~696~3586 ether: foys@hood.edu
Bayeux Tapestry Digital Edition: http://www.sd-editions.com
Transforming text from one presentation to another is a key rationale for XSLT.
The problem Peter is mentioning is a core one in structural markup languages, though it always seems trivial at first glance. That is to say, you can't mark up the same stretch of text using two competing hierarchies. To give an example, imagine a print text like the following, where dotted lines are page boundaries:
Chapter 1
Blah blah blah...
...blah blah blah.
                   1
--------------------
Blah blah blah..
...blah blah blah.

Chapter 2
Blah blah blah...
...blah blah blah
                   2
---------------------
blah blah blah etc.
                   3
In marking this up, I have two structural choices: I can treat the textual division (chapter, paragraph, sentences, etc.) as my major hierarchy or I can treat the physical layout of the document (page divisions) as my hierarchy. These would produce the following markup:

(textual division as hierarchy)
<chapter n="1">
  <p>Blah...<pageBreak n="1"/> blah...</p>
</chapter>
<chapter n="2">
  <p>Blah...<pageBreak n="2"/> blah...</p>
</chapter>

or (layout as structure: this is unusual; I'm guessing this is how it might go)
<page n="1">
  <titleBegin />Chapter 1
  <paragraphBegin />Blah blah
</page>
<page n="2">
etc.
The first option is the more usual structure. Peter's point is that this makes it difficult to tie page images to running text, since page images seem naturally aligned to the second encoding. Using XSLT, it should be possible to transform encoding 1 to encoding 2, however, since you could use the page break tags as begin and end tags. The main problem comes with all the other elements that are rhetorical rather than layout based: in going from encoding one to encoding two, since paragraphs extend across page boundaries, you can't keep them as containing elements; the same would be true of sentences and, if words are allowed to hyphenate across page boundaries, words as well. As we've already heard, there are several proposals for getting around the problem, which can be surprisingly difficult.
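To make the trade-off concrete, here is a minimal sketch of the encoding 1 to encoding 2 conversion, written in Python with the standard library rather than in XSLT. The sample text, the function name, and the reading of <pageBreak n="..."/> as closing the page it numbers (as in the print example, where the number sits at the foot of the page) are all assumptions made for illustration. It cuts the running text at each milestone and, as described above, loses the paragraph and chapter containers in the process.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the "textual division as hierarchy" encoding.
SOURCE = """<text>
<chapter n="1"><p>Alpha one.<pageBreak n="1"/> Alpha two.</p></chapter>
<chapter n="2"><p>Beta one.<pageBreak n="2"/> Beta two.<pageBreak n="3"/></p></chapter>
</text>"""


def pages_from_milestones(xml_string, milestone="pageBreak"):
    """Return (page_label, text) pairs, cutting the text at each milestone.

    Assumption: a milestone closes the page whose number it carries.
    """
    root = ET.fromstring(xml_string)
    pages = []
    buffer = []

    def flush(label):
        # Join what has been collected, normalise whitespace, start afresh.
        text = " ".join("".join(buffer).split())
        buffer.clear()
        if text:
            pages.append((label, text))

    def walk(elem):
        if elem.tag == milestone:
            flush(elem.get("n"))
        else:
            if elem.text:
                buffer.append(elem.text)
            for child in elem:
                walk(child)
        # Text following an element belongs to whatever page is now open.
        if elem.tail:
            buffer.append(elem.tail)

    walk(root)
    flush(None)  # any text after the last milestone has no page label
    return pages


for label, text in pages_from_milestones(SOURCE):
    print(label, "|", text)
```

Page 2 in this sample runs from the middle of one paragraph (and chapter) into the next, which is exactly where the containing elements cannot simply be carried over.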
-dan
The task Martin wants to perform is exactly the kind of thing that XSLT is good at and could do easily.
I also see Peter's point, echoed here. Until we have a markup language that handles overlapping hierarchies natively, all proposed solutions will be annoying in some way. Still, I have a higher opinion of the abilities of XSLT than Peter does (I've been using it in font development, of all things, so I look at it as pretty flexible), and I'll bet it could handle the XML workarounds for overlapping hierarchies such as the one proposed in the paper by Alexander Czmiel linked to by James Cummings.
But if a thing has got to perform fast, it might be best to use the XSLT to create static pages or a database, or turn data from a database into pages, rather than have it dynamically create pages from a large underlying text.
Peter Baker
...and the URL would be this:
http://web.uvic.ca/lancenrd/shakespeare/
I forgot to include it!
The transformations are done client-side on the fly, so you'll need a browser such as Moz, Firefox or IE that has XSLT built in.
Cheers, Martin
______________________________________ Martin Holmes University of Victoria Humanities Computing and Media Centre mholmes@uvic.ca martin@mholmes.com mholmes@halfbakedsoftware.com http://www.mholmes.com http://web.uvic.ca/hcmc/ http://www.halfbakedsoftware.com
Peter, darling, you are talking nonsense. If you want a system based on words, mark up the words and XSLT will cope very well. (Yes, people do it: cf the BNC and many others). Likewise, if you want a system in which physical hierarchies matter, reflect that in your markup system.
There are no good or bad programming languages: just good or bad programmers. And designers.
Lou
On Wed, 23 Jun 2004, Lou Burnard wrote:
Peter, darling, you are talking nonsense. If you want a system based on words, mark up the words and XSLT will cope very well. (Yes, people do it: cf the BNC and many others). Likewise, if you want a system in which physical hierarchies matter, reflect that in your markup system.
There are no good or bad programming languages: just good or bad programmers. And designers.
This may be a germane time to mention the lovely demo of an implementation of XML For Overlapping Structures I saw at the ALLC-ACH. The abstract for the paper is here:
http://www.hum.gu.se/allcach2004/AP/html/prop104.html
Although it uses a non-XML data model (LMNL), it was able to produce well-formed XML. The demo involved a Java client highlighting from one random bit of XML to another random bit of XML and providing an annotation, and then overlapping part of this annotation with another one, etc. (Sounds like a good teaching tool, frankly: mark students on their annotations, without them needing to know XML.)
The resulting XML followed a milestone principle, but used child elements in the 'start' tag to contain the annotation.
Not exactly the issue Peter and Lou are debating but might make interesting reading for a related issue.
-James --- Dr James Cummings, Oxford Text Archive, University of Oxford James dot Cummings at ota dot ahds dot ac dot uk
I think Peter's point is a very good one, though; in XML, one must prioritize one hierarchy over another where they overlap; if the conceptual structure is prioritized, the physical must be relegated to milestones.
I think there is a long-term solution to this. I think multiple markups can exist on the same text, as long as they're in "different dimensions"; it's hard to explain, but imagine the text as a string going through space, and the markups as planes radiating out from it. Each markup would have to be keyed to positions in the text, rather than embedded directly in it, so instead of tags inside text you'd have tags which know which character position they belong in. Then you could combine two hierarchies in the manner which best suits your current purpose. For example, you might have a physical and a conceptual markup both keyed into the same text; either markup could be rendered as needed, and they could be combined using algorithms which generated well-formed output, prioritizing either one or the other, and reducing the second to milestones or divided tags linked by idrefs. In a system like this, each markup "dimension" would be much simpler and easier to manage. There are two major difficulties, though; one is that if the text itself is changed, all the markup pointers need to be updated to reflect the shift in character positions, and the other is that the combinatorial algorithms may need to be quite sophisticated, especially if more than two markup dimensions were to be rendered simultaneously.
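A rough sketch of the idea, in Python: the text is kept as a bare string, each markup "dimension" is a list of spans keyed to character positions, and a small function combines two of them by rendering one as real elements and reducing the other to milestones. The sample text, the span lists, the Start/End milestone names, and the render function are all invented for illustration, and the sketch assumes that the spans within any one layer neither nest nor overlap each other.

```python
from xml.sax.saxutils import escape

TEXT = "Alpha one. Alpha two. Beta one."

# Each layer is a list of (start, end, name); end is exclusive.
CONCEPTUAL = [(0, 21, "p"), (22, 31, "p")]
PHYSICAL = [(0, 10, "page"), (11, 31, "page")]


def render(text, primary, secondary):
    """Render `primary` spans as elements and `secondary` spans as milestones."""
    events = []  # (offset, order at that offset, markup)
    for start, end, name in primary:
        events.append((start, 3, f"<{name}>"))
        events.append((end, 0, f"</{name}>"))
    for start, end, name in secondary:
        events.append((start, 2, f"<{name}Start/>"))
        events.append((end, 1, f"<{name}End/>"))
    # At one offset: close elements first, then milestones, then open elements.
    events.sort(key=lambda e: (e[0], e[1]))

    out, pos = [], 0
    for offset, _, markup in events:
        out.append(escape(text[pos:offset]))
        out.append(markup)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


print(render(TEXT, CONCEPTUAL, PHYSICAL))  # paragraphs as elements
print(render(TEXT, PHYSICAL, CONCEPTUAL))  # pages as elements
```

The first difficulty mentioned above shows up immediately: insert a single character into TEXT and every offset after it in both layers has to be shifted.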
I'm sure someone must have thought of this before, though -- does anyone have references to proposals like this?
Cheers, Martin
______________________________________ Martin Holmes University of Victoria Humanities Computing and Media Centre mholmes@uvic.ca martin@mholmes.com mholmes@halfbakedsoftware.com http://www.mholmes.com http://web.uvic.ca/hcmc/ http://www.halfbakedsoftware.com
Martin Holmes wrote:
I think Peter's point is a very good one, though; in XML, one must prioritize one hierarchy over another where they overlap; if the conceptual structure is prioritized, the physical must be relegated to milestones.
I think there is a long-term solution to this. I think multiple markups can exist on the same text, as long as they're in "different dimensions"; it's hard to explain, but imagine the text as a string going through space, and the markups as planes radiating out from it.
[... snip ...]
That's right. You can have multiple annotations on the same XML structure. This technique, now generally known as "stand-off" annotation, is currently quite popular in the field of linguistics, to handle things like prosodic vs syntactic hierarchies. You segment your text once, and then you create as many different views of it as you like. Each view constitutes a single hierarchy, and is composed of pointing elements which identify how the fragments of your segmented texts are to be combined in that particular hierarchy. Something like it is described in the chapter of the TEI Guidelines which addresses the issue of non-hierarchic structures.
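As a toy illustration of the segment-once, point-many-times approach (not the TEI mechanism itself), the following Python sketch keeps the segmented text in one place and defines each view as a single hierarchy whose units point at segment identifiers. The segment ids, the view names, and the build_view function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# The text is segmented once; every segment carries an id.
SEGMENTS = {"s1": "Alpha one.", "s2": "Alpha two.", "s3": "Beta one."}

# Each view is one hierarchy made of units that point at segments.
VIEWS = {
    "conceptual": [("p", ["s1", "s2"]), ("p", ["s3"])],
    "physical": [("page", ["s1"]), ("page", ["s2", "s3"])],
}


def build_view(name, segments=SEGMENTS, views=VIEWS):
    """Resolve the pointers of one view into an ordinary XML tree."""
    root = ET.Element("view", {"type": name})
    for tag, pointers in views[name]:
        unit = ET.SubElement(root, tag)
        unit.text = " ".join(segments[p] for p in pointers)
    return root


for name in VIEWS:
    print(ET.tostring(build_view(name), encoding="unicode"))
```

Because the views only point at segments, correcting the wording inside a segment does not disturb any view, as long as the segmentation itself stays fixed.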
Your Shakespeare is pretty neat, by the way! How does one get to see the XML source though?
Lou
Hi there,
At 10:55 AM 23/06/2004, you wrote:
That's right. You can have multiple annotations on the same XML structure. This technique, now generally known as "stand-off" annotation, is currently quite popular in the field of linguistics, to handle things like prosodic vs syntactic hierarchies. You segment your text once, and then you create as many different views of it as you like. Each view constitutes a single hierarchy, and is composed of pointing elements which identify how the fragments of your segmented texts are to be combined in that particular hierarchy. Something like it is described in the chapter of the TEI Guidelines which addresses the issue of non-hierarchic structures.
Sounds like exactly what I was thinking of. Do you know what kind of markup tools are used to edit texts in this way?
Your Shakespeare is pretty neat, by the way! How does one get to see the XML source though?
If you "View Source", you'll see something like this:
<docroot href="AWW_F_ISE.xml"></docroot>
where the href is the actual markup document. I use the "wrapper document" as a way to avoid hard-coding the XSL stylesheet into the original markup. The markup in this case is a proprietary system created by Mike Best, the prof involved, but TEI versions of the markup have subsequently been generated from it.
Cheers, Martin
______________________________________ Martin Holmes University of Victoria Humanities Computing and Media Centre mholmes@uvic.ca martin@mholmes.com mholmes@halfbakedsoftware.com http://www.mholmes.com http://web.uvic.ca/hcmc/ http://www.halfbakedsoftware.com
Given the standard XML architecture of these editions as these have evolved, whereby textual divisions are set in the content of elements but pages are marked with empty anchor elements (eg <pb/>) this is just what XSLT etc find very tricky indeed. If you can do it (and I have not yet seen this done, though I have heard lengthy explanations of how it *might* be done) you can only do it with great difficulty with the standard tools.
At the Constantijn Huijgens Instituut we have done this (more or less). Our edition of the 'Geraardsbergse handschrift' (Geraardsbergen Miscellany) is based on a TEI encoded XML file where text structure is used to define the hierarchy, and pages are encoded as pb-elements (milestones). Nevertheless, we present our edition on a page by page basis.
The edition is available at http://www.chi.knaw.nl/epub/mvn/geraardsbergen/index.html. I'm afraid everything is in Dutch right now, but from the 'Inleiding' (introduction) option, the xml and xslt source files are available. The edition pages can be accessed from the 'editie' (edition) or 'inhoud' (contents) option. On some pages (e.g. http://www.chi.knaw.nl/epub/mvn/geraardsbergen/f109r.html), there's also a facsimile.
(We'll create full facsimile-cum-transcription editions in the future; this one is a book based publication, converted into digital form recently).
In the present edition, performance is not an issue, as the pages contain static html. If I were to write this stylesheet again, in the interest of performance and clarity, I would separate the milestone handling from producing the final output. I'd use an intermediate stylesheet to transform the milestone elements into full page elements at the highest level of the hierarchy. These page elements would contain those segments of the text, div, p (etc.) elements that the associated page contains. The resulting XML file would, I'm sure, perform beautifully in a dynamic environment.
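The intermediate step described here can be sketched in Python (standing in for the intermediate stylesheet; this is not the code used for the edition). The sample document, the paginate function, and the <pages>/<page> element names are assumptions for illustration, and <pb n="..."/> is taken to mark the start of the page it names. Whenever a milestone is met, a new page element is opened and the chain of currently open div and p elements is re-created inside it, so each page holds just the fragments of the hierarchy that fall on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: text structure as hierarchy, pages as pb milestones.
SOURCE = (
    '<text><pb n="1r"/><div n="1"><p>Alpha one. <pb n="1v"/>Alpha two.</p></div>'
    '<div n="2"><p>Beta one.</p></div></text>'
)


def paginate(source_root, milestone="pb"):
    """Return a <pages> tree with one <page> per milestone."""
    pages = ET.Element("pages")
    ancestors = []  # source elements currently open
    targets = []    # their copies on the current page
    page = [None]   # the page currently being filled

    def open_page(n):
        page[0] = ET.SubElement(pages, "page")
        if n is not None:
            page[0].set("n", n)
        # Re-open every ancestor on the new page so fragments keep their context.
        targets.clear()
        parent = page[0]
        for src in ancestors:
            parent = ET.SubElement(parent, src.tag, dict(src.attrib))
            targets.append(parent)

    def here():
        if page[0] is None:  # content before the first milestone
            open_page(None)
        return targets[-1] if targets else page[0]

    def add_text(s):
        if not s or (page[0] is None and not s.strip()):
            return
        parent = here()
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + s
        else:
            parent.text = (parent.text or "") + s

    def walk(src):
        if src.tag == milestone:
            open_page(src.get("n"))
            return
        copy_el = ET.SubElement(here(), src.tag, dict(src.attrib))
        ancestors.append(src)
        targets.append(copy_el)
        add_text(src.text)
        for child in src:
            walk(child)
            add_text(child.tail)
        ancestors.pop()
        targets.pop()

    add_text(source_root.text)
    for child in source_root:
        walk(child)
        add_text(child.tail)
    return pages


print(ET.tostring(paginate(ET.fromstring(SOURCE)), encoding="unicode"))
```

The sketch does not mark a split paragraph as continued, and a milestone at the very start of an element leaves an empty copy of that element on the preceding page; a real intermediate format would probably want to handle both.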
Peter