RE: [dm-l] what xslt can't do..?

11 Apr 2005


(With apologies to the list for not trimming quoted messages in this
exchange.) I've just about run out of things I wanted to say on this,
but I wanted to respond to a couple of points in Peter R.'s last
posting. I've learned a lot from this exchange and thank all Peters and
others who have participated, in particular Peter R. for his patient and
thoughtful responses.

Peter R. wrote:

...
> Still, this is quite a difference from the 'generate static
> files to optimize particular views' approach (actually, the
> cocoon example Peter B gives seems to be such, and so not
> really comparable to what A does).

That is indeed what I intended, and I do see it as comparable to
Anastasia: a relatively slow and complex pre-processing of the source
XML to enable a relatively fast and simple front-end. Whether the
pre-processing involves indexing as Anastasia does or chunking and
transforming as the theoretical Cocoon-based system would do, doesn't
seem to me to be material. They both have to be done, and redone every
time the source changes. And I admit that A's pre-processing will be
much, much faster than the kind of system I'm imagining.

...
> Under that, you have to figure out in advance just what
> particular views you want to enable.  And, there can be a
> near infinite number of these.

But once we're dealing with XML files containing a single page's worth
of XML, XSL is back on its home turf, using XPath to deal with complex
parsing of the XML source. Even if the pre-processing did nothing but
chunk the TEI into pages, the stylesheet to extract the text of a given
witness would be fairly simple. As one of my first XSL projects I hacked
a Cocoon-based system to present critical editions for a text I was
working on; the underlying format isn't TEI (I didn't know any better at
the time and was working with texts I originally transcribed in Nota
Bene) and the XSL is far from optimal or bug-free, but it at least shows
that this can be done. The prototype is here:
http://www.library.ualberta.ca:8080/cocoon/quentin/sp-0prol.html . The
number of witnesses is nowhere near 50, but there's rather more than a
page of text here (about 70k of XML). Set your options in the menu and
click the "Render" button: you can select a base witness, compare two
witnesses in parallel columns with differences highlighted, or generate
an app. crit. containing variants from selected witnesses. A text with
about 50 hexameters and 26 witnesses is here:
http://www.library.ualberta.ca:8080/cocoon/quentin/decmet-page.html ,
about 140k of XML (unfortunately, I apparently never finished adapting
the stylesheet for my clumsy verse format, so the app. crit. isn't
coming out right - must get around to fixing that). This is still way
too slow, but there's lots I know now that I didn't when I last worked
on it, and I'm sure I could improve it. Whether the response time would
ever approach Anastasia's for complex texts I don't know, but I wouldn't
rule it out without trying.
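
To make the "fairly simple" stylesheet concrete, here is a minimal XSLT 1.0 sketch. It is not the prototype's code. It assumes a page-sized chunk in TEI parallel-segmentation style (app elements holding lem and rdg children, each with a space-separated list of sigla in a wit attribute, no namespace), which is an assumption, since the prototype's format is not TEI. The parameter name "wit" and its default siglum are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: siglum of the witness whose text is wanted -->
  <xsl:param name="wit" select="'A'"/>

  <!-- Identity template: copy everything that is not apparatus unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For each app, output only the content of the first lem or rdg
       whose wit list contains the requested siglum as a whole token.
       If no reading attests the witness, nothing is output. -->
  <xsl:template match="app">
    <xsl:apply-templates
        select="*[self::lem or self::rdg]
                 [contains(concat(' ', normalize-space(@wit), ' '),
                           concat(' ', $wit, ' '))][1]/node()"/>
  </xsl:template>

</xsl:stylesheet>
```

With a command-line processor such as xsltproc, the witness could be chosen with something like "xsltproc --stringparam wit B extract-witness.xsl page.xml" (file names hypothetical). Padding both the wit list and the siglum with spaces keeps a siglum such as "A" from matching "Aa".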

(Now, shall we discuss whether Cocoon's XSL-FO support could ever
generate a PDF of a critical edition to compare with XeTeX's output ...)

<...snip...>

...
> Another problem is the
> process chain itself.  If your approach means daisychaining
> various processes together, and at the end you discover
> something wrong, fixing can be a real pain: you have to go
> back, figure out which bit of the chain caused the problem
> (which can be real hard, when you have complex interactions)
> then rerun the whole thing before you can view the result.
> With Anastasia the problem is either in the tcl scripts (in
> which case, you can fix and see the results instantly)  or in
> the source xml (which will mean you need to reindex): there
> is no somewhere-in-between.

Here we're dealing with personal preferences, I think. I find Cocoon's
pipeline structure very handy for debugging, since it's easy to inspect
the output of every stage of the process: just add label attributes to
the pipeline, and then pass those labels on the query string in a field
called "cocoon-view". In the prototype mentioned above, you can add
"cocoon-view=source", or "step1", or "step2" to the query string to see
the source xml or the output of the two intermediate transformations.
This is so easy that I often start off by breaking a process into more
transformations than is necessary, until I've debugged it. In debugging
a Tcl script doing the same complex work, you'd presumably set
breakpoints to do the same thing.
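
A sitemap fragment along the following lines would give that kind of stage-by-stage inspection. It is a sketch assuming Cocoon 2.1 sitemap conventions, in which each label is paired with a view declared under map:views. The match pattern, directory layout and stylesheet names are hypothetical and are not those of the prototype.

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- Each view serializes the pipeline as it stands at the labelled step -->
  <map:views>
    <map:view name="source" from-label="source">
      <map:serialize type="xml"/>
    </map:view>
    <map:view name="step1" from-label="step1">
      <map:serialize type="xml"/>
    </map:view>
    <map:view name="step2" from-label="step2">
      <map:serialize type="xml"/>
    </map:view>
  </map:views>

  <map:pipelines>
    <map:pipeline>
      <map:match pattern="*.html">
        <!-- Hypothetical file names; the labels mark the exit points -->
        <map:generate src="xml/{1}.xml" label="source"/>
        <map:transform src="xsl/step1.xsl" label="step1"/>
        <map:transform src="xsl/step2.xsl" label="step2"/>
        <map:transform src="xsl/to-html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>

</map:sitemap>
```

Under these assumptions, requesting a page with "?cocoon-view=step1" appended would return the XML produced by the first transformation instead of the finished HTML.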

Peter Baker has summed up the lessons I've drawn from this thread very
well. My motivation for defending XSL in such tedious detail is that it
is my pet technology, which I freely admit is not always a good ground
for choosing. But it shouldn't be dismissed out of hand: the main
limiting factor on what we can do, at least in my environment, is not
processing power or storage space but my development time. XSL is so
useful for so many things that I expect I'll get a higher return on the
time I invest if I find a way to do things with XSL than if I try to
learn a new technology, even if XSL isn't the best possible tool. I'm
also, I suspect, operating from different assumptions about what is
"good enough" than Peter R. is, since I'm interested in Latin texts,
where the range of orthographic variation is much narrower (and less
interesting) than in the vernacular texts that Anastasia has been used
on. It may well be that an XSL-based system could get an acceptable
level of performance for a Latin text that it couldn't match for a
vernacular text of comparable length and number of witnesses.

My conclusion: I think a system comparable to Anastasia could be built
using XSL; if such a system existed, I'd use it in preference to
Anastasia because it would fit better with my technology environment;
but now that something as good as Anastasia exists and is available
under an open-source license, the only real reason to develop such a
system would be for the fun of it.

Peter

Peter Binkley, Ph.D., MLIS
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: peter.binkley@ualberta.ca
