what xslt can't do..?


RE: [dm-l] what xslt can't do..?

Peter Robinson

6 Apr 2005

3:13 p.m.

Peter Baker says...

...

I must say, though, Peter, that this isn't the first time you've mentioned things that XSL can't do. I'm not sure I believe that it can't be done. Someday, when we've all got lots of time, we'll have to have a programming shootout. You can offer up a problem that you think XSL can't handle, and you code it in C (or whatever it is you use), and I'll use XSLT, and we'll see who gets there first.

indeed, I have become rather a bore on this subject. And I have several times pointed out that XSLT has great difficulty with what is conceptually a simple task: just show me the text on one page. I have several times set the challenge: somebody, try and do with XSLT what we do for many thousands of pages (in the most basic form; just reproduce what we do for a single page of the Hengwrt Chaucer, for example). I have had some extremely lengthy explanations of how it *could* be done, and I am certainly prepared to believe these might actually work. In the same way, I am sure that with sufficient ingenuity one could use a lawnmower to heat a house. Just, one might find easier ways to do it.

So, here is a simple example, then:

<div>
  <head>The whole text and all the texts</head>
  <div>
    <pb n="1"/>
    <head>First text</head>
    <p n="1">some text starts here and goes ita<hi rend="italic">lic an<pb n="2"/>d then</hi> we get a pagebreak</p>
    <p n="2">so the text finishes</p>
    <p n="3"> with yet another page <pb n="3"/> and another page start </p>
  </div>
  <div>
    <head>Second text</head>
    <pb n="4"/>
    <p n="1">here my new text on the next page etc etc</p>
    <pb n="5"/>
    <p n="2">here my new text on the next page etc etc</p>
  </div>
</div>

So, Peter, here is your challenge. Just make us an xslt file that pulls out just the text on each of the five pages. It would be a nice bonus if each page were to have a header which told us what text there is on each page (for example: page 3 contains 'First text, para 3 -- Second text, head'; page 5 contains just 'Second text, para 2'). To make it slightly more of a challenge: you should do this direct from this sample, and not from a transform of this sample.

By the way, notice the problematics about the final page break. Should it be..

<head>Second text</head> <pb n="4"/> <p n="1">here
OR
<head>Second text<pb n="4"/></head> <p n="1">here
OR
<head>Second text</head> <p n="1"><pb n="4"/>here

Answers on the back of a postage stamp please.

all the best Another Peter
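
For readers who want something concrete to argue over, here is a minimal XSLT 1.0 sketch of the most basic form of the challenge: plain text per page, taken directly from the sample in one pass. It is only a sketch under stated assumptions, not a solution anyone in this thread posted. It assumes that <pb> is always empty, that a text node belongs to the page opened by the nearest <pb> before it in document order, and that losing the markup (the italic <hi>, the paragraph boundaries) is acceptable, which is exactly where the real difficulty lies. The variable name this-pb and the "[page n]" label are illustrative choices.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="utf-8"/>

  <xsl:template match="/">
    <xsl:for-each select="//pb">
      <!-- Identity of the current page break, compared against each text node below. -->
      <xsl:variable name="this-pb" select="generate-id(.)"/>
      <xsl:text>[page </xsl:text>
      <xsl:value-of select="@n"/>
      <xsl:text>] </xsl:text>
      <!-- A text node is on this page when its nearest preceding <pb> is this one.
           preceding::pb[1] is the closest <pb> before the text node in document order. -->
      <xsl:for-each select="//text()[generate-id(preceding::pb[1]) = $this-pb]">
        <xsl:value-of select="."/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Text that comes before the first <pb> (here the outer <head>) is assigned to no page and is dropped. The inner select rescans every text node for every page, so the cost grows with pages times text nodes; that is tolerable for a sample but would want a key or a grouping construct on a long document.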


Abdullah Alger

6 Apr

4:02 p.m.

For the sake of this conversation, I thought it would be interesting to look at http://www.dpawson.co.uk/xsl/sect2/nono.html. This page is titled 'Things XSLT can't do'.

Abdullah Alger


Digital Medievalist Project
Homepage: http://www.digitalmedievalist.org
Journal (December 2004-): http://www.digitalmedievalist.org/journal.cfm
RSS (announcements) server: http://www.digitalmedievalist.org/rss/rss2.cfm
Wiki: http://sql.uleth.ca/dmorgwiki/index.php
Change membership options: http://listserv.uleth.ca/mailman/listinfo/dm-l
Submit RSS announcement: http://www.digitalmedievalist.org/newitem.cfm
Contact editorial Board: digitalmedievalist@uleth.ca
dm-l mailing list: dm-l@uleth.ca, http://listserv.uleth.ca/mailman/listinfo/dm-l

Peter Baker

8:38 p.m.

Peter Robinson wrote:

...

So, Peter, here is your challenge. Just make us an xslt file that pulls out just the text on each of the five pages. It would be a nice bonus if each page were to have a header which told us what text there is on each page (for example: page 3 contains 'First text, para 3 -- Second text, head'; page 5 contains just 'Second text, para 2'). To make it slightly more of a challenge: you should do this direct from this sample, and not from a transform of this sample. ... Answers on the back of a postage stamp please.

Now Peter, I'm perfectly happy to take on this problem, but I think we both know that your demand that the transform be done in one pass makes it not so much "slightly more of a challenge" as "impossible." But what would be the problem with doing the transform in two passes, or three, or four? One could wrap them all up in some Python or Java, and the user wouldn't know or care. Assuming we can get past this little issue, why yes, I think it can be done, and I'll be happy to give it a try. Notice another qualification in my note: "when we've all got lots of time." It won't be done tomorrow (term's on, and I have obligations outstanding), but I'll post the solution (or a link to it) to this list when it's ready.

Peter's observation about heating one's house with a lawnmower, together with the page that Abdullah links to, brings up interesting questions about programming languages. I don't see Peter's analogy as quite apt: XSLT is for transforming XML. It is a heater, not a lawnmower: the question is whether it is up to the job. "Things XSLT can't do" addresses that problem--or appears to. Some of the "no-nos" are just silly ("Can I nest elements within attributes"), while others deal with what appear to be real limitations (can't "Change the value of a variable"). Any specialized language has limitations, of course (don't try to use XSLT to crack the human genome), but many of the queries at the site are from people who are still thinking in Java or C and haven't quite grokked the XSLT way of doing things.

I started using XSLT a couple of years ago to generate pages for my Old English Aerobics anthology, and, coming from Java, I frankly found it limiting and infuriating. For reasons I still don't quite understand, I then started to use it to develop an XML-based high-level hinting language for TrueType fonts. You've got to know a little about TrueType to understand the nuttiness of the idea: it was transforming XML into assembly language. On countless occasions I've thought, "XSLT can't do this"; then after a few days (or weeks) of walking in the park or taking the kids to the playground I've realized that, yes, it really can. The latest passage was realizing that it really could parse an attribute that looked like

value="control-value(lc-vert-stem) * (minimum-distance / 1.23)"

and do it elegantly and without invoking any extension functions. The language has repeatedly exceeded my expectations.
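
To give a sense of how this kind of parsing is possible in plain XSLT 1.0, here is a minimal sketch of the first step only, tokenizing the attribute with a recursive named template. It is not the code from the hinting language. The <tokens>, <word> and <op> output elements, the template name tokenize, and the choice of delimiters (parentheses, "*", "/" and space) are all assumptions made for illustration. A hyphen is treated as part of a name, since names like lc-vert-stem contain one.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input: any element that carries a value attribute. -->
  <xsl:template match="*[@value]">
    <tokens>
      <xsl:call-template name="tokenize">
        <xsl:with-param name="rest" select="string(@value)"/>
      </xsl:call-template>
    </tokens>
  </xsl:template>

  <!-- Walks the string one character at a time. $word accumulates the
       name or number currently being read; recursion replaces the loop
       and the mutable variable a Java or C programmer would reach for. -->
  <xsl:template name="tokenize">
    <xsl:param name="rest"/>
    <xsl:param name="word" select="''"/>
    <xsl:variable name="c" select="substring($rest, 1, 1)"/>
    <xsl:choose>
      <!-- End of input: flush any pending word. -->
      <xsl:when test="$rest = ''">
        <xsl:if test="$word != ''">
          <word><xsl:value-of select="$word"/></word>
        </xsl:if>
      </xsl:when>
      <!-- Delimiter: flush the pending word, emit the operator unless it is a space. -->
      <xsl:when test="contains('()*/ ', $c)">
        <xsl:if test="$word != ''">
          <word><xsl:value-of select="$word"/></word>
        </xsl:if>
        <xsl:if test="$c != ' '">
          <op><xsl:value-of select="$c"/></op>
        </xsl:if>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="rest" select="substring($rest, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Any other character extends the current word. -->
      <xsl:otherwise>
        <xsl:call-template name="tokenize">
          <xsl:with-param name="rest" select="substring($rest, 2)"/>
          <xsl:with-param name="word" select="concat($word, $c)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the attribute above, this yields ten tokens in order: control-value, (, lc-vert-stem, ), *, (, minimum-distance, /, 1.23, ). Turning that token list into a tree that respects the parentheses would take a second recursive template, which is left out here.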

By the way, "Things XSLT can't do" comes in the context of an XSL/XSLT/FO information site. It isn't an anti-XSLT screed.

Peter

James Cummings

7 Apr

5:07 a.m.

I've always been happy to agree with Peter (Robinson) that XSLT has some major limitations. I, like Peter (Baker), think that a solution can be found within XSLT for this problem. My approach would be, as Peter (Baker) suggests, to do multiple transforms. Increasingly I'm doing this not as a set of discrete files (though the modularity of that is very attractive), but by using variables in XSLT2, which can then have further operations applied to them; that is, creating temporary trees, since in XSLT2 you can continue to work on what XSLT1 called result tree fragments. But Peter Binkley has solved the specific problem in his post.
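
A minimal sketch of that temporary-tree pattern, assuming an XSLT 2.0 processor and the sample document from earlier in the thread: the first pass is held in a variable and the second pass reads from it, all inside one stylesheet. The mode name flatten, the variable names, and the <pages>/<page> output elements are illustrative assumptions, and like any text-only approach it throws the markup away.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Pass 1: a temporary tree containing only text and <pb> elements.
         The built-in rules for the flatten mode descend through every other
         element and copy its text, so the old hierarchy is discarded. -->
    <xsl:variable name="flat">
      <xsl:apply-templates mode="flatten"/>
    </xsl:variable>

    <!-- Pass 2: in the flat tree each page is simply the text that follows
         a <pb> and whose nearest preceding <pb> is that same <pb>. -->
    <pages>
      <xsl:for-each select="$flat/pb">
        <xsl:variable name="this-pb" select="."/>
        <page n="{@n}">
          <xsl:value-of
              select="following-sibling::text()[preceding-sibling::pb[1] is $this-pb]"
              separator=""/>
        </page>
      </xsl:for-each>
    </pages>
  </xsl:template>

  <!-- Keep page breaks as they are while flattening. -->
  <xsl:template match="pb" mode="flatten">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 1.0 the same variable would hold a result tree fragment, and the second pass would need a processor-specific node-set() extension, which is why the pattern only becomes comfortable in 2.0.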

However, as a general approach, I like to see the first step as a conversion of one hierarchy to another: converting every element in the document to milestone-like elements indicating its start and end, and then privileging the set of milestones that is most important to you for that transformation.

But the XML could have been encoded as such to begin with, couldn't it? So perhaps the problem isn't that it can be difficult in XSLT to transform from one arbitrary hierarchy to another, but that the choice of hierarchy should be suited to the purpose. If the aim is to display pages, then privilege that hierarchy and break the other ones?

I've knocked up a quick XSLT1 stylesheet that takes any XML document as an input, and wraps it in a document element and creates all other elements (except <head> and <pb/> ;-) ) as milestone elements indicating their start and end.

So, passing the text.xml through it, I get:
--------------
<document>
<div start="true"/>
<head>The whole text and all the texts</head>
<div start="true"/>
<pb n="1"/>
<head>First text</head>
<p start="true" n="1"/>some text starts here and goes ita<hi start="true" rend="italic"/>lic an<pb n="2"/>d then<hi end="true" rend="italic"/> we get a pagebreak<p end="true" n="1"/>
<p start="true" n="2"/>so the text finishes<p end="true" n="2"/>
<p start="true" n="3"/> with yet another page <pb n="3"/> and another page start <p end="true" n="3"/>
<div end="true"/>
<div start="true"/>
<head>Second text</head>
<pb n="4"/>
<p start="true" n="1"/>here my new text on the next page etc etc<p end="true" n="1"/>
<pb n="5"/>
<p start="true" n="2"/>here my new text on the next page etc etc<p end="true" n="2"/>
<div end="true"/>
<div end="true"/>
</document>
--------------
Then one could fairly easily use an xsl:for-each-group solution to group elements by <pb>. For what it is worth (and it could be done in a more sophisticated and efficient manner) the XSLT for milestoning a document that I knocked up was:
--------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="no" encoding="utf-8"/>

  <!-- Wrap the whole result in a single <document> element. The root node
       has no name, so no start/end milestones are emitted for it. -->
  <xsl:template match="/">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>

  <!-- Default: replace an element by a start milestone, its processed
       content, and an end milestone. -->
  <xsl:template match="*" priority="-1">
    <xsl:call-template name="element-start"/>
    <xsl:apply-templates/>
    <xsl:call-template name="element-end"/>
  </xsl:template>

  <!-- <head> stays a real container element. -->
  <xsl:template match="head">
    <head>
      <xsl:apply-templates/>
    </head>
  </xsl:template>

  <!-- <pb> is already a milestone: copy it with its attributes. -->
  <xsl:template match="pb">
    <pb>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
    </pb>
  </xsl:template>

  <!-- Empty element marking where the current element starts. -->
  <xsl:template name="element-start">
    <xsl:element name="{name()}">
      <xsl:attribute name="start">true</xsl:attribute>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>

  <!-- Empty element marking where the current element ends. -->
  <xsl:template name="element-end">
    <xsl:element name="{name()}">
      <xsl:attribute name="end">true</xsl:attribute>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
--------------
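
The xsl:for-each-group step mentioned above might look roughly like this. It is a sketch only, assuming an XSLT 2.0 processor and the milestoned <document> shown earlier as input; the <pages>, <page> and <front> output elements are invented for the example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="no" encoding="utf-8"/>

  <xsl:template match="/document">
    <pages>
      <!-- Every <pb> among the flat children opens a new group; anything
           before the first <pb> forms a leading group of its own. -->
      <xsl:for-each-group select="node()" group-starting-with="pb">
        <xsl:choose>
          <!-- The context item is the first item of the group. -->
          <xsl:when test="self::pb">
            <page n="{@n}">
              <!-- Copy everything in the group except the <pb> itself. -->
              <xsl:copy-of select="current-group()[position() gt 1]"/>
            </page>
          </xsl:when>
          <xsl:otherwise>
            <!-- Material that precedes the first page break. -->
            <front>
              <xsl:copy-of select="current-group()"/>
            </front>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </pages>
  </xsl:template>
</xsl:stylesheet>
```

Each <page> then holds the text and the start/end milestones that fall on it, so a later step can reopen whatever was still open at the page boundary (the italic <hi> on page 2, for instance) by looking for end milestones with no matching start on the same page.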

-James

-- Dr James Cummings, Oxford Text Archive, University of Oxford James dot Cummings at oucs dot ox dot ac dot uk
