Discussion:
Using Saxon (XSLT 2.0) with FOP, XEP, Antenna House
Jesper Tverskov
2013-10-15 13:31:33 UTC
Hi list

FOP, XEP and Antenna House come out of the box with XSLT 1.0
processors: Xalan, Saxon and MSXML.

I would like to use a Saxon XSLT 2.0 processor, but have a problem.
Let me use Antenna House as example.

If I make a bat file that first does the XML-to-FO transformation with
a Saxon XSLT 2.0 processor, writing the FO to a file, and then, in the
same bat file, uses AH to transform the FO file to PDF, it works, but it
is much, much, much slower: a crawl compared to running AH with its
default MSXML.

When using MSXML and AH, I have a feeling that AH takes the FO output
from the XSLT step directly in memory.

Now, instead of specifying the two processors in the bat file, in AH
you can just specify AH in the bat file and use a configuration file
to set up Saxon. Based on the AH documentation, the XSLT line in the
config file could look like this:

<xslt-settings msxml="false" command="SaxonHE9.5N/bin/Transform
-o:&#34;%3&#34; &#34;%1&#34; &#34;%2&#34;"/>

It works, but the result is so slow that my guess is that the config
file simply does something similar to running two separate
transformations in the bat file.

Conclusion: It looks to me as if Antenna House at the command line is
optimized for MSXML, and any other XSLT processor will be much,
much, much slower.

Is the above true for AH, or have I overlooked something? And what
about FOP and XEP? Are they also optimized to work much faster with
their built-in XSLT processor, at least at the command line?

Regards
Jesper Tverskov

--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-***@lists.mulberrytech.com>
--~--
Michael Kay
2013-10-15 14:03:58 UTC
Certainly in the case of FOP, it's possible to pass the data between Saxon and FOP at the level of SAX events. I used to package this in the product, but abandoned it because it wasn't possible to do the integration in a way that worked across FOP releases.

I suspect that any slowness you experience is more to do with the Java VM startup cost than with the extra cost of XML parsing and serialization; but that's conjecture.
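To illustrate what passing data "at the level of SAX events" means: the XSLT stage's output is delivered as SAX callbacks to the next component's ContentHandler, so no FO file is written to disk and parsed again. This is a minimal sketch, assuming only the JDK's built-in JAXP API (whose default processor is XSLT 1.0; Saxon also implements the JAXP TransformerFactory interface). CountingHandler is a hypothetical stand-in for an FO processor's input handler; a real integration would use the formatter's own embedding API, which, as noted above, has varied between FOP releases.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxPipeline {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Toy XML -> FO stylesheet: one fo:block per <p> element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='" + FO_NS + "'>"
        + "<xsl:template match='/doc'><fo:root>"
        + "<xsl:for-each select='p'><fo:block><xsl:value-of select='.'/></fo:block></xsl:for-each>"
        + "</fo:root></xsl:template></xsl:stylesheet>";

    /** Hypothetical stand-in for an FO processor's SAX input; it only counts fo:block elements. */
    static class CountingHandler extends DefaultHandler {
        int blocks = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Some SAX sources leave localName empty, so fall back to the part of qName after the prefix.
            String name = localName.isEmpty() ? qName.substring(qName.indexOf(':') + 1) : localName;
            if (FO_NS.equals(uri) && "block".equals(name)) {
                blocks++;
            }
        }
    }

    /** Streams the transformation's output events straight into the handler: no FO file in between. */
    static int transformToHandler(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        CountingHandler handler = new CountingHandler();
        transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return handler.blocks;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformToHandler("<doc><p>one</p><p>two</p></doc>")); // prints 2
    }
}
```

The design point is that the consumer only sees callbacks; swapping CountingHandler for a formatter's real input handler is what ties the integration to a particular FO processor release.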

Michael Kay
Saxonica
Jesper Tverskov
2013-10-16 11:08:31 UTC
Hi Michael

I'm not sure that I understand your answer.

When I make my XSLT+FO transformation at the command line:

MSXML+Antenna House = 1.3s
Saxon 9n + Antenna House = 5s
Saxon 9j + Antenna House = 2.5s

The above figures are a little hard to understand, considering that the
XSLT step takes only a fraction of the time of the FO step. Somehow AH
must be optimized to use MSXML at the command line. Is it possible to
overcome the "Java VM startup" problem as long as I have to use the
command line interface of AH?

Can I conclude that if I want to use an XSLT 2.0 processor, and speed
matters, I must use the .NET or Java interface of AH or of other FO
processors, and that I might as well forget about the command line
interface?

Regards
Jesper Tverskov

http://www.xmlplease.com
Michael Kay
2013-10-16 11:42:49 UTC
Post by Jesper Tverskov
> Hi Michael
> I'm not sure that I understand your answer.
> MSXML+Antenna House = 1.3s
> Saxon 9n + Antenna House = 5s
> Saxon 9j + Antenna House = 2.5s
> The above figures are a little hard to understand, considering that the
> XSLT step takes only a fraction of the time of the FO step. Somehow AH
> must be optimized to use MSXML at the command line. Is it possible to
> overcome the "Java VM startup" problem as long as I have to use the
> command line interface of AH?

I'm afraid I know nothing about Antenna House or what facilities it provides for invoking different XSLT processors. However, a startup cost of 2.5s for starting up a Java VM doesn't surprise me in the slightest. I'm slightly more surprised by the 5s for .NET, since I was under the impression .NET startup costs were lower than Java's.

Are you sure they are invoking MSXML via a command line interface? I would think it's more likely they are invoking it directly via an API call, in which case it will be much faster than any processor invoked via a command line exec.

Michael Kay
Saxonica


Jesper Tverskov
2013-10-16 12:07:57 UTC
Very interesting. So you are telling me that when the AH command line
interface gets this line in a bat file:

AHFCmd -d input.xml -i AHFSettings.xml -s stylesheet.xsl -o output.pdf

the AHFCmd.exe is in reality a "faked" or "improved" command line
interface, probably wrapping a .NET or Java API? And that when I use
the AH config file to change the XSLT processor, I'm back to a
"standard" command line interface, with its very slow speed?

I will go and ask AH whether that is the explanation. Thanks.

Regards
Jesper Tverskov
Michael Kay
2013-10-16 13:15:24 UTC
Post by Jesper Tverskov
> Very interesting. So you are telling me that when the AH command line
> AHFCmd -d input.xml -i AHFSettings.xml -s stylesheet.xsl -o output.pdf
> the AHFCmd.exe is in reality a "faked" or "improved" command line
> interface?

Not necessarily "faked". It might just be a very small executable that links to an already-loaded DLL; running that would be far faster than initializing a Java VM.

Michael Kay
Saxonica
Jesper Tverskov
2013-10-17 11:53:03 UTC
Thanks for helping me understand how speed is determined in the full
process from XML to XSLFO to PDF.

In the case of the fast speed of Antenna House at the command line, I
must conclude that it is MSXML that makes the difference.

The .NET version of Saxon 9 takes 4s to transform to XSLFO
at the command line (in my case the XSLFO is very small, later to be
transformed into a PDF of a couple of pages).

The msxsl.exe version of MSXML takes 0.4s to transform to XSLFO at the
command line. Much, much, much faster!

I can only conclude that if different FO processors are equally fast,
the one using MSXML will be much, much faster at the command line for
a small PDF job of a couple of pages.

If the PDF is huge, on the other hand, thousands of pages with a lot of
images, the extremely fast startup time of MSXML is irrelevant. Then
the speed is determined by the FO processor.

Since my client only has this one small PDF, created at the command
line 700-800 at a time (a very small number, by the way), it is
meaningless to upgrade to XSLT 2.0 as long as the command line
interface is used.

In my case XSLT 2.0 is only relevant if I drop the command line and
use the .NET or Java API instead, but my hunch is that this will only
narrow the time gap.

Does it make sense? Or have I misunderstood something?

Regards
Jesper Tverskov
http://www.xmlplease.com
Michael Kay
2013-10-17 13:25:20 UTC
Post by Jesper Tverskov
> Thanks for helping me understand how speed is determined in the full
> process from XML to XSLFO to PDF.
> In the case of the fast speed of Antenna House at the command line, I
> must conclude that it is MSXML that makes the difference.
> The .NET version of Saxon 9 takes 4s to transform to XSLFO
> at the command line (in my case the XSLFO is very small, later to be
> transformed into a PDF of a couple of pages).
> The msxsl.exe version of MSXML takes 0.4s to transform to XSLFO at the
> command line. Much, much, much faster!

If you want to know how much of the time is start-up time and how much is actually spent doing the transformation, then run Saxon from the command line with -repeat:50 and -t, to get an average time. This will give you an indication of how long the task would take if you were able to run it in a VM that was already "warmed up".

On the other hand, if you have no way to do the integration other than using the command line, then this information is (sadly) of little use to you!

But if you're creating a batch of 800 of these small xsl-fo files that need transforming, then it might be a lot better to write them to disk, and then transform the whole lot in a single Saxon run (e.g. by using the capability on the Saxon command line to transform a whole directory at once.)
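The single-run idea can also be sketched in Java, for the case where the batch is driven from code rather than from Saxon's command line. This is a minimal sketch, assuming the JDK's JAXP API: the stylesheet is compiled once into a Templates object and applied to every .xml file in a directory, writing one .fo file per input. The .fo naming and the argument order are illustrative choices, not anything Saxon requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BatchTransform {

    /**
     * Compiles the stylesheet once, then transforms every *.xml file in inDir
     * into a same-named *.fo file in outDir. Returns the number of files written.
     */
    static int transformDirectory(Path stylesheet, Path inDir, Path outDir)
            throws IOException, TransformerException {
        // One JVM start-up and one stylesheet compilation for the whole batch.
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(stylesheet.toFile()));
        Files.createDirectories(outDir);
        int count = 0;
        try (DirectoryStream<Path> inputs = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path in : inputs) {
                String foName = in.getFileName().toString().replaceFirst("\\.xml$", ".fo");
                try (InputStream src = Files.newInputStream(in);
                     OutputStream dst = Files.newOutputStream(outDir.resolve(foName))) {
                    // Transformers are cheap to create from Templates but not thread-safe: one per job.
                    compiled.newTransformer().transform(
                            new StreamSource(src, in.toUri().toString()), new StreamResult(dst));
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java BatchTransform.java stylesheet.xsl inputDir outputDir
        int n = transformDirectory(Path.of(args[0]), Path.of(args[1]), Path.of(args[2]));
        System.out.println(n + " files transformed");
    }
}
```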

Michael Kay
Saxonica


Jesper Tverskov
2013-10-18 09:34:30 UTC
Hi Michael

When I do a Saxon transformation to XSLFO at the command line, my file takes:

a) Stopwatch time: 7.5s.

The "-t" tells me that:

b) Stylesheet compilation time: 2398ms
c) Building tree input file: 86ms
d) Execution time: 531ms.

Does it mean that:

e) start-up time = a - (b+c+d)
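Plugging the figures above into e gives the naive estimate:

```latex
\begin{aligned}
e &= a - (b + c + d) \\
  &= 7500\,\text{ms} - (2398 + 86 + 531)\,\text{ms} \\
  &= 7500\,\text{ms} - 3015\,\text{ms} = 4485\,\text{ms} \approx 4.5\,\text{s}
\end{aligned}
```

As the reply below points out, this remainder should not be read as pure start-up time, because warm-up continues after Saxon has started its clock.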

Am I right that when doing 1000 transformations with the .NET or Java
interface, I only have 1 start-up time and 1 compilation time, and
1000 times c + d?

Also, I have tested Saxon's load-from-a-directory, output-to-a-directory
feature at the command line. It works well, but I don't think I can get
an FO processor to take over such an output directory as input to the
FO transformation from within the bat file.

Regards
Jesper Tverskov
http://www.xmlplease.com
Michael Kay
2013-10-18 19:59:57 UTC
Post by Jesper Tverskov
> Hi Michael
> a) Stopwatch time: 7.5s.
> b) Stylesheet compilation time: 2398ms
> c) Building tree input file: 86ms
> d) Execution time: 531ms.
> e) start-up time = a - (b+c+d)

No, because the VM warms up gradually: it loads classes as they are needed, and the HotSpot compiler gradually speeds up the most frequently executed code, learning as it goes. So there's still a lot of "warm-up" happening after Saxon starts the clock. To get timings that eliminate warm-up overhead, you need to use the -repeat option; for something that takes 7s, I would start with -repeat:20.
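The same warm-up effect can be observed in-process. This is a minimal sketch, assuming only the JDK's JAXP API; it does not reproduce how Saxon's -repeat option works internally. The stylesheet is compiled once, outside the timed loop; the same transformation is then run repeatedly in one VM, and the first run is reported separately from the mean of the later runs. The stylesheet and input are toy placeholders, and absolute timings will vary by machine and JVM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WarmUpTiming {

    // Toy stylesheet: counts the <p> elements in the input.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><out><xsl:value-of select='count(//p)'/></out></xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns {firstRunNanos, meanOfRemainingRunsNanos} for `runs` transformations in one VM. */
    static long[] time(String xml, int runs) throws Exception {
        if (runs < 2) throw new IllegalArgumentException("need at least 2 runs");
        // Compilation happens once and is deliberately excluded from the timings.
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
        long first = 0, rest = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            StringWriter out = new StringWriter();
            compiled.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            long elapsed = System.nanoTime() - start;
            if (i == 0) first = elapsed; else rest += elapsed;
        }
        return new long[] { first, rest / (runs - 1) };
    }

    public static void main(String[] args) throws Exception {
        long[] t = time("<doc><p/><p/></doc>", 20);
        System.out.printf("first run: %.2f ms, later runs (mean): %.2f ms%n", t[0] / 1e6, t[1] / 1e6);
    }
}
```

On a typical JVM the first run is noticeably slower than the later ones, but the ratio depends on the JIT and the workload, which is why a reasonably large repeat count is needed before the average settles.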
Post by Jesper Tverskov
> Am I right that when doing 1000 transformations with the .NET or Java
> interface, I only have 1 start-up time and 1 compilation time, and
> 1000 times c + d?

Yes to the first question. To the second, it will probably be rather faster because the first transformation always takes much longer than subsequent ones.
Post by Jesper Tverskov
> Also, I have tested Saxon's load-from-a-directory, output-to-a-directory
> feature at the command line. It works well, but I don't think I can get
> an FO processor to take over such an output directory as input to the
> FO transformation from within the bat file.

Other possibilities are to organize the processing using XProc, xmlsh, or ant; or to write custom control logic in Java; or to write it in XSLT, with a stylesheet that uses the collection() function to process many input files.
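For the "custom control logic in Java" option, a minimal sketch of the loop under discussion: one VM, one stylesheet compilation, and for each job the FO result is kept in memory and handed straight to the formatter, so there are no intermediate FO files and no per-document process start-up. It assumes the JDK's JAXP API; FoRenderer is a hypothetical interface standing in for whatever in-process API the chosen FO processor offers, and is not part of FOP, XEP or Antenna House.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FoPipeline {

    /** Hypothetical hook for an FO processor's in-process API; not a real vendor interface. */
    interface FoRenderer {
        void render(String jobName, byte[] foDocument) throws Exception;
    }

    /** One compilation, then for each job: XML -> FO in memory, handed straight to the renderer. */
    static void run(String stylesheet, Map<String, String> jobs, FoRenderer renderer) throws Exception {
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
        for (Map.Entry<String, String> job : jobs.entrySet()) {
            ByteArrayOutputStream fo = new ByteArrayOutputStream();
            compiled.newTransformer().transform(
                    new StreamSource(new StringReader(job.getValue())), new StreamResult(fo));
            renderer.render(job.getKey(), fo.toByteArray());
        }
    }

    public static void main(String[] args) throws Exception {
        String xslt = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
                + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
                + "<xsl:template match='/'><fo:root><xsl:value-of select='/doc'/></fo:root></xsl:template>"
                + "</xsl:stylesheet>";
        Map<String, String> jobs = new LinkedHashMap<>();
        jobs.put("job-1", "<doc>first</doc>");
        jobs.put("job-2", "<doc>second</doc>");
        // Stand-in renderer: a real one would pass the FO to the formatter and write a PDF.
        run(xslt, jobs, (name, fo) -> System.out.println(name + ": " + fo.length + " bytes of FO"));
    }
}
```

Handing over serialized bytes is the simplest contract; if the formatter accepts SAX events, pointing a SAXResult at its input handler instead would avoid re-parsing the FO.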

Michael Kay
Saxonica


Jesper Tverskov
2013-10-19 12:14:21 UTC
Hi Michael

I have written some C# code with 1000 input file names in an array (just
the same file, for this test); it makes 1000 transformations in a
foreach loop and generates 1000 xslfo.xml files. "-repeat:1000"
predicted 8ms on average.

It took around 17s to create 1000 XSLFO files via the .NET API.
It took around 17s to create 3 XSLFO files at the command line.
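Per file, the two measurements work out as follows:

```latex
\frac{17\ \text{s}}{1000\ \text{files}} = 17\ \text{ms per file (.NET API)}, \qquad
\frac{17\ \text{s}}{3\ \text{files}} \approx 5.7\ \text{s per file (command line)}
```

That is a ratio of roughly 330 to 1, and the API figure is about twice the 8ms average that -repeat:1000 predicted.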

I now need to get the FO-processor into the loop.

Please allow me one more question:

I have noticed that FO processors like FOP and Antenna House both invoke
the XSLT processor internally (built in, so to speak). Is that in order
to avoid having to specify both processors at the command line and
paying for two start-up times?

For the same reason, if speed counts at the command line, should both
processors be written in the same programming language?

Regards
Jesper
Tony Graham
2013-10-19 14:44:42 UTC
...
Post by Michael Kay
Post by Jesper Tverskov
>> Also, I have tested Saxon's load-from-a-directory, output-to-a-directory
>> feature at the command line. It works well, but I don't think I can get
>> an FO processor to take over such an output directory as input to the
>> FO transformation from within the bat file.
> Other possibilities are to organize the processing using XProc, xmlsh, or
> ant; or to write custom control logic in Java; or to write it in XSLT,
> with a stylesheet that uses the collection() function to process many
> input files.

Or you could turn things inside out and run the formatter as an extension
function inside the XSLT transformation and call it once for each of the
many files, hopefully without too much of a start-up cost each time.

We have an extension function [1] for running an FO processor within the
XSLT transformation for making decisions based on formatted sizes before
running the formatter 'for real' on the final result of the
transformation, and it wouldn't be very different to write the PDF output
to a file rather than returning an area tree.

Regards,


Tony Graham ***@mentea.net
Consultant http://www.mentea.net
Mentea 13 Kelly's Bay Beach, Skerries, Co. Dublin, Ireland
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
XML, XSL-FO and XSLT consulting, training and programming
Chair, Print and Page Layout Community Group @ W3C

[1] http://www.w3.org/community/ppl/wiki/FOPRunXSLTExt


