Server IP : 184.154.167.98 / Your IP : 3.133.155.253 Web Server : Apache System : Linux pink.dnsnetservice.com 4.18.0-553.22.1.lve.1.el8.x86_64 #1 SMP Tue Oct 8 15:52:54 UTC 2024 x86_64 User : puertode ( 1767) PHP Version : 8.2.26 Disable Function : NONE MySQL : OFF | cURL : ON | WGET : ON | Perl : ON | Python : ON | Sudo : ON | Pkexec : ON Directory : /usr/share/doc/libxslt-devel/tutorial2/ |
Upload File : |
<?xml version="1.0" encoding="iso-8859-2"?> <!DOCTYPE article SYSTEM "file:///usr/share/docbook/docbook-xml-4.3/docbookx.dtd"> <article id="libxslt"> <articleinfo> <author><firstname>Panos</firstname><surname>Louridas</surname></author> <copyright> <year>2004</year> <holder>Panagiotis Louridas</holder> </copyright> <legalnotice> <para>Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: </para> <para>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. </para> <para>THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</para> </legalnotice> </articleinfo> <title>libxslt: An Extended Tutorial</title> <sect1><title>Introduction</title> <para>The Extensible Stylesheet Language Transformations (XSLT) specification defines an XML template language for transforming XML documents. An XSLT engine reads an XSLT file and an XML document and transforms the document accordingly.</para> <para>We want to perform a series of XSLT transformations to a series of documents. An obvious solution is to use the operating system's pipe mechanism and start a series of transformation processes, each one taking as input the output of the previous transformation. It would be interesting, though, and perhaps more efficient if we could do our job within a single process.</para> <para>libxslt is a library for doing XSLT transformations. It is built on libxml, which is a library for handling XML documents. libxml and libxslt are used by the GNOME project. Although developed in the *NIX world, both libxml and libxslt have been ported to the MS-Windows platform. In principle an application using libxslt should be easily portable between the two systems. In practice, however, there arise various wrinkles. These do not have anything to do with libxml or libxslt per se, but rather with the different compilation and linking procedures of each system.</para> <para>The presented solution is an extension of <ulink url="http://xmlsoft.org/XSLT/tutorial/libxslttutorial.html">John Fleck's libxslt tutorial</ulink>, but the present tutorial tries to be self-contained. It develops a minimal libxslt application (libxslt_pipes) that can perform a series of transformations to a series of files in a pipe-like manner. An invocation might be:</para> <para> <userinput> libxslt_pipes --out results.xml foo.xsl bar.xsl doc1.xml doc2.xml </userinput> </para> <para>The <filename>foo.xsl</filename> stylesheet will be applied to <filename> doc1.xml</filename> and the <filename>bar.xsl</filename> stylesheet will be applied to the resulting document; then the two stylesheets will be applied in the same sequence to <filename>bar.xsl</filename>. The results are sent to <filename>results.xml</filename> (if no output is specified they are sent to standard output).</para> <para>The application is compiled in both *NIX systems and MS-Windows, where by *NIX systems we mean Linux, BSD, and other members of the family. The gcc suite is used in the *NIX platform and the Microsoft compiler and linker are used in the MS-Windows platform.</para> </sect1> <sect1><title>Setting the Scene</title> <para> We need to include the necessary libraries: <programlisting> <![CDATA[ #include <stdio.h> #include <string.h> #include <stdlib.h> #include <libxslt/transform.h> #include <libxslt/xsltutils.h> ]]> </programlisting> </para> <para>The first group of include directives includes general C libraries. The libraries we need to make libxslt work are in the second group. The <filename>transform.h</filename> header file declares the API that does the bulk of the actual processing. The <filename>xsltutils.h</filename> header file declares the API for some generic utility functions of the XSLT engine; among other things, saving to a file, which is what we need it for.</para> <para> If our input files contain entities through external subsets, we need to tell libxslt to load them. The global variable <function>xmlLoadExtDtdDefaultValue</function>, defined in <filename>libxml/globals.h</filename>, is responsible for that. As the variable is defined outside our program we must specify external linkage: <programlisting> extern int xmlLoadExtDtdDefaultValue; </programlisting> </para> <para> The program is called from the command line. We anticipate that the user may not call it the right way, so we define a function for describing its usage: <programlisting> static void usage(const char *name) { printf("Usage: %s [options] stylesheet [stylesheet ...] file [file ...]\n", name); printf(" --out file: send output to file\n"); printf(" --param name value: pass a (parameter,value) pair\n"); } </programlisting> </para> </sect1> <sect1><title>Program Start</title> <para>We need to define a few variables that are used throughout the program: <programlisting> int main(int argc, char **argv) { int arg_indx; const char *params[16 + 1]; int params_indx = 0; int stylesheet_indx = 0; int file_indx = 0; int i, j, k; FILE *output_file = stdout; xsltStylesheetPtr *stylesheets = (xsltStylesheetPtr *) calloc(argc, sizeof(xsltStylesheetPtr)); xmlDocPtr *files = (xmlDocPtr *) calloc(argc, sizeof(xmlDocPtr)); int return_value = 0; </programlisting> </para> <para>The <varname>arg_indx</varname> integer is an index used to iterate over the program arguments. The <varname>params</varname> string array is used to collect the XSLT parameters. In XSLT, additional information may be passed to the processor via parameters. The user of the program specifies these in key-value pairs in the command line following the <userinput>--param</userinput> command line argument. We accept up to 8 such key-value pairs, which we track with the <varname>params_indx</varname> integer. libxslt expects the parameters array to be null-terminated, so we have to allocate one extra place (16 + 1) for it. The <varname>file_indx</varname> is an index to iterate over the files to be processed. The <varname>i</varname>, <varname>j</varname>, <varname>k</varname> integers are additional indices for iteration purposes, and <varname>return_value</varname> is the value the program returns to the operating system. We expect the result of the transformation to be the standard output in most cases, but the user may wish otherwise via the <option>--out</option> command line option, so we need to keep track of the situation with the <varname>output_file</varname> file pointer.</para> <para>In libxslt, XSLT stylesheets are internally stored in <structname>xsltStylesheet</structname> structures; similarly, in libxml XML documents are stored in <structname>xmlDoc</structname> structures. <type>xsltStylesheetPtr</type> and <type>xmlDocPtr</type> are simply typedefs of pointers to them. The user may specify any number of stylesheets that will be applied to the documents one after the other. To save time we parse the stylesheets and the documents as we read them from the command line and keep the parsed representation of them. The parsed results are kept in arrays. These are dynamically allocated and sized to the number of arguments; this wastes some space, but not much (the size of <type>xmlStyleSheetPtr</type> and <type>xmlDocPtr</type> is the size of a pointer) and simplifies code later on. The array memory is allocated with <function>calloc</function> to ensure contents are initialised to zero. </para> </sect1> <sect1><title>Arguments Collection</title> <para>If the program gets no arguments at all, we print the usage description, set the program return value to 1 and exit. Instead of returning directly we go to (literally) to the end of the program text where some housekeeping takes place.</para> <para> <programlisting> <![CDATA[ if (argc <= 1) { usage(argv[0]); return_value = 1; goto finish; } /* Collect arguments */ for (arg_indx = 1; arg_indx < argc; arg_indx++) { if (argv[arg_indx][0] != '-') break; if ((!strcmp(argv[arg_indx], "-param")) || (!strcmp(argv[arg_indx], "--param"))) { arg_indx++; params[params_indx++] = argv[arg_indx++]; params[params_indx++] = argv[arg_indx]; if (params_indx >= 16) { fprintf(stderr, "too many params\n"); return_value = 1; goto finish; } } else if ((!strcmp(argv[arg_indx], "-o")) || (!strcmp(argv[arg_indx], "--out"))) { arg_indx++; output_file = fopen(argv[arg_indx], "w"); } else { fprintf(stderr, "Unknown option %s\n", argv[arg_indx]); usage(argv[0]); return_value = 1; goto finish; } } params[params_indx] = 0; ]]> </programlisting> </para> <para>If the user passes arguments we have to collect them. This is a matter of iterating over the program argument list while we encounter arguments starting with a dash. The XSLT parameters are put into the <varname>params</varname> array and the <varname>output_file</varname> is set to the user request, if any. After processing all the parameter key-value pairs we set the last element of the <varname>params</varname> array to null. </para> </sect1> <sect1><title>Parsing</title> <para>The rest of the argument list is taken to be stylesheets and files to be transformed. Stylesheets are identified by their suffix, which is expected to be xsl (case sensitive). All other files are assumed to be XML documents, regardless of suffix.</para> <para> <programlisting> <![CDATA[ /* Collect and parse stylesheets and files to be transformed */ for (; arg_indx < argc; arg_indx++) { char *argument = (char *) malloc(sizeof(char) * (strlen(argv[arg_indx]) + 1)); strcpy(argument, argv[arg_indx]); if (strtok(argument, ".")) { char *suffix = strtok(0, "."); if (suffix && !strcmp(suffix, "xsl")) { stylesheets[stylesheet_indx++] = xsltParseStylesheetFile((const xmlChar *)argv[arg_indx]);; } else { files[file_indx++] = xmlParseFile(argv[arg_indx]); } } else { files[file_indx++] = xmlParseFile(argv[arg_indx]); } free(argument); } ]]> </programlisting> </para> <para>Stylesheets are parsed using the <function>xsltParseStylesheetFile</function> function. <function>xsltParseStylesheetFile</function> takes as argument a pointer to an <type>xmlChar</type>, a typedef of an unsigned char; in effect, the filename of the stylesheet. The resulting <type>xsltStylesheetPtr</type> is placed in the <varname>stylesheets</varname> array. In the same vein, XML files are parsed using the <function>xmlParseFile</function> function that takes as argument the file's name; the resulting <type>xmlDocPtr</type> is placed in the <varname>files</varname> array. </para> </sect1> <sect1><title>File Processing</title> <para>All stylesheets are applied to each file one after the other. Stylesheets are applied with the <function>xsltApplyStylesheet</function> function that takes as argument the stylesheet to be applied, the file to be transformed and any parameters we have collected. The in-memory representation of an XML document takes space, which we free using the <function>xmlFreeDoc</function> function. The file is then saved to the specified output.</para> <para> <programlisting> <![CDATA[ /* Process files */ for (i = 0; files[i]; i++) { doc = files[i]; res = doc; for (j = 0; stylesheets[j]; j++) { res = xsltApplyStylesheet(stylesheets[j], doc, params); xmlFreeDoc(doc); doc = res; } if (stylesheets[0]) { xsltSaveResultToFile(output_file, res, stylesheets[j-1]); } else { xmlDocDump(output_file, res); } xmlFreeDoc(res); } fclose(output_file); for (k = 0; stylesheets[k]; k++) { xsltFreeStylesheet(stylesheets[k]); } xsltCleanupGlobals(); xmlCleanupParser(); finish: free(stylesheets); free(files); return(return_value); ]]> </programlisting> </para> <para>To output an XML document we have in memory we use the <function>xlstSaveResultToFile</function> function, where we specify the destination, the document and the stylesheet that has been applied to it. The stylesheet is required so that output-related information contained in the stylesheet, such as the encoding to be used, is used in output. If no transformation has taken place, which will happen when the user specifies no stylesheets at all in the command line, we use the <function>xmlDocDump</function> libxml function that saves the source document to the file without further ado.</para> <para>As parsed stylesheets take up space in memory, we take care to free that memory after use with a call to <function>xmlFreeStyleSheet</function>. When all work is done, we clean up all global variables used by the XSLT library using <function>xsltCleanupGlobals</function>. Likewise, all global memory allocated for the XML parser is reclaimed by a call to <function>xmlCleanupParser</function>. Before returning we deallocate the memory allocated for the holding the pointers to the XML documents and stylesheets.</para> </sect1> <sect1><title>*NIX Compiling and Linking</title> <para>Compiling and linking in a *NIX environment is easy, as the required libraries are almost certain to be already in place (remember that libxml and libxslt are used by the GNOME project, so they are present in most installations). The program can be dynamically linked so that its footprint is minimized, or statically linked, so that it stands by itself, carrying all required code.</para> <para>For dynamic linking the following one liner will do:</para> <para> <userinput>gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 -lxslt -lxml2 -L/usr/lib libxslt_pipes.c</userinput> </para> <para>We assume that the necessary header files are in <filename class="directory">/usr/include/libxml2</filename> and that the required libraries (<filename>libxslt.so</filename>, <filename>libxml2.so</filename>) are in <filename class="directory">/usr/lib</filename>.</para> <para>In general, a program may need to link to additional libraries, depending on the processing it actually performs. A good way to start is to use the <command>xslt-config</command> script. The <option>--help</option> option displays usage information. Running</para> <para> <userinput> xslt-config --cflags </userinput> </para> <para>we get compile flags, while running</para> <para> <userinput> xslt-config --libs </userinput> </para> <para>we get the library settings for the linker.</para> <para>For static linking we must list more libraries than we did for dynamic linking, as the libraries on which the libxsl and libxslt libraries depend are also needed. Using <command>xslt-config</command> on a particular installation we create the following one-liner:</para> <para> <userinput> gcc -o libxslt_pipes -Wall -I/usr/include/libxml2 libxslt_pipes.c -static -L/usr/lib -lxslt -lxml2 -lz -lpthread -lm </userinput> </para> <para>If we get warnings to the effect that some function in statically linked applications requires at runtime the shared libraries used from the glibc version used for linking, that means that the binary is not completely static. Although we statically linked against the GNU C runtime library glibc, glibc uses external libraries to perform some of its functions. Same version libraries must be present on the system we want the application to run. One way to avoid this it to use an alternative C runtime, for example <ulink url="http://www.uclibc.org">uClibc</ulink>, which requires obtaining and building a uClibc toolchain first (if the reason for trying to get a statically linked version of the program is to embed it somewhere, using uClibc might be a good idea anyway). </para> </sect1> <sect1 id="windows-build"><title>MS-Windows Compiling and Linking</title> <para>Compiling and linking in MS-Windows requires some attention. First, the MS-Windows ports must be downloaded and installed in the programming workstation. The ports are available in <ulink url="http://www.zlatkovic.com/libxml.en.html">Igor Zlatkoviļæ½'s site</ulink>. We need the ports for iconv, zlib, libxml, and libxslt. In contrast to *NIX environments, we cannot assume that the libraries needed will be present in other computers where the program will be used. One solution is to distribute the program along with the necessary dynamic libraries. Another solution is to statically link the program so that only a single executable file will have to be distributed.</para> <para>We assume that we have decompressed the downloaded ports and have placed the required contents of their <filename class="directory">include</filename> directories in an <filename class="directory">include</filename> directory in our file system. The required contents include everything apart from the <filename class="directory">libexslt</filename> directory of the libxslt port, as we are not using EXLST (an initiative to provide extensions to XSLT) in this project. In order to compile the program we have to make sure that all necessary header files are included. When using the Microsoft compiler this translates to adding the required <option>/I</option> switches in the command line. If using a Visual Studio product the same effect is attained by specifying additional include directories in the compilation options. In the end, if the headers have been copied in <filename class="directory">C:\include</filename> the command line must contain <option>/I"C:\include" /I"C:\include\libslt" /I"C:\include\libxml"</option>.</para> <para>This being a C program, it needs to be compiled against an implementation of the C libraries. Microsoft provides various implementations. The ports, however, have been compiled against the <filename>msvcrt.dll</filename> implementation, so it is wise to use the same runtime in our project, lest we wish to come against unexpected runtime crashes. The <filename>msvcrt.dll</filename> is a multi-threaded implementation and is specified by giving <option>/MD</option> as a compiler option. Unfortunately, the correspondence between the <option>/MD</option> switch and <filename>msvcrt.dll</filename> breaks after version 6 of the Microsoft compiler. In version 7 and later (i.e., Visual Studio .NET), <option>/MD</option> links against a different DLL; in version 7.1 this is <filename>msvcrt71.dll</filename>. The end result of this bit of esoterica is that if you try to dynamically link your application with a compiler whose version is greater than 6, your program is likely to crash unexpectedly. Alternatively, you may wish to compile all iconv, zlib, libxml and libxslt yourself, using the new runtime library. This is not a tall order, and some details are given <link linkend="windows-ports-build">below</link>.</para> <para>There are three kinds of libraries in MS-Windows. Dynamically Linked Libraries (DLLs), like <filename>msvcrt.dll</filename> we met above, are used for dynamic linking; an application links to them at runtime, so the application does not include the code contained in them. Static libraries are used for static linking; an application adds the libraries' code to its own code at link time. Import libraries are used when building an application that uses DLLs. For the application to be built, the linker must somehow find the definitions of the functions that will be provided in runtime by the DLLs, otherwise it will complain about unresolved references. Import libraries contain function stubs that, for each DLL function we want to call, know where to look for it in the DLL. In essence, in order to use a DLL we must link against its corresponding import library. DLLs have a <filename>.dll</filename> suffix; static and import libraries both have a <filename>.lib</filename> suffix. In the MS-Windows ports of libxml and libxslt static libraries are distinguished by their name ending in <filename>_a.lib</filename>, while in the zlib port the import library is <filename>zdll.lib</filename> and the static library is <filename>zlib.lib</filename>. In what follows we assume we have a <filename class="directory">lib</filename> directory in our filesystem where we place the libraries we need for linking.</para> <para>If we want to link dynamically we must make sure the <filename class="directory">lib</filename> directory contains <filename>iconv.lib</filename>, <filename>libxslt.lib</filename>, <filename>libxml2.lib</filename>, and <filename>zdll.lib</filename>. When using the Microsoft linker this translates to adding the required <option>/LIBPATH</option> switch and the necessary libraries in the command line. In Visual Studio we must specify an additional library directory for <filename class="directory">lib</filename> and put the necessary libraries in the additional dependencies. In the end, the command line must include <option>/LIBPATH:"C:\lib" "lib\iconv.lib" "lib\libxslt.lib" "lib\libxml2.lib" "lib\zdll.lib"</option>, provided the libraries' directory is <filename class="directory">C:\lib</filename>. In order for the resulting executable to run, the ports DLLs must be present; one way is to place all DLLs contained in the ports in the home directory of our application, and make sure they are distributed together.</para> <para>If we want to link statically we must make sure the <filename class="directory">lib</filename> directory contains <filename>iconv_a.lib</filename>, <filename>libxslt_a.lib</filename>, <filename>libxml2_a.lib</filename>, and <filename>zlib.lib</filename>. Adding <filename class="directory">lib</filename> as a library directory and putting the necessary libraries in the additional dependencies, we get a command line that should include <option>/LIBPATH:"C:\lib" "lib\iconv_a.lib" "lib\libxslt_a.lib" "lib\libxml2_a.lib" "lib\zlib.lib"</option>. The resulting executable is much bigger than if we linked dynamically; it is, however, self-contained and can be distributed more easily, in theory at least. In practice, however, the executable is not completely static. We saw that the ports are compiled against <filename>msvcrt.dll</filename>, so the program does require that DLL at runtime. Moreover, since when using a version of Microsoft developer tools with a version number greater than 6, we are no longer using <filename>msvcrt.dll</filename>, but another runtime like <filename>msvcrt71.dll</filename>, and we then need that DLL. In contrast to <filename>msvcrt.dll</filename> it may not be present on the target computer, so we may have to copy it along.</para> <sect2 id="windows-ports-build"><title>Building the Ports in MS-Windows</title> <para>The source code of the ports is readily available on the web, one has to check the ports sites. Each port can be built without problems in an MS-Windows environment using Microsoft development tools. The necessary command line tools (compiler, linker, <command>nmake</command>) must be available. This means running a batch file called <command>vcvars32.bat</command> that comes with Visual Studio (its exact location in the directory tree may vary depending on the version of Visual Studio, but a file search will find it anyway). Makefiles for the Microsoft tools are found in all ports. They are distinguished by their suffix, e.g., <filename>Makefile.msvc</filename> or <filename>Makefile.msc</filename>. To build zlib it suffices to run <command>nmake</command> against <filename>Makefile.msc</filename> (i.e., with the <option>/F</option> option); similarly, to build <filename>iconv</filename> it suffices to run <command>nmake</command> against <filename>Makefile.msvc</filename>. Building libxml and libxslt requires an extra configuration step; we must run the <filename>configure.js</filename> configuration script with the <command>cscript</command> command. <filename>configure.js</filename> is found in the <filename class="directory">win32</filename> directory in the distributions. It is written in JScript, Microsoft's implementation of the ECMA 262 language specification (ECMAScript Edition 3), a JavaScript offspring. The configuration string takes a number of parameters detailing our environment and needs; <userinput>cscript configure.js help</userinput> documents them.</para> <para>It is wise to read all documentation files in the source distributions before starting; moreover, pay attention to the dependencies between the ports. If we configure libxml and libxslt to use iconv and zlib we must build these two first and make sure their headers and libraries can be found by the compiler and the linker when building libxml and libxslt.</para> </sect2> </sect1> <sect1><title>zlib, iconv and All That</title> <para>We saw that libxml and libxslt depend on various other libraries, for instance zlib, iconv, and so forth. Taking a look into them gives us clues on the capabilities of libxml and libxslt.</para> <para><ulink url="http://www.zlib.org">zlib</ulink> is a free general purpose lossless data compression library. It is a venerable workhorse; more than <ulink url="http://www.gzip.org/zlib/apps.html">500 applications</ulink> (both commercial and open source) seem to use the library. libxml uses zlib so that it can read from or write to compressed files directly. The <function>xmlParseFile</function> function can transparently parse a compressed document to produce an <structname>xmlDoc</structname>. If we want to create a compressed document with libxml we can use an <structname>xmlTextWriterPtr</structname> (obtained through <function>xmlNewTextWriterDoc</function>), or another related structure from <filename>libxml/xmlwriter.h</filename>, with compression enabled.</para> <para>XML allows documents to use a variety of different character encodings. <ulink url="http://www.gnu.org/software/libiconv">iconv</ulink> is a free library for converting between different character encodings. libxml provides a set of default converters for some encodings: UTF-8, UTF-16 (little endian and big endian), ISO-8859-1, ASCII, and HTML (a specific handler for the conversion of UTF-8 to ASCII with HTML predefined entities like &copy; for the copyright sign). However, when compiled with iconv support, libxml and libxslt can handle the full range of encodings provided by iconv; these should cover most needs.</para> <para>libxml and libxslt can be used in multi-threaded applications. In MS-Windows they are linked against <filename>MSVCRT.DLL</filename> (or one of its descendants, as we saw <link linkend="windows-build">above</link>). In *NIX the pthreads (POSIX threads) library is used.</para> </sect1> <sect1><title>The Complete Program</title> <para> The complete program listing is given below. The program is also <ulink url="libxslt_pipes.c">available online</ulink>. </para> <para> <programlisting> <xi:include href="libxslt_pipes.c" parse="text" xmlns:xi="http://www.w3.org/2003/XInclude"/> </programlisting> </para> </sect1> </article>