public inbox for docbook-tools-discuss@sourceware.org
From: Ismael Olea <ismael@olea.org>
To: docbook-tools-discuss <docbook-tools-discuss@sources.redhat.com>
Subject: XHTML support for xmlto
Date: Tue, 11 Jan 2005 09:49:00 -0000
Message-ID: <1105436926.32289.118.camel@lisergia>

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

Hi Tim:

I've just extended xmlto to convert XHTML files into XSL-FO/PDF/txt.
To do so, I've packaged the xhtml2fo stylesheet from Antenna House.
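
To illustrate the idea: xmlto picks the conversion path from the document's
root element, and the patch maps `html` to a new `xhtml1` source format. What
follows is only a minimal sketch of that mapping in plain shell, not the code
from the patch (which asks `xmllint --shell` for the root element). The
function names `root_element` and `source_format_for_root` are made up for
illustration, and the sed-based parsing is a simplification that will not cope
with comments before the root element or a DOCTYPE internal subset containing
`>`.

```shell
# Print the name of the first element in an XML file (simplified).
root_element() {
  # Flatten the file to one line, drop <?...?> and <!...> constructs,
  # then keep only the name of the first remaining start tag.
  { tr '\n\t' '  ' < "$1"; echo; } |
    sed -e 's/<?[^>]*?>//g' \
        -e 's/<![^>]*>//g' \
        -e 's|^[^<]*<\([^ />]*\).*$|\1|'
}

# Map a root element name to an xmlto source format.
# The html and fo:root cases follow the patch; everything else
# falls back to DocBook, which xmlto treats as the default.
source_format_for_root() {
  case "$1" in
    html)    echo "xhtml1" ;;
    fo:root) echo "fo" ;;
    *)       echo "docbook" ;;
  esac
}
```

For an XHTML file, `source_format_for_root "$(root_element page.xhtml)"`
should select `xhtml1`, which is the format that a command such as
`xmlto pdf page.xhtml` would then use with the patch applied.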

Attached to this message are the patch and the modified spec file. You
can get the RPMs from my website:

http://www.olea.org/paquetes-rpm/xmlto-0.0.18-4_1olea.src.rpm
http://www.olea.org/paquetes-rpm/xmlto-0.0.18-4_1olea.i386.rpm
http://www.olea.org/paquetes-rpm/xhtml2fo-style-xsl-20050106-1.src.rpm
http://www.olea.org/paquetes-rpm/xhtml2fo-style-xsl-20050106-1.noarch.rpm

The new stylesheet is not perfect, but it adds a valuable feature and my
tests haven't shown any problems in operation, so IMHO it is ready for use.

The integration was relatively easy thanks to xmlto's nice
architecture. It should now be a bit easier to add new source formats.
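
As a rough sketch of what that architecture looks like in practice: each
source/destination pair is a small script under format/<source>/<destination>
that answers two requests, `stylesheet` and `post-process`. The version below
mirrors the format/xhtml1/fo script from the patch, but wraps it in a function
(the name `xhtml1_fo` is only for illustration) so it can be tried outside
xmlto. It assumes the caller sets `XSLT_PROCESSED` and `OUTPUT_DIR`, as the
patch's scripts do, and it treats an unset `VERBOSE` as 0.

```shell
xhtml1_fo() {
  case "$1" in
  stylesheet)
    # Report which XSLT stylesheet should be applied to the input.
    if [ "${VERBOSE:-0}" -ge 1 ]
    then
      echo >&2 "Convert to XSL-FO"
    fi
    echo "http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo/xhtml2fo.xsl"
    ;;
  post-process)
    # Copy the transformed document into the output directory as <name>.fo.
    cp "$XSLT_PROCESSED" "$OUTPUT_DIR/$(basename "${XSLT_PROCESSED%.*}").fo"
    ;;
  esac
}
```

Under these assumptions, adding another source format should mostly mean
adding scripts of this shape plus one more case for its root element.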

Honestly, it would be wonderful to see them published in future Fedora Core
releases :-)





-- 

        A. Ismael Olea González
 
        mailto:ismael@olea.org  http://www.olea.org
        http://aduaneros.olea.org, the NGO with no future.
 
        The world should start to fear a planet OLEA


[-- Attachment #2: xmlto-xhtml2fo.patch --]
[-- Type: text/x-patch, Size: 5676 bytes --]

diff -Naur xmlto-0.0.18-orig/format/xhtml1/dvi xmlto-0.0.18/format/xhtml1/dvi
--- xmlto-0.0.18-orig/format/xhtml1/dvi	1970-01-01 01:00:00.000000000 +0100
+++ xmlto-0.0.18/format/xhtml1/dvi	2005-01-10 23:47:40.000000000 +0100
@@ -0,0 +1,13 @@
+case "$1" in
+stylesheet)
+  if [ "$VERBOSE" -ge 1 ]
+  then
+    echo >&2 "Convert to XSL-FO"
+  fi
+  echo "http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo/xhtml2fo.xsl"
+  ;;
+post-process)
+  # Get the FO format script to do the rest
+  sh "$(dirname "$0")/../fo/$(basename "$0")" "$1"
+  ;;
+esac
diff -Naur xmlto-0.0.18-orig/format/xhtml1/fo xmlto-0.0.18/format/xhtml1/fo
--- xmlto-0.0.18-orig/format/xhtml1/fo	1970-01-01 01:00:00.000000000 +0100
+++ xmlto-0.0.18/format/xhtml1/fo	2005-01-10 23:48:44.000000000 +0100
@@ -0,0 +1,12 @@
+case "$1" in
+stylesheet)
+  if [ "$VERBOSE" -ge 1 ]
+  then
+    echo >&2 "Convert to XSL-FO"
+  fi
+  echo "http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo/xhtml2fo.xsl"
+  ;;
+post-process)
+  cp "$XSLT_PROCESSED" "$OUTPUT_DIR/$(basename "${XSLT_PROCESSED%.*}").fo"
+  ;;
+esac
diff -Naur xmlto-0.0.18-orig/format/xhtml1/pdf xmlto-0.0.18/format/xhtml1/pdf
--- xmlto-0.0.18-orig/format/xhtml1/pdf	1970-01-01 01:00:00.000000000 +0100
+++ xmlto-0.0.18/format/xhtml1/pdf	2005-01-10 23:49:22.000000000 +0100
@@ -0,0 +1,13 @@
+case "$1" in
+stylesheet)
+  if [ "$VERBOSE" -ge 1 ]
+  then
+    echo >&2 "Convert to XSL-FO"
+  fi
+  echo "http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo/xhtml2fo.xsl"
+  ;;
+post-process)
+  # Get the FO format script to do the rest
+  sh "$(dirname "$0")/../fo/$(basename "$0")" "$1"
+  ;;
+esac
diff -Naur xmlto-0.0.18-orig/format/xhtml1/ps xmlto-0.0.18/format/xhtml1/ps
--- xmlto-0.0.18-orig/format/xhtml1/ps	1970-01-01 01:00:00.000000000 +0100
+++ xmlto-0.0.18/format/xhtml1/ps	2005-01-10 23:49:32.000000000 +0100
@@ -0,0 +1,13 @@
+case "$1" in
+stylesheet)
+  if [ "$VERBOSE" -ge 1 ]
+  then
+    echo >&2 "Convert to XSL-FO"
+  fi
+  echo "http://www.antennahouse.com/XSLsample/sample-xsl-xhtml2fo/xhtml2fo.xsl"
+  ;;
+post-process)
+  # Get the FO format script to do the rest
+  sh "$(dirname "$0")/../fo/$(basename "$0")" "$1"
+  ;;
+esac
diff -Naur xmlto-0.0.18-orig/format/xhtml1/txt xmlto-0.0.18/format/xhtml1/txt
--- xmlto-0.0.18-orig/format/xhtml1/txt	1970-01-01 01:00:00.000000000 +0100
+++ xmlto-0.0.18/format/xhtml1/txt	2005-01-10 23:50:43.000000000 +0100
@@ -0,0 +1,27 @@
+if [ -x /usr/bin/w3m ]
+then
+  CONVERT=/usr/bin/w3m
+  ARGS="-T text/html -dump"
+elif [ -x /usr/bin/lynx ]
+then
+  CONVERT=/usr/bin/lynx
+  ARGS="-force_html -dump -nolist -width=72"
+elif [ -x /usr/bin/links ]
+then
+  CONVERT=/usr/bin/links
+  ARGS="-dump"
+else
+  echo >&2 "No way to convert HTML to text found."
+  exit 1
+fi
+
+case "$1" in
+post-process)
+  if [ "$VERBOSE" -ge 1 ]
+  then
+    echo >&2 "Convert HTML to ASCII"
+  fi
+  ${CONVERT} ${ARGS} ${POSTARGS} "${XSLT_PROCESSED}" > \
+   "$OUTPUT_DIR/$(basename "${XSLT_PROCESSED%.*}").txt"
+  ;;
+esac
diff -Naur xmlto-0.0.18-orig/Makefile.am xmlto-0.0.18/Makefile.am
--- xmlto-0.0.18-orig/Makefile.am	2003-10-30 23:50:36.000000000 +0100
+++ xmlto-0.0.18/Makefile.am	2005-01-10 23:40:14.000000000 +0100
@@ -19,6 +19,11 @@
 	format/fo/dvi \
 	format/fo/pdf \
 	format/fo/ps \
+	format/xhtml1/fo \
+	format/xhtml1/pdf \
+	format/xhtml1/ps \
+	format/xhtml1/dvi \
+	format/xhtml1/txt \
 	xmlto.mak
 
 EXTRA_DIST = xmlto.spec \
@@ -38,6 +43,11 @@
 	format/fo/dvi \
 	format/fo/pdf \
 	format/fo/ps \
+	format/xhtml1/fo \
+	format/xhtml1/pdf \
+	format/xhtml1/ps \
+	format/xhtml1/dvi \
+	format/xhtml1/txt \
 	doc/xmlto.xml \
 	doc/xmlif.xml \
 	xmlto.mak \
diff -Naur xmlto-0.0.18-orig/Makefile.in xmlto-0.0.18/Makefile.in
--- xmlto-0.0.18-orig/Makefile.in	2004-01-21 12:07:48.000000000 +0100
+++ xmlto-0.0.18/Makefile.in	2005-01-10 23:40:14.000000000 +0100
@@ -184,6 +184,11 @@
 	format/fo/dvi \
 	format/fo/pdf \
 	format/fo/ps \
+	format/xhtml1/fo \
+	format/xhtml1/pdf \
+	format/xhtml1/ps \
+	format/xhtml1/dvi \
+	format/xhtml1/txt \
 	xmlto.mak
 
 EXTRA_DIST = xmlto.spec \
@@ -203,6 +208,11 @@
 	format/fo/dvi \
 	format/fo/pdf \
 	format/fo/ps \
+	format/xhtml1/fo \
+	format/xhtml1/pdf \
+	format/xhtml1/ps \
+	format/xhtml1/dvi \
+	format/xhtml1/txt \
 	doc/xmlto.xml \
 	doc/xmlif.xml \
 	xmlto.mak \
diff -Naur xmlto-0.0.18-orig/xmlto.in xmlto-0.0.18/xmlto.in
--- xmlto-0.0.18-orig/xmlto.in	2004-01-02 13:03:24.000000000 +0100
+++ xmlto-0.0.18/xmlto.in	2005-01-11 02:04:03.000000000 +0100
@@ -247,15 +247,28 @@
   exit 1
 fi
 
+
+[ ! -e "$INPUT_FILE" ] && exit 1
+
 # Decide what source format this is.  Default to DocBook.
-rootel=$(head -n 2 "$INPUT_FILE" | \
-     sed -e 's/^<?[^?>]*?>//g' -e 's/^<![^>]*>//g' -e 's/^<\([^ ]*\).*$/\1/')
+#rootel=$(head -n 2 "$INPUT_FILE" | \
+#     sed -e 's/^<?[^?>]*?>//g' -e 's/^<![^>]*>//g' -e 's/^<\([^ ]*\).*$/\1/')
+
+# It would be reasonable to teach the file command to identify the DTD/Schema, but this is faster to write:
+rootel=$(echo "xpath *" | xmllint --shell "$INPUT_FILE" 2> /dev/null | head -n 3 | tail -n 1 | cut -f 4 -d " ")
+
 case $(echo $rootel) in
+html)
+	SOURCE_FORMAT="xhtml1"
+	;;
 fo:root)
 	SOURCE_FORMAT="fo"
 	;;
+article|book|part|refentry|set)
+	SOURCE_FORMAT="docbook"
+	;;
 esac
-[ ! -e "$INPUT_FILE" ] && exit 1
+[ "$VERBOSE" -ge 1 ] && echo >&2 "Source format: ${SOURCE_FORMAT}"
 
 # If the destination format is an absolute pathname then it's a
 # user-defined format script.  Otherwise it's one of ours.

[-- Attachment #3: xmlto.spec --]
[-- Type: text/plain, Size: 4396 bytes --]

%{!?tetex:%define tetex 1}

Summary: A tool for converting XML files to various formats.
Name: xmlto
Version: 0.0.18
Release: 4_1olea
License: GPL
Group: Applications/System
URL: http://cyberelk.net/tim/xmlto/
Source0: ftp://cyberelk.net/tim/data/xmlto/stable/%{name}-%{version}.tar.bz2
Patch0: xmlto-xhtml2fo.patch
BuildRoot: %{_tmppath}/%{name}-%{version}-buildroot

BuildRequires: docbook-xsl >= 1.56.0
BuildRequires: libxslt

# We rely heavily on the DocBook XSL stylesheets!
Requires: docbook-xsl >= 1.56.0

# For full functionality, we need passivetex.
%if %{tetex}
Requires: passivetex >= 1.11
%endif
Requires: libxslt
Requires: docbook-dtds
Requires: libxml2
Requires: xhtml2fo-style-xsl

%description
This is a package for converting XML files to various formats using XSL
stylesheets.

%prep
%setup -q
%patch -p1

%build
%configure
make
make check

%install
rm -rf %{buildroot}
%makeinstall

%clean
rm -rf %{buildroot}

%files
%defattr(-,root,root)
%{_bindir}/*
%{_mandir}/*/*
%{_datadir}/xmlto

%changelog
* Tue Jan 11 2005 Ismael Olea <ismael@olea.org>
- Added xhtml2fo stylesheet.

* Thu Jul  1 2004 Tim Waugh <twaugh@redhat.com> 0.0.18-4
- Magic encoding is enabled again (bug #126921).

* Tue Jun 15 2004 Elliot Lee <sopwith@redhat.com>
- rebuilt

* Fri Feb 13 2004 Elliot Lee <sopwith@redhat.com>
- rebuilt

* Wed Jan 21 2004 Tim Waugh <twaugh@redhat.com> 0.0.18-1
- 0.0.18.

* Mon Dec  1 2003 Tim Waugh <twaugh@redhat.com> 0.0.17-1
- 0.0.17.

* Tue Nov 18 2003 Tim Waugh <twaugh@redhat.com> 0.0.16-1
- 0.0.16.

* Tue Oct  7 2003 Tim Waugh <twaugh@redhat.com> 0.0.15-1
- 0.0.15.

* Tue Sep 23 2003 Florian La Roche <Florian.LaRoche@redhat.de>
- allow compiling without tetex(passivetex) dependency

* Tue Jun 17 2003 Tim Waugh <twaugh@redhat.com> 0.0.14-3
- Rebuilt.

* Wed Jun 04 2003 Elliot Lee <sopwith@redhat.com>
- rebuilt

* Fri May 23 2003 Tim Waugh <twaugh@redhat.com> 0.0.14-1
- 0.0.14.

* Sun May 11 2003 Tim Waugh <twaugh@redhat.com> 0.0.13-1
- 0.0.13.

* Wed Jan 22 2003 Tim Powers <timp@redhat.com>
- rebuilt

* Fri Jan  3 2003 Tim Waugh <twaugh@redhat.com> 0.0.12-2
- Disable magic encoding detection, since the stylesheets don't handle
  it well at all (bug #80732).

* Thu Dec 12 2002 Tim Waugh <twaugh@redhat.com> 0.0.12-1
- 0.0.12.

* Wed Oct 16 2002 Tim Waugh <twaugh@redhat.com> 0.0.11-1
- 0.0.11.
- xmlto.mak no longer needed.
- CVS patch no longer needed.
- Update docbook-xsl requirement.
- Ship xmlif.
- Run tests.
- No longer a noarch package.

* Tue Jul  9 2002 Tim Waugh <twaugh@redhat.com> 0.0.10-4
- Ship xmlto.mak.

* Thu Jun 27 2002 Tim Waugh <twaugh@redhat.com> 0.0.10-3
- Some db2man improvements from CVS.

* Fri Jun 21 2002 Tim Powers <timp@redhat.com> 0.0.10-2
- automated rebuild

* Tue Jun 18 2002 Tim Waugh <twaugh@redhat.com> 0.0.10-1
- 0.0.10.
- No longer need texinputs patch.

* Tue Jun 18 2002 Tim Waugh <twaugh@redhat.com> 0.0.9-3
- Fix TEXINPUTS problem with ps and dvi backends.

* Thu May 23 2002 Tim Powers <timp@redhat.com> 0.0.9-2
- automated rebuild

* Wed May  1 2002 Tim Waugh <twaugh@redhat.com> 0.0.9-1
- 0.0.9.
- The nonet patch is no longer needed.

* Fri Apr 12 2002 Tim Waugh <twaugh@redhat.com> 0.0.8-3
- Don't fetch entities over the network.

* Thu Feb 21 2002 Tim Waugh <twaugh@redhat.com> 0.0.8-2
- Rebuild in new environment.

* Tue Feb 12 2002 Tim Waugh <twaugh@redhat.com> 0.0.8-1
- 0.0.8.

* Fri Jan 25 2002 Tim Waugh <twaugh@redhat.com> 0.0.7-2
- Require the DocBook DTDs.

* Mon Jan 21 2002 Tim Waugh <twaugh@redhat.com> 0.0.7-1
- 0.0.7 (bug #58624, bug #58625).

* Wed Jan 16 2002 Tim Waugh <twaugh@redhat.com> 0.0.6-1
- 0.0.6.

* Wed Jan 09 2002 Tim Powers <timp@redhat.com> 0.0.5-4
- automated rebuild

* Wed Jan  9 2002 Tim Waugh <twaugh@redhat.com> 0.0.5-3
- 0.0.6pre2.

* Wed Jan  9 2002 Tim Waugh <twaugh@redhat.com> 0.0.5-2
- 0.0.6pre1.

* Tue Jan  8 2002 Tim Waugh <twaugh@redhat.com> 0.0.5-1
- 0.0.5.

* Mon Dec 17 2001 Tim Waugh <twaugh@redhat.com> 0.0.4-2
- 0.0.4.
- Apply patch from CVS to fix silly typos.

* Sat Dec  8 2001 Tim Waugh <twaugh@redhat.com> 0.0.3-1
- 0.0.3.

* Wed Dec  5 2001 Tim Waugh <twaugh@redhat.com>
- Built for Red Hat Linux.

* Fri Nov 23 2001 Tim Waugh <twaugh@redhat.com>
- Initial spec file.
