From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [SAXParser] org.xml.sax.SAXParseException: not a name start character: "U+26"
Mime-Version: 1.0 (Apple Message framework v1077)
Content-Type: multipart/mixed; boundary=Apple-Mail-2-765731484
From: Chris Burdess
In-Reply-To: <4B768EBE.4030200@redhat.com>
Date: Mon, 22 Feb 2010 10:49:00 -0000
Cc: java@gcc.gnu.org, classpath@gnu.org, Emmanuel Engelhart
Reply-To: dog@gnu.org
Message-Id: <2EB0F850-53BF-43B2-887F-6195D7C7B8D8@bluezoo.org>
References: <4B768BDA.9050405@engelhart.org> <4B768EBE.4030200@redhat.com>
To: Andrew Haley

--Apple-Mail-2-765731484
Content-Type: text/plain; charset=us-ascii

Andrew Haley wrote:

> On 02/13/2010 11:24 AM, Emmanuel Engelhart wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>>
>> not sure to be on the right ML to report this behavior, please help me
>> to find the right place to do it if not... and sorry for the noise.
>>
>> I use gcj on a LTS Ubuntu:
>> gcj (Ubuntu 4.4.1-5ubuntu2) 4.4.1
>>
>> My sample code may be downloaded here as Test.java:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7115
>>
>> and looks like that:
>> ================================
>> import javax.xml.parsers.SAXParser;
>> import javax.xml.parsers.SAXParserFactory;
>> import org.xml.sax.helpers.DefaultHandler;
>>
>> public class Test {
>>
>>   public static void main(String argv[]) {
>>
>>     try {
>>
>>       SAXParserFactory factory = SAXParserFactory.newInstance();
>>       SAXParser saxParser = factory.newSAXParser();
>>
>>       DefaultHandler handler = new DefaultHandler() {};
>>       saxParser.parse("test.xml", handler);
>>
>>     } catch (Exception e) {
>>       e.printStackTrace();
>>     }
>>   }
>> }
>> ================================
>>
>> I compile it like following:
>> gcj -o test --main=Test Test.java
>>
>> My XML file "test.xml" may be downloaded here:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7114
>>
>> By running the binary I get the following error:
>> $ ./test
>> org.xml.sax.SAXParseException: not a name start character: "U+26"
>>    at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>>    at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>>    at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>>    at Test.main(test)
>> Caused by: javax.xml.stream.XMLStreamException: not a name start
>> character: "U+26"
>>    at gnu.xml.stream.XMLParser.error(libgcj.so.10)
>>    at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>>    at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>>    at gnu.xml.stream.XMLParser.readCharData(libgcj.so.10)
>>    at gnu.xml.stream.XMLParser.next(libgcj.so.10)
>>    at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>>    ...3 more
>>
>> Although the XML should be valid.
>>
>> Has someone an idea to explain this behavior?
>>
>> This "bug" impacts the Mediawiki mwdumper SW.
>> The related bug (with more details) is available here:
>> https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
>
> Mmmm, looks like it probably is a real bug.  If you remind me next
> week I'll have a look.

I reproduced and fixed this bug; see the attached patch. If someone could
validate and commit it for me, please do so, as I don't have a complete
working build environment at the moment.

-- 
Chris Burdess

--Apple-Mail-2-765731484
Content-Disposition: attachment; filename=patch
Content-Type: application/octet-stream; name="patch"
Content-Transfer-Encoding: 7bit

Index: gnu/xml/stream/XMLParser.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/xml/stream/XMLParser.java,v
retrieving revision 1.36
diff -u -r1.36 XMLParser.java
--- gnu/xml/stream/XMLParser.java	5 Feb 2009 20:46:23 -0000	1.36
+++ gnu/xml/stream/XMLParser.java	22 Feb 2010 10:45:18 -0000
@@ -3128,7 +3128,10 @@
               break; // whitespace
             case 0x26: // '&'
               reset();
-              read(tmpBuf, 0, i);
+              int off = 0;
+              do {
+                off += read(tmpBuf, off, i - off);
+              } while (off < i);
               // character reference?
               mark(3);
               c = readCh(); // &
--Apple-Mail-2-765731484--
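
For anyone reviewing the patch: the change appears to guard against a short read, where a single read(buf, off, len) call hands back fewer characters than requested and the parser then resumes at the wrong position. Below is a minimal standalone sketch of that pattern using only java.io. The names ReadFully, ChunkedReader and readFully are hypothetical and do not exist in XMLParser.java; the 2-character limit is an assumption chosen only to force short reads, and the end-of-input check is an addition that the patch itself does not contain.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadFully {

    /** Hypothetical reader that returns at most 2 chars per call, to simulate short reads. */
    static class ChunkedReader extends Reader {
        private final Reader in;

        ChunkedReader(Reader in) {
            this.in = in;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, Math.min(len, 2));
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Keeps calling read() until exactly len chars are in buf, as the loop in the patch does. */
    static void readFully(Reader in, char[] buf, int len) throws IOException {
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                // Not in the patch: stop cleanly instead of looping on end of input.
                throw new EOFException("input ended after " + off + " of " + len + " chars");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // A single call may legally return fewer chars than requested.
        char[] single = new char[5];
        int n = new ChunkedReader(new StringReader("a&amp;b")).read(single, 0, 5);
        System.out.println("single read returned " + n + " of 5 chars");

        // Looping fills the whole buffer regardless of how the reader chunks its data.
        char[] looped = new char[5];
        readFully(new ChunkedReader(new StringReader("a&amp;b")), looped, 5);
        System.out.println("looped read: " + new String(looped));
    }
}
```

With the unpatched single call, the buffer would be only partly filled and the following readCh() would not see the '&' it expects, which would be consistent with the "not a name start character" error reported above.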