From: Chris Burdess <dog@bluezoo.org>
To: Andrew Haley <aph@redhat.com>
Cc: java@gcc.gnu.org, classpath@gnu.org,
Emmanuel Engelhart <emmanuel@engelhart.org>
Subject: Re: [SAXParser] org.xml.sax.SAXParseException: not a name start character: "U+26"
Date: Mon, 22 Feb 2010 10:49:00 -0000
Message-ID: <2EB0F850-53BF-43B2-887F-6195D7C7B8D8@bluezoo.org>
In-Reply-To: <4B768EBE.4030200@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]
Andrew Haley wrote:
> On 02/13/2010 11:24 AM, Emmanuel Engelhart wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>>
>> I'm not sure this is the right mailing list to report this behavior;
>> if not, please point me to the right place, and sorry for the noise.
>>
>> I use gcj on an LTS Ubuntu:
>> gcj (Ubuntu 4.4.1-5ubuntu2) 4.4.1
>>
>> My sample code may be downloaded here as Test.java:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7115
>>
>> and looks like that:
>> ================================
>> import javax.xml.parsers.SAXParser;
>> import javax.xml.parsers.SAXParserFactory;
>> import org.xml.sax.helpers.DefaultHandler;
>>
>> public class Test {
>>
>>     public static void main(String argv[]) {
>>
>>         try {
>>
>>             SAXParserFactory factory = SAXParserFactory.newInstance();
>>             SAXParser saxParser = factory.newSAXParser();
>>
>>             DefaultHandler handler = new DefaultHandler() {};
>>             saxParser.parse("test.xml", handler);
>>
>>         } catch (Exception e) {
>>             e.printStackTrace();
>>         }
>>     }
>> }
>> ================================
>>
>> I compile it as follows:
>> gcj -o test --main=Test Test.java
>>
>> My XML file "test.xml" may be downloaded here:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7114
>>
>> Running the binary, I get the following error:
>> $ ./test
>> org.xml.sax.SAXParseException: not a name start character: "U+26"
>> at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>> at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>> at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>> at Test.main(test)
>> Caused by: javax.xml.stream.XMLStreamException: not a name start character: "U+26"
>> at gnu.xml.stream.XMLParser.error(libgcj.so.10)
>> at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>> at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>> at gnu.xml.stream.XMLParser.readCharData(libgcj.so.10)
>> at gnu.xml.stream.XMLParser.next(libgcj.so.10)
>> at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>> ...3 more
>>
>> The XML should be valid, though.
>>
>> Does anyone have an idea what explains this behavior?
>>
>> This "bug" affects the MediaWiki mwdumper software.
>> The related bug (with more details) is available here:
>> https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
>
> Mmmm, looks like it probably is a real bug. If you remind me next
> week I'll have a look.
I reproduced and fixed this bug; see the attached patch. If someone could validate and commit it for me, please do so, as I don't have a complete working build environment at the moment.
--
Chris Burdess
[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 759 bytes --]
Index: gnu/xml/stream/XMLParser.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/xml/stream/XMLParser.java,v
retrieving revision 1.36
diff -u -r1.36 XMLParser.java
--- gnu/xml/stream/XMLParser.java 5 Feb 2009 20:46:23 -0000 1.36
+++ gnu/xml/stream/XMLParser.java 22 Feb 2010 10:45:18 -0000
@@ -3128,7 +3128,10 @@
break; // whitespace
case 0x26: // '&'
reset();
- read(tmpBuf, 0, i);
+ int off = 0;
+ do {
+ off += read(tmpBuf, off, i - off);
+ } while (off < i);
// character reference?
mark(3);
c = readCh(); // &
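The patch replaces a single read() call with a loop because one call may return fewer characters than requested. A minimal, self-contained sketch of that pattern follows. It uses java.io.Reader rather than XMLParser's internal read(); the ReadFullyDemo, ShortReader, and readFully names are hypothetical and not part of Classpath, and the end-of-stream guard is an addition that the patch itself does not contain.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadFullyDemo {

    /** Hypothetical reader that returns at most `chunk` chars per call. */
    static final class ShortReader extends Reader {
        private final Reader in;
        private final int chunk;

        ShortReader(Reader in, int chunk) {
            this.in = in;
            this.chunk = chunk;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            // Deliberately short read, as a buffered source may legally do.
            return in.read(buf, off, Math.min(len, chunk));
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Keeps calling read() until `len` chars have arrived or the stream ends. */
    static int readFully(Reader in, char[] buf, int len) throws IOException {
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                break; // end of stream (guard added here; not in the patch)
            }
            off += n;
        }
        return off;
    }

    public static void main(String[] args) throws IOException {
        char[] buf = new char[11];

        // One call: only the first chunk is delivered.
        Reader single = new ShortReader(new StringReader("hello &amp;"), 4);
        int n1 = single.read(buf, 0, buf.length);

        // Looping: the whole requested span is delivered.
        Reader looped = new ShortReader(new StringReader("hello &amp;"), 4);
        int n2 = readFully(looped, buf, buf.length);

        System.out.println(n1 + " " + n2 + " " + new String(buf, 0, n2));
    }
}
```

Run as written, this prints `4 11 hello &amp;`: the single call returns only the first 4 characters, while the loop fills all 11.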
Thread overview: 3+ messages
2010-02-13 11:24 Emmanuel Engelhart
2010-02-13 11:36 ` Andrew Haley
2010-02-22 10:49 ` Chris Burdess [this message]