public inbox for java@gcc.gnu.org
From: Chris Burdess <dog@bluezoo.org>
To: Andrew Haley <aph@redhat.com>
Cc: java@gcc.gnu.org,  classpath@gnu.org,
	 Emmanuel Engelhart <emmanuel@engelhart.org>
Subject: Re: [SAXParser] org.xml.sax.SAXParseException: not a name start character: "U+26"
Date: Mon, 22 Feb 2010 10:49:00 -0000
Message-ID: <2EB0F850-53BF-43B2-887F-6195D7C7B8D8@bluezoo.org>
In-Reply-To: <4B768EBE.4030200@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]

Andrew Haley wrote:
> On 02/13/2010 11:24 AM, Emmanuel Engelhart wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> Hi,
>> 
>> Not sure I'm on the right mailing list to report this behavior; if not,
>> please point me to the right place... and sorry for the noise.
>> 
>> I use gcj on an LTS Ubuntu:
>> gcj (Ubuntu 4.4.1-5ubuntu2) 4.4.1
>> 
>> My sample code may be downloaded here as Test.java:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7115
>> 
>> and looks like this:
>> ================================
>> import javax.xml.parsers.SAXParser;
>> import javax.xml.parsers.SAXParserFactory;
>> import org.xml.sax.helpers.DefaultHandler;
>> 
>> public class Test {
>> 
>>     public static void main(String[] argv) {
>> 
>>         try {
>> 
>>             SAXParserFactory factory = SAXParserFactory.newInstance();
>>             SAXParser saxParser = factory.newSAXParser();
>> 
>>             DefaultHandler handler = new DefaultHandler() {};
>>             saxParser.parse("test.xml", handler);
>> 
>>         } catch (Exception e) {
>>             e.printStackTrace();
>>         }
>>     }
>> }
>> ================================
>> 
>> I compile it as follows:
>> gcj -o test --main=Test Test.java
>> 
>> My XML file "test.xml" may be downloaded here:
>> https://bugzilla.wikimedia.org/attachment.cgi?id=7114
>> 
>> Running the binary, I get the following error:
>> $ ./test
>> org.xml.sax.SAXParseException: not a name start character: "U+26"
>>   at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>>   at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>>   at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
>>   at Test.main(test)
>> Caused by: javax.xml.stream.XMLStreamException: not a name start character: "U+26"
>>   at gnu.xml.stream.XMLParser.error(libgcj.so.10)
>>   at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>>   at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
>>   at gnu.xml.stream.XMLParser.readCharData(libgcj.so.10)
>>   at gnu.xml.stream.XMLParser.next(libgcj.so.10)
>>   at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
>>   ...3 more
>> 
>> However, the XML should be valid.
>> 
>> Does anyone have an idea what could explain this behavior?
>> 
>> This "bug" affects the MediaWiki mwdumper software.
>> The related bug report (with more details) is available here:
>> https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
> 
> Mmmm, looks like it probably is a real bug.  If you remind me next
> week I'll have a look.

I reproduced and fixed this bug; see the attached patch. Could someone please validate it and commit it for me? I don't have a complete working build environment at the moment.
-- 
Chris Burdess
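
Because the original test.xml has to be downloaded separately, here is a self-contained sketch that parses an inline document containing an entity reference through a Reader that hands back at most one character per read() call. That is the kind of short read the patch below guards against. The class name ShortReadRepro, the helper OneCharReader and the sample document are illustrative assumptions, not part of the original report, and it has not been verified that this particular input triggers the libgcj failure. On a parser that honours the Reader contract it should simply print the decoded text.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ShortReadRepro {

    // Hypothetical helper: returns at most one char per bulk read(),
    // which the java.io.Reader contract explicitly allows.
    static final class OneCharReader extends FilterReader {
        OneCharReader(Reader in) {
            super(in);
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return super.read(buf, off, Math.min(len, 1));
        }
    }

    // Parses xml through OneCharReader and returns all character data reported.
    static String parse(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node; concatenate.
                text.append(ch, start, length);
            }
        };
        parser.parse(new InputSource(new OneCharReader(new StringReader(xml))), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Expected on a conforming parser: Tom & Jerry
        System.out.println(parse("<page><title>Tom &amp; Jerry</title></page>"));
    }
}
```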

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 759 bytes --]

Index: gnu/xml/stream/XMLParser.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/xml/stream/XMLParser.java,v
retrieving revision 1.36
diff -u -r1.36 XMLParser.java
--- gnu/xml/stream/XMLParser.java	5 Feb 2009 20:46:23 -0000	1.36
+++ gnu/xml/stream/XMLParser.java	22 Feb 2010 10:45:18 -0000
@@ -3128,7 +3128,10 @@
                 break; // whitespace
               case 0x26: // '&'
                 reset();
-                read(tmpBuf, 0, i);
+                int off = 0;
+                do {
+                  off += read(tmpBuf, off, i - off);
+                } while (off < i);
                 // character reference?
                 mark(3);
                 c = readCh(); // &

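The patch above replaces a single read(tmpBuf, 0, i) with a loop because a bulk read is allowed to return fewer characters than requested. If the single call came up short, the parser was most likely left positioned before the '&' it had already located, so the later name scan ran into the '&' itself, which would match the "not a name start character: U+26" error. The same idiom can be written as a generic helper for java.io.Reader. This is a sketch under assumptions: ReadFully, readFully and ShortReader are hypothetical names, and unlike the patch's loop it treats a -1 return as end of stream instead of adding it to the offset.

```java
import java.io.EOFException;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadFully {

    // Reads exactly len chars into buf[off..off+len), calling read() repeatedly
    // because one call may return fewer chars than requested.
    static void readFully(Reader in, char[] buf, int off, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, off + done, len - done);
            if (n < 0) {
                // Unlike the patch's loop, stop on end of stream instead of adding -1.
                throw new EOFException("expected " + len + " chars, got " + done);
            }
            done += n;
        }
    }

    // Hypothetical test helper: returns at most two chars per bulk read().
    static final class ShortReader extends FilterReader {
        ShortReader(Reader in) {
            super(in);
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return super.read(buf, off, Math.min(len, 2));
        }
    }

    public static void main(String[] args) throws IOException {
        char[] buf = new char[5];

        // A single call stops short: prints 2.
        System.out.println(new ShortReader(new StringReader("hello&amp;")).read(buf, 0, 5));

        // The loop keeps reading until all 5 chars have arrived: prints hello.
        readFully(new ShortReader(new StringReader("hello&amp;")), buf, 0, 5);
        System.out.println(new String(buf));
    }
}
```

Tracking progress in a separate counter, as both the patch and this sketch do, makes each call write into the next free slot of the buffer rather than overwriting characters that were already read.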

Thread overview: 3+ messages
2010-02-13 11:24 Emmanuel Engelhart
2010-02-13 11:36 ` Andrew Haley
2010-02-22 10:49   ` Chris Burdess [this message]
