public inbox for java@gcc.gnu.org
* [SAXParser] org.xml.sax.SAXParseException: not a name start character:  "U+26"
@ 2010-02-13 11:24 Emmanuel Engelhart
  2010-02-13 11:36 ` Andrew Haley
  0 siblings, 1 reply; 3+ messages in thread
From: Emmanuel Engelhart @ 2010-02-13 11:24 UTC (permalink / raw)
  To: java

Hi,

I'm not sure this is the right mailing list for reporting this behavior;
if it isn't, please point me to the right place, and sorry for the noise.

I use gcj on an Ubuntu LTS release:
gcj (Ubuntu 4.4.1-5ubuntu2) 4.4.1

My sample code may be downloaded here as Test.java:
https://bugzilla.wikimedia.org/attachment.cgi?id=7115

and looks like this:
================================
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class Test {

    public static void main(String[] argv) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            DefaultHandler handler = new DefaultHandler() {};
            saxParser.parse("test.xml", handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
================================

I compile it as follows:
gcj -o test --main=Test Test.java

My XML file "test.xml" may be downloaded here:
https://bugzilla.wikimedia.org/attachment.cgi?id=7114

Running the binary, I get the following error:
$ ./test
org.xml.sax.SAXParseException: not a name start character: "U+26"
   at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
   at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
   at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
   at Test.main(test)
Caused by: javax.xml.stream.XMLStreamException: not a name start character: "U+26"
   at gnu.xml.stream.XMLParser.error(libgcj.so.10)
   at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
   at gnu.xml.stream.XMLParser.readNmtoken(libgcj.so.10)
   at gnu.xml.stream.XMLParser.readCharData(libgcj.so.10)
   at gnu.xml.stream.XMLParser.next(libgcj.so.10)
   at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
   ...3 more

The XML should be valid, though.

Does anyone have an idea what could explain this behavior?

This "bug" affects the MediaWiki mwdumper software.
The related bug (with more details) is available here:
https://bugzilla.wikimedia.org/show_bug.cgi?id=22137
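
For readers without access to the attachments, here is a minimal sketch of a
self-contained variant of the test. It rests on assumptions that are not
confirmed by the report: the contents of test.xml are not shown, so the
in-memory document below simply assumes that an entity reference ('&' is
U+26) in character data is involved, and TrickleInputStream is a hypothetical
helper that forces short reads on the parser. Whether this exact input
triggers the failure under gcj has not been verified.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class EntityRepro {

    /** Hypothetical wrapper: hands out at most one byte per bulk read. */
    static final class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    /** Parses the given XML and returns its concatenated character data. */
    static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputStream in = new TrickleInputStream(
                new ByteArrayInputStream(xml.getBytes("UTF-8")));
        parser.parse(in, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: an entity reference inside character data.
        System.out.println(parseText("<doc>a &amp; b</doc>"));
    }
}
```

A parser that copes with short reads should report the character data with
the entity expanded; a SAXParseException here would point at the parser
rather than at the document.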

Regards
Emmanuel


* Re: [SAXParser] org.xml.sax.SAXParseException: not a name start character:   "U+26"
  2010-02-13 11:24 [SAXParser] org.xml.sax.SAXParseException: not a name start character: "U+26" Emmanuel Engelhart
@ 2010-02-13 11:36 ` Andrew Haley
  2010-02-22 10:49   ` Chris Burdess
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Haley @ 2010-02-13 11:36 UTC (permalink / raw)
  To: java; +Cc: classpath

Mmmm, looks like it probably is a real bug.  If you remind me next
week I'll have a look.

Andrew.


* Re: [SAXParser] org.xml.sax.SAXParseException: not a name start character:   "U+26"
  2010-02-13 11:36 ` Andrew Haley
@ 2010-02-22 10:49   ` Chris Burdess
  0 siblings, 0 replies; 3+ messages in thread
From: Chris Burdess @ 2010-02-22 10:49 UTC (permalink / raw)
  To: Andrew Haley; +Cc: java, classpath, Emmanuel Engelhart

I reproduced and fixed this bug; see the attached patch. If someone could
validate and commit it for me, please do so: I don't have a complete working
build environment at the moment.
-- 
Chris Burdess

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 759 bytes --]

Index: gnu/xml/stream/XMLParser.java
===================================================================
RCS file: /sources/classpath/classpath/gnu/xml/stream/XMLParser.java,v
retrieving revision 1.36
diff -u -r1.36 XMLParser.java
--- gnu/xml/stream/XMLParser.java	5 Feb 2009 20:46:23 -0000	1.36
+++ gnu/xml/stream/XMLParser.java	22 Feb 2010 10:45:18 -0000
@@ -3128,7 +3128,10 @@
                 break; // whitespace
               case 0x26: // '&'
                 reset();
-                read(tmpBuf, 0, i);
+                int off = 0;
+                do {
+                  off += read(tmpBuf, off, i - off);
+                } while (off < i);
                 // character reference?
                 mark(3);
                 c = readCh(); // &
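
Reading the patch, the change appears to guard against a short read: a
single read() call may return fewer characters than requested, so the
patched code keeps reading until the requested count has arrived. The
general pattern can be sketched against java.io.Reader as below. This is
an illustration, not the Classpath code: readFully and the one-char-per-call
reader are hypothetical names, XMLParser.read is the parser's own method
rather than Reader.read, and the end-of-input check is an addition that the
patch itself does not contain.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadFully {

    /**
     * Fills buf[0..len) from in, looping because one read() call may
     * deliver fewer characters than requested. Returns the number of
     * characters read, which is less than len only at end of input.
     */
    static int readFully(Reader in, char[] buf, int len) throws IOException {
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) {
                break; // end of input: stop instead of looping forever
            }
            off += n;
        }
        return off;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical reader that yields one char per call to force short reads.
        Reader trickle = new StringReader("&amp; rest") {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                return super.read(cbuf, off, Math.min(len, 1));
            }
        };
        char[] buf = new char[5];
        int n = readFully(trickle, buf, buf.length);
        System.out.println(new String(buf, 0, n));
    }
}
```

With a single read() in place of the loop, the buffer in main would hold
only the first character, which matches the kind of misparse the original
report describes.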
