public inbox for gcc-prs@sourceware.org
* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19 9:06 Joseph S. Myers
0 siblings, 0 replies; 5+ messages in thread
From: Joseph S. Myers @ 2001-03-19 9:06 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR java/2319; it has been noted by GNATS.
From: "Joseph S. Myers" <jsm28@cam.ac.uk>
To: <tromey@redhat.com>
Cc: <gcc-gnats@gcc.gnu.org>, <gcc-bugs@gcc.gnu.org>
Subject: Re: java/2319: invalid UTF-8 sequences should be rejected
Date: Mon, 19 Mar 2001 17:00:47 +0000 (GMT)
On 19 Mar 2001 tromey@redhat.com wrote:
> Currently the compiler accepts invalid UTF-8 sequences
> when reading a file. Instead we ought to diagnose
> such sequences as errors.
Also note that the invalid sequences that should be rejected include
over-long sequences and UTF-8 encodings that would map to values in the
UTF-16 surrogate range.
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html
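As an illustration of the two extra cases named above, here is a minimal sketch of a strict UTF-8 well-formedness check in Java. It is not the gcj decoder and not the patch discussed in this PR; the class name StrictUtf8 and the method isValid are hypothetical, chosen only for this example. It rejects over-long forms by comparing each decoded value against the smallest code point that legitimately needs that sequence length, and it rejects anything that decodes into the surrogate range U+D800..U+DFFF.

```java
class StrictUtf8 {
    /**
     * Returns true if the bytes are well-formed UTF-8: no over-long forms,
     * no encoded surrogates, nothing above U+10FFFF, no truncated sequences.
     * Hypothetical helper for illustration only.
     */
    static boolean isValid(byte[] b) {
        int i = 0;
        while (i < b.length) {
            int lead = b[i] & 0xFF;
            if (lead < 0x80) {          // plain ASCII, one byte
                i++;
                continue;
            }
            int need;                   // number of continuation bytes expected
            int min;                    // smallest code point allowed at this length
            int cp;                     // code point being assembled
            if (lead >= 0xC0 && lead < 0xE0) {
                need = 1; min = 0x80; cp = lead & 0x1F;
            } else if (lead >= 0xE0 && lead < 0xF0) {
                need = 2; min = 0x800; cp = lead & 0x0F;
            } else if (lead >= 0xF0 && lead < 0xF8) {
                need = 3; min = 0x10000; cp = lead & 0x07;
            } else {
                return false;           // stray continuation byte or invalid lead byte
            }
            if (i + need >= b.length) {
                return false;           // sequence is cut off by end of input
            }
            for (int k = 1; k <= need; k++) {
                int cont = b[i + k] & 0xFF;
                if ((cont & 0xC0) != 0x80) {
                    return false;       // not of the form 10xxxxxx
                }
                cp = (cp << 6) | (cont & 0x3F);
            }
            if (cp < min) {
                return false;           // over-long encoding
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                return false;           // UTF-16 surrogate range
            }
            if (cp > 0x10FFFF) {
                return false;           // beyond the Unicode code space
            }
            i += need + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        // 0xC0 0xAF is an over-long encoding of '/'.
        System.out.println(isValid(new byte[] {(byte) 0xC0, (byte) 0xAF}));
        // 0xED 0xA0 0x80 would decode to the surrogate U+D800.
        System.out.println(isValid(new byte[] {(byte) 0xED, (byte) 0xA0, (byte) 0x80}));
        // 0xC3 0xA9 is the ordinary two-byte encoding of U+00E9.
        System.out.println(isValid(new byte[] {(byte) 0xC3, (byte) 0xA9}));
    }
}
```

Note that this sketch is fully strict, so it also rejects the two-byte sequence 0xC0 0x80; a decoder that wants to tolerate that form would need an explicit special case.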
--
Joseph S. Myers
jsm28@cam.ac.uk
* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-06-20 9:23 tromey
From: tromey @ 2001-06-20 9:23 UTC (permalink / raw)
To: gcc-bugs, gcc-prs, java-prs, tromey, tromey
Synopsis: invalid UTF-8 sequences should be rejected
State-Changed-From-To: analyzed->closed
State-Changed-By: tromey
State-Changed-When: Wed Jun 20 09:23:10 2001
State-Changed-Why:
I've checked in the fix.
The fix only works if the built-in UTF-8 decoder is used.
If the system decoder is used, we're at its mercy.
The glibc UTF-8 decoder (on my RH 6.2 box) does not
give an error for invalid or overlong sequences.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=2319&database=gcc
* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-06-19 13:50 tromey
From: tromey @ 2001-06-19 13:50 UTC (permalink / raw)
To: gcc-bugs, gcc-prs, java-prs, nobody, tromey, tromey
Synopsis: invalid UTF-8 sequences should be rejected
Responsible-Changed-From-To: unassigned->tromey
Responsible-Changed-By: tromey
Responsible-Changed-When: Tue Jun 19 13:50:19 2001
Responsible-Changed-Why:
I'm handling it.
State-Changed-From-To: open->analyzed
State-Changed-By: tromey
State-Changed-When: Tue Jun 19 13:50:19 2001
State-Changed-Why:
I submitted a patch.
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=2319&database=gcc
* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19 9:16 Tom Tromey
From: Tom Tromey @ 2001-03-19 9:16 UTC (permalink / raw)
To: nobody; +Cc: gcc-prs
The following reply was made to PR java/2319; it has been noted by GNATS.
From: Tom Tromey <tromey@redhat.com>
To: "Joseph S. Myers" <jsm28@cam.ac.uk>
Cc: <gcc-gnats@gcc.gnu.org>, <gcc-bugs@gcc.gnu.org>
Subject: Re: java/2319: invalid UTF-8 sequences should be rejected
Date: 19 Mar 2001 10:19:18 -0700
>>>>> "Joseph" == Joseph S Myers <jsm28@cam.ac.uk> writes:
Joseph> Also note that the invalid sequences that should be rejected
Joseph> include over-long sequences and UTF-8 encodings that would map
Joseph> to values in the UTF-16 surrogate range.
I agree, with the sole exception that I think we should accept the
Java form of \0. Java represents this as a two-byte sequence and it
seems reasonable that a Java compiler would accept this form.
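For readers unfamiliar with that two-byte form: it is presumably the 0xC0 0x80 encoding of U+0000 used by Java's "modified UTF-8", which is what DataOutputStream.writeUTF produces. The sketch below only shows the byte-level difference from standard UTF-8; it says nothing about how gcj itself handles the case. The class name ModifiedUtf8Nul and its helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Nul {
    /** Encodes s with DataOutputStream.writeUTF and strips the 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Modified UTF-8 form of U+0000.
        System.out.println(hex(modifiedUtf8("\0")));                         // prints "C0 80"
        // Standard UTF-8 form of U+0000.
        System.out.println(hex("\0".getBytes(StandardCharsets.UTF_8)));      // prints "00"
    }
}
```

Under a strict reading, 0xC0 0x80 is an over-long encoding, so accepting it has to be a deliberate exception in the decoder rather than something that falls out of the general rules.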
Tom
* java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19 8:36 tromey
From: tromey @ 2001-03-19 8:36 UTC (permalink / raw)
To: gcc-gnats
>Number: 2319
>Category: java
>Synopsis: invalid UTF-8 sequences should be rejected
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 19 08:36:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator: Tom Tromey
>Release: unknown-1.0
>Organization:
>Environment:
>Description:
Currently the compiler accepts invalid UTF-8 sequences
when reading a file. Instead we ought to diagnose
such sequences as errors.
Try compiling this Latin-1 encoded program with
--encoding=UTF-8 to see the problem:
public class Hello
{
  public static void main (String[] arguments)
  {
    System.out.println ("Liberté, égalité, fraternité !");
  }
}
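To see why the Latin-1 bytes of this program are not valid UTF-8, the following sketch feeds them to the JDK's own UTF-8 decoder with malformed input set to be reported rather than replaced. This uses java.nio.charset.CharsetDecoder from the standard library, not the compiler's reader; the class name Latin1AsUtf8 and the method decodesAsUtf8 are hypothetical. In Latin-1 the letter é is the single byte 0xE9, which UTF-8 treats as the lead byte of a three-byte sequence, and the bytes that follow it here are not continuation bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1AsUtf8 {
    /** Returns true if the bytes decode as UTF-8 without any malformed input. */
    static boolean decodesAsUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String text = "Libert\u00e9, \u00e9galit\u00e9, fraternit\u00e9 !";
        System.out.println(decodesAsUtf8(text.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(decodesAsUtf8(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The first call reports the Latin-1 bytes as malformed and the second accepts the genuine UTF-8 bytes, which is the distinction the compiler is being asked to diagnose.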
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: