Re: java/2319: invalid UTF-8 sequences should be rejected

public inbox for gcc-prs@sourceware.org

* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19  9:06 Joseph S. Myers
From: Joseph S. Myers @ 2001-03-19  9:06 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR java/2319; it has been noted by GNATS.

From: "Joseph S. Myers" <jsm28@cam.ac.uk>
To: <tromey@redhat.com>
Cc: <gcc-gnats@gcc.gnu.org>,  <gcc-bugs@gcc.gnu.org>
Subject: Re: java/2319: invalid UTF-8 sequences should be rejected
Date: Mon, 19 Mar 2001 17:00:47 +0000 (GMT)

 On 19 Mar 2001 tromey@redhat.com wrote:

 > Currently the compiler accepts invalid UTF-8 sequences
 > when reading a file.  Instead we ought to diagnose
 > such sequences as errors.

 Also note that the invalid sequences that should be rejected include
 over-long sequences and UTF-8 encodings that would map to values in the
 UTF-16 surrogate range.

 http://www.cl.cam.ac.uk/~mgk25/unicode.html
 http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html

 -- 
 Joseph S. Myers
 jsm28@cam.ac.uk
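
The two extra rejection rules named above (over-long forms and encoded surrogates) can be sketched as a small validity check. This is only an illustration: the class name `StrictUtf8` and the method `isValid` are hypothetical and are not taken from the GCC sources or from the patch discussed in this PR. The upper bound of U+10FFFF follows the current definition of UTF-8, which is an assumption beyond what the thread states.

```java
// Sketch only: names are hypothetical, not from the GCC Java front end.
final class StrictUtf8 {
    /** Returns true if the bytes are well-formed UTF-8 under strict rules. */
    static boolean isValid(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int b0 = bytes[i] & 0xFF;
            if (b0 < 0x80) {          // plain ASCII byte
                i++;
                continue;
            }
            int needed;               // continuation bytes expected
            int min;                  // smallest code point legal at this length
            int cp;                   // code point being assembled
            if (b0 >= 0xC0 && b0 <= 0xDF) {
                needed = 1; min = 0x80; cp = b0 & 0x1F;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                needed = 2; min = 0x800; cp = b0 & 0x0F;
            } else if (b0 >= 0xF0 && b0 <= 0xF7) {
                needed = 3; min = 0x10000; cp = b0 & 0x07;
            } else {
                return false;         // stray continuation byte or 0xF8..0xFF
            }
            if (i + needed >= bytes.length) {
                return false;         // sequence truncated by end of input
            }
            for (int k = 1; k <= needed; k++) {
                int b = bytes[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return false;     // not a continuation byte
                }
                cp = (cp << 6) | (b & 0x3F);
            }
            if (cp < min) {
                return false;         // over-long encoding
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                return false;         // UTF-16 surrogate range
            }
            if (cp > 0x10FFFF) {
                return false;         // beyond the Unicode code space
            }
            i += needed + 1;
        }
        return true;
    }
}
```

With this sketch, the bytes `C0 AF` (an over-long `/`) and `ED A0 80` (U+D800) are both refused, while `C3 A9` (é) is accepted.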

* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-06-20  9:23 tromey
From: tromey @ 2001-06-20  9:23 UTC
  To: gcc-bugs, gcc-prs, java-prs, tromey, tromey

Synopsis: invalid UTF-8 sequences should be rejected

State-Changed-From-To: analyzed->closed
State-Changed-By: tromey
State-Changed-When: Wed Jun 20 09:23:10 2001
State-Changed-Why:
    I've checked in the fix.
    The fix only works if the built-in UTF-8 decoder is used.
    If the system decoder is used, we're at its mercy.
    The glibc UTF-8 decoder (on my RH 6.2 box) does not
    give an error for invalid or overlong sequences.

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=2319&database=gcc
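
Whether a given decoder reports such input can be checked with a few known-bad byte sequences. The sketch below does this for the JDK's `java.nio.charset` decoders, as an analogue only: it does not exercise the glibc decoder mentioned above, and the class name `DecoderProbe` and its probe constants are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch only: probes a JDK charset decoder, not the system (glibc) one.
final class DecoderProbe {
    static final byte[] OVERLONG_SLASH = {(byte) 0xC0, (byte) 0xAF};
    static final byte[] SURROGATE_D800 = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};
    static final byte[] TRUNCATED = {(byte) 0xE2, (byte) 0x82};

    /** True if a decoder for cs, configured to report errors, refuses the bytes. */
    static boolean rejects(Charset cs, byte[] bytes) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return false;             // decoded without complaint
        } catch (CharacterCodingException e) {
            return true;              // decoder diagnosed the input
        }
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        System.out.println("over-long rejected: " + rejects(cs, OVERLONG_SLASH));
        System.out.println("surrogate rejected: " + rejects(cs, SURROGATE_D800));
        System.out.println("truncated rejected: " + rejects(cs, TRUNCATED));
    }
}
```

A decoder that returns `false` for any of these probes is one where the caller cannot rely on it to diagnose invalid input.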


* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-06-19 13:50 tromey
From: tromey @ 2001-06-19 13:50 UTC
  To: gcc-bugs, gcc-prs, java-prs, nobody, tromey, tromey

Synopsis: invalid UTF-8 sequences should be rejected

Responsible-Changed-From-To: unassigned->tromey
Responsible-Changed-By: tromey
Responsible-Changed-When: Tue Jun 19 13:50:19 2001
Responsible-Changed-Why:
    I'm handling it.
State-Changed-From-To: open->analyzed
State-Changed-By: tromey
State-Changed-When: Tue Jun 19 13:50:19 2001
State-Changed-Why:
    I submitted a patch.

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=2319&database=gcc


* Re: java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19  9:16 Tom Tromey
From: Tom Tromey @ 2001-03-19  9:16 UTC
  To: nobody; +Cc: gcc-prs

The following reply was made to PR java/2319; it has been noted by GNATS.

From: Tom Tromey <tromey@redhat.com>
To: "Joseph S. Myers" <jsm28@cam.ac.uk>
Cc: <gcc-gnats@gcc.gnu.org>, <gcc-bugs@gcc.gnu.org>
Subject: Re: java/2319: invalid UTF-8 sequences should be rejected
Date: 19 Mar 2001 10:19:18 -0700

 >>>>> "Joseph" == Joseph S Myers <jsm28@cam.ac.uk> writes:

 Joseph> Also note that the invalid sequences that should be rejected
 Joseph> include over-long sequences and UTF-8 encodings that would map
 Joseph> to values in the UTF-16 surrogate range.

 I agree, with the sole exception that I think we should accept the
 Java form of \0.  Java represents this as a two-byte sequence and it
 seems reasonable that a Java compiler would accept this form.

 Tom
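
The "Java form of \0" is the two-byte sequence `C0 80` used by Java's modified UTF-8, which is itself an over-long encoding and so would be refused by a fully strict decoder. The sketch below shows where those bytes come from, using `DataOutputStream.writeUTF` from the standard library. The class name `JavaNul` and both helper methods are hypothetical, and nothing here is taken from the GCC patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Sketch only: helper names are hypothetical.
final class JavaNul {
    /** Encodes s in Java's modified UTF-8, dropping writeUTF's 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** True if the two bytes starting at offset i are C0 80, the Java form of U+0000. */
    static boolean isJavaNul(byte[] bytes, int i) {
        return i >= 0
            && i + 1 < bytes.length
            && (bytes[i] & 0xFF) == 0xC0
            && (bytes[i + 1] & 0xFF) == 0x80;
    }

    public static void main(String[] args) {
        byte[] encoded = modifiedUtf8("\0");
        System.out.println("length: " + encoded.length);
        System.out.println("is Java NUL: " + isJavaNul(encoded, 0));
    }
}
```

A decoder that wants the exception Tom proposes could test for this pair before applying the general over-long check.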

* java/2319: invalid UTF-8 sequences should be rejected
@ 2001-03-19  8:36 tromey
From: tromey @ 2001-03-19  8:36 UTC
  To: gcc-gnats


>Number:         2319
>Category:       java
>Synopsis:       invalid UTF-8 sequences should be rejected
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 19 08:36:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Tom Tromey
>Release:        unknown-1.0
>Organization:
>Environment:

>Description:
Currently the compiler accepts invalid UTF-8 sequences
when reading a file.  Instead we ought to diagnose
such sequences as errors.

Try compiling this Latin-1 encoded program with
--encoding=UTF-8 to see the problem:

public class Hello
{
  public static void main ( String []arguments)
  {
    System.out.println ("Liberté, égalité, fraternité !");
  }
}
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
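
To see why the test program in the Description is a useful reproducer, the sketch below encodes the same string as Latin-1 and then decodes those bytes as UTF-8 with a lenient decoder. Each accented letter becomes a lone byte `E9` followed by an ASCII byte, which is not a valid UTF-8 sequence. The class name `Latin1AsUtf8` is hypothetical, and the lenient behaviour shown is that of the JDK's `String` constructor, not of the compiler discussed in this PR.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: illustrates the reproducer's bytes, not the gcj decoder.
final class Latin1AsUtf8 {
    // The string from the test program, with é written as a Unicode escape
    // so that this source file is itself encoding-independent.
    static final String TEXT = "Libert\u00e9, \u00e9galit\u00e9, fraternit\u00e9 !";

    /** Encodes TEXT as Latin-1, then decodes those bytes leniently as UTF-8. */
    static String misdecode() {
        byte[] latin1 = TEXT.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) substitutes a replacement character
        // for malformed input instead of raising an error.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String got = misdecode();
        System.out.println("text survived: " + got.equals(TEXT));
        System.out.println("has U+FFFD:    " + (got.indexOf('\uFFFD') >= 0));
    }
}
```

A lenient decoder silently alters the string literal, which is the behaviour the PR asks to turn into a diagnostic.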

