From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mauve-discuss-return-522-listarch-mauve-discuss=sources.redhat.com@sources.redhat.com>
Received: (qmail 5197 invoked by alias); 10 Apr 2003 03:05:11 -0000
Mailing-List: contact mauve-discuss-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:mauve-discuss-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/mauve-discuss/>
List-Post: <mailto:mauve-discuss@sources.redhat.com>
List-Help: <mailto:mauve-discuss-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: mauve-discuss-owner@sources.redhat.com
Received: (qmail 5186 invoked from network); 10 Apr 2003 03:05:10 -0000
Received: from unknown (HELO ms-smtp-01.southeast.rr.com) (24.93.67.82)
  by sources.redhat.com with SMTP; 10 Apr 2003 03:05:10 -0000
Received: from mail5.nc.rr.com (fe5 [24.93.67.52])
	by ms-smtp-01.southeast.rr.com (8.12.5/8.12.2) with ESMTP id h3A30nhC004278;
	Wed, 9 Apr 2003 23:00:50 -0400 (EDT)
Received: from lyta.haphazard.org ([66.57.9.48]) by mail5.nc.rr.com  with Microsoft SMTPSVC(5.5.1877.757.75);
	 Wed, 9 Apr 2003 23:02:26 -0400
Received: by lyta.haphazard.org (Postfix, from userid 500)
	id ACB0ABE69E; Wed,  9 Apr 2003 23:06:09 -0400 (EDT)
To: Steve Murry <stmurr@unx.sas.com>
Cc: mauve-discuss@sources.redhat.com
Subject: Re: problems with "InputStreamReader.read" tests in "java.io.Utf8Encoding"
References: <200303271409.JAA21042@login006.unx.sas.com>
From: Brian Jones <cbj@gnu.org>
Date: Thu, 10 Apr 2003 03:05:00 -0000
In-Reply-To: <200303271409.JAA21042@login006.unx.sas.com>
Message-ID: <m3znmzqcf2.fsf@lyta.haphazard.org>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2003-q2/txt/msg00014.txt.bz2

Steve Murry <stmurr@unx.sas.com> writes:

> (I'm reposting my earlier email due to format problems. Sorry 'bout that) 
> 
> 
> Can someone check the 'negative' testcases within
> java.io.Utf8Encoding.mojo to see if these are valid tests or not?
> Specifically, I'm referring to the 9 testcases with data values that
> are declared 'test5_bytes' through 'test13_bytes'.  The testcases
> expect a CharConversionException when decoding illegal UTF-8 byte
> strings.  In some cases the UTF-8 data is incorrect and in others it
> represents codepoints that have not yet been assigned (at least for
> Unicode 3.2).  As I read the Sun API description for
> InputStreamReader.read, I would expect either
> MalformedInputException or UTFDataFormatException to be thrown
> instead (the API description doesn't seem very precise in this
> area).  In fact, most of the platforms that we have run these
> testcases against do not throw any type of exception at all!  Only
> the IBM JREs throw the expected CharConversionException.  We also
> found the following paragraph in one of the Sun bug descriptions:
> 
> >This is a bug in the tests. The specification of
> >java.io.InputStreamReader does not require that an implementation
> >throw IOExceptions on malformed input when decoding bytes in the
> >UTF-8 charset. That our implementation has done this historically
> >is a bug that was fixed as part of 4503732.
> 
> I'm not sure I agree with this statement myself (BTW - I believe
> they are referring to one of their own internal tests, not Mauve),
> but I'm just trying to get Mauve's take on all of this.  I would
> appreciate anyone's thoughts.

I'm not a character-encoding/UTF-8/Unicode expert, so I don't think I
can answer your question.
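
For anyone who wants to see what a particular runtime actually does,
here is a minimal sketch, not an authoritative statement of the spec.
It assumes a JDK with java.nio.charset (1.4 or later).  The class name
Utf8Probe, the helper names, and the sample bytes are made up for
illustration; the sample is not one of Mauve's test5_bytes through
test13_bytes.  As far as I understand it, on Sun-derived runtimes a
reader built from a charset name substitutes U+FFFD for malformed
input, while a reader built from a CharsetDecoder set to
CodingErrorAction.REPORT throws a CharacterCodingException (an
IOException subclass) from read().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical probe class; not part of Mauve.
final class Utf8Probe {
    // Illustrative sample: 0xFF can never appear in well-formed UTF-8.
    static final byte[] SAMPLE = { 'a', (byte) 0xFF, 'b' };

    // Read every char from the reader into a String.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = r.read(); c != -1; c = r.read()) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Reader created from a charset name: the runtime picks the
    // error handling (assumed here to be "replace with U+FFFD").
    static String readLenient(byte[] data) throws IOException {
        return drain(new InputStreamReader(
                new ByteArrayInputStream(data), "UTF-8"));
    }

    // Reader created from an explicit decoder that reports errors,
    // so malformed input surfaces as an exception from read().
    static String readStrict(byte[] data) throws IOException {
        CharsetDecoder dec = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return drain(new InputStreamReader(
                new ByteArrayInputStream(data), dec));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("lenient: "
                + readLenient(SAMPLE).length() + " chars");
        try {
            readStrict(SAMPLE);
            System.out.println("strict: no exception");
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e);
        }
    }
}
```

Swapping SAMPLE for the byte arrays in the Mauve test would show, per
JRE, whether each case is treated as malformed at all and which
exception type (if any) comes back.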

Brian
-- 
Brian Jones <cbj@gnu.org>