public inbox for mauve-discuss@sourceware.org
* new test cases (long)
@ 2003-02-08 16:15 Raif S. Naffah
  2003-02-14 11:57 ` Mark Wielaard
  0 siblings, 1 reply; 6+ messages in thread
From: Raif S. Naffah @ 2003-02-08 16:15 UTC
  To: Mauve

[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

hello there,

pls find 2 new classes for review.

the tests are to ensure that the mandated (as per public Javadoc 1.3.1
and 1.4.1) minimal character encodings are supported by the bytecode
interpreter.

running a minimal Mauve with sun's jdk1.4.1 yields:

gnu.testlet.java.lang.String.getBytes14
- ----
PASS: gnu.testlet.java.lang.String.getBytes14: String.getBytes("ISO8859_15") (number 1)
gnu.testlet.java.lang.String.getBytes13
- ----
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("ASCII") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("Cp1252") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("ISO8859_1") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UTF8") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UTF-16") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeBig") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeBigUnmarked") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeLittle") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeLittleUnmarked") (number 1)
0 of 10 tests failed


with gcj/gij version 3.4 20030126 (experimental), the results are as
follows:

gnu.testlet.java.lang.String.getBytes14
- ----
FAIL: gnu.testlet.java.lang.String.getBytes14: String.getBytes("ISO8859_15") (number 1)
gnu.testlet.java.lang.String.getBytes13
- ----
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("ASCII") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("Cp1252") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("ISO8859_1") (number 1)
PASS: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UTF8") (number 1)
FAIL: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UTF-16") (number 1)
FAIL: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeBig") (number 1)
FAIL: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeBigUnmarked") (number 1)
FAIL: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeLittle") (number 1)
FAIL: gnu.testlet.java.lang.String.getBytes13: String.getBytes("UnicodeLittleUnmarked") (number 1)
6 of 10 tests failed
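Five of the six gcj failures are in the UTF-16 family (UTF-16 and the Unicode* names). For reference, here is a small sketch that prints the bytes the standard UTF-16 charsets produce for "abc". It assumes Java 7 or later for java.nio.charset.StandardCharsets (much newer than the JDKs discussed here), and the class name Utf16Bytes is made up for illustration. The UTF-16 charset writes a big-endian byte-order mark FE FF, which is where the leading -2, -1 of ABC2 in getBytes13 comes from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: show the bytes each standard UTF-16 variant produces.
// Assumes Java 7+ (StandardCharsets); the class name is hypothetical.
class Utf16Bytes {
    // Render bytes as unsigned hex, e.g. "FE FF 00 61".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        // UTF-16: big-endian with a leading byte-order mark (same bytes as ABC2)
        System.out.println("UTF-16   " + encodeHex("abc", StandardCharsets.UTF_16));
        // UTF-16BE: big-endian, no byte-order mark (same bytes as ABC3)
        System.out.println("UTF-16BE " + encodeHex("abc", StandardCharsets.UTF_16BE));
        // UTF-16LE: little-endian, no byte-order mark (same bytes as ABC5)
        System.out.println("UTF-16LE " + encodeHex("abc", StandardCharsets.UTF_16LE));
    }
}
```

Running main should print FE FF 00 61 00 62 00 63, 00 61 00 62 00 63 and 61 00 62 00 63 00, i.e. ABC2, ABC3 and ABC5 written in hex. The historical names UnicodeBig and UnicodeLittle are expected by getBytes13 to follow the marked patterns of ABC2 and ABC4.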


cheers;
rsn


RCS file: /cvs/mauve/mauve/ChangeLog,v
retrieving revision 1.425
diff -u -w -b -B -r1.425 ChangeLog
- --- ChangeLog	7 Feb 2003 20:24:25 -0000	1.425
+++ ChangeLog	8 Feb 2003 16:02:43 -0000
@@ -1,3 +1,8 @@
+2003-02-09  Raif S. Naffah <raif@fl.net.au>
+
+	* gnu/testlet/java/lang/String/getBytes13: new test
+	* gnu/testlet/java/lang/String/getBytes14: new test
+
 2003-02-07  Mark Wielaard  <mark@klomp.org>
 
 	* gnu/testlet/java/text/CollationElementIterator/jdk11.java (test):

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Que du magnifique

iD8DBQE+RS20+e1AKnsTRiERA8DyAJoDFtGja3aU9Ms2K38k2EcytCkXcgCfUWWu
tPnvTAYt4L44qG+72q7Mv94=
=FK8Q
-----END PGP SIGNATURE-----

[-- Attachment #2: getBytes13.java --]
[-- Type: text/x-java, Size: 2684 bytes --]

// Tags: JDK1.3,JDK1.4

// Copyright (C) 2003 Free Software Foundation, Inc.

// This file is part of Mauve.

// Mauve is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2, or (at your option)
// any later version.

// Mauve is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with Mauve; see the file COPYING.  If not, write to
// the Free Software Foundation, 59 Temple Place - Suite 330,
// Boston, MA 02111-1307, USA.  */

package gnu.testlet.java.lang.String;

import gnu.testlet.Testlet;
import gnu.testlet.TestHarness;
import java.io.UnsupportedEncodingException;

public class getBytes13 implements Testlet
{
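  // "abc" in the single-byte encodings (ASCII, Cp1252, ISO8859_1, UTF8)
  protected static final byte[] ABC1 = new byte[] {97, 98, 99};
  // big-endian UTF-16 with byte-order mark FE FF (UTF-16, UnicodeBig)
  protected static final byte[] ABC2 = new byte[] {-2, -1,  0, 97,  0, 98,  0, 99};
  // big-endian UTF-16 without byte-order mark (UnicodeBigUnmarked)
  protected static final byte[] ABC3 = new byte[] { 0, 97,  0, 98,  0, 99};
  // little-endian UTF-16 with byte-order mark FF FE (UnicodeLittle)
  protected static final byte[] ABC4 = new byte[] {-1, -2, 97,  0, 98,  0, 99,  0};
  // little-endian UTF-16 without byte-order mark (UnicodeLittleUnmarked)
  protected static final byte[] ABC5 = new byte[] {97,  0, 98,  0, 99,  0};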

  public void test (TestHarness harness)
  {
    harness.checkPoint("getBytes13");

    test1Encoding (harness, "ASCII",                 "abc", ABC1);
    test1Encoding (harness, "Cp1252",                "abc", ABC1);
    test1Encoding (harness, "ISO8859_1",             "abc", ABC1);
    test1Encoding (harness, "UTF8",                  "abc", ABC1);
    test1Encoding (harness, "UTF-16",                "abc", ABC2);
    test1Encoding (harness, "UnicodeBig",            "abc", ABC2);
    test1Encoding (harness, "UnicodeBigUnmarked",    "abc", ABC3);
    test1Encoding (harness, "UnicodeLittle",         "abc", ABC4);
    test1Encoding (harness, "UnicodeLittleUnmarked", "abc", ABC5);
  }


  protected void
  test1Encoding (TestHarness h, String encoding, String s, byte[] ba)
  {
    String signature = "String.getBytes(\""+encoding+"\")";
    try
      {
        h.check (areEqual(s.getBytes(encoding), ba), signature);
      }
    catch (UnsupportedEncodingException x)
      {
        h.debug (x);
	h.fail (signature);
      }
  }

  static boolean areEqual (byte[] a, byte[] b)
  {
    if (a == null || b == null)
      return false;
    if (a.length != b.length)
      return false;
    for (int i = 0; i < a.length; i++)
      if (a[i] != b[i])
        return false;
    return true;
  }
}

[-- Attachment #3: getBytes14.java --]
[-- Type: text/x-java, Size: 1112 bytes --]

// Tags: JDK1.4

// Copyright (C) 2003 Free Software Foundation, Inc.

// This file is part of Mauve.

// Mauve is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2, or (at your option)
// any later version.

// Mauve is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with Mauve; see the file COPYING.  If not, write to
// the Free Software Foundation, 59 Temple Place - Suite 330,
// Boston, MA 02111-1307, USA.  */

package gnu.testlet.java.lang.String;

import gnu.testlet.Testlet;
import gnu.testlet.TestHarness;

public class getBytes14 extends getBytes13 implements Testlet
{
  public void test (TestHarness harness)
  {
    harness.checkPoint("getBytes14");

    test1Encoding (harness, "ISO8859_15", "abc", ABC1);
  }
}


* Re: new test cases (long)
  2003-02-08 16:15 new test cases (long) Raif S. Naffah
@ 2003-02-14 11:57 ` Mark Wielaard
  2003-02-14 23:58   ` Raif S. Naffah
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Wielaard @ 2003-02-14 11:57 UTC
  To: raif; +Cc: Mauve

Hi Raif,

On Sat, 2003-02-08 at 17:17, Raif S. Naffah wrote:
> the tests are to ensure that the mandated (as per public Javadoc 1.3.1
> and 1.4.1) minimal character encodings are supported by the bytecode
> interpreter.
> [...]
> +	* gnu/testlet/java/lang/String/getBytes14: new test

Here you test for "ISO8859_15". I looked here:
http://java.sun.com/j2se/1.4.1/docs/api/java/nio/charset/Charset.html
but couldn't see where it said this is a required character set.
Is it really required or just nice to have since the Sun implementation
supports it? (Which might still be a good reason to add them to Mauve,
but then I would like to label them explicitly as such.)

Also you seem to test (in getBytes13) for the "historical names" for
which I couldn't find a definition. Do you know where they are
specified? InputStreamReader and OutputStreamWriter getEncoding() are
supposed to return them but they don't document what they actually look
like.
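What getEncoding() actually returns can at least be observed. A minimal sketch, assuming Java 8 or later (try-with-resources, UncheckedIOException); the class and method names are hypothetical. Later JDK javadoc for these methods says that if the encoding has a historical name, that name is returned, otherwise the canonical name; so requesting "UTF-8" would be expected to report "UTF8".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Illustrative sketch: which name do the stream classes report for a charset
// requested under its java.nio name? Class name is hypothetical.
class StreamEncodingNames {
    static String writerEncoding(String requested) {
        try (OutputStreamWriter w =
                 new OutputStreamWriter(new ByteArrayOutputStream(), requested)) {
            // read while the writer is still open; a closed stream reports null
            return w.getEncoding();
        } catch (IOException e) { // includes UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
    }

    static String readerEncoding(String requested) {
        try (InputStreamReader r =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), requested)) {
            return r.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8", "windows-1252"}) {
            System.out.println(name + " -> writer: " + writerEncoding(name)
                + ", reader: " + readerEncoding(name));
        }
    }
}
```

This only shows what one implementation reports; it does not settle which names a specification requires.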

Cheers,

Mark


* Re: new test cases (long)
  2003-02-14 11:57 ` Mark Wielaard
@ 2003-02-14 23:58   ` Raif S. Naffah
  2003-02-16 11:54     ` Mark Wielaard
  0 siblings, 1 reply; 6+ messages in thread
From: Raif S. Naffah @ 2003-02-14 23:58 UTC
  To: Mark Wielaard; +Cc: Mauve

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

hello Mark,

On Friday 14 February 2003 22:57, Mark Wielaard wrote:
> Hi Raif,
>
> On Sat, 2003-02-08 at 17:17, Raif S. Naffah wrote:
> > the tests are to ensure that the mandated (as per public Javadoc
> > 1.3.1 and 1.4.1) minimal character encodings are supported by the
> > bytecode interpreter.
> > [...]
> > +	* gnu/testlet/java/lang/String/getBytes14: new test
>
> Here you test for "ISO8859_15". I looked here:
> http://java.sun.com/j2se/1.4.1/docs/api/java/nio/charset/Charset.html
> but couldn't see where it said this is a required character set.

yes it is not listed there.  but i refer you to 
<.../j2sdk1.4.1/docs/guide/intl/encoding.doc.html> page of the public 
documentation of sun's jdk-1.4.1; 2nd paragraph:

"Sun's Java 2 Software Development Kit, Standard Edition, v. 1.4.1 for 
all platforms (Solaris™ operating environment, Linux, and Microsoft
Windows) and the Java 2 Runtime Environment, Standard Edition, v. 1.4.1 
for Solaris and Linux support all encodings shown on this page..."

and further down the same page there is a table, the "Basic Encoding Set
(contained in lib/rt.jar) - Supported by java.nio, java.io and
java.lang APIs."  in its "Canonical Name for java.io and java.lang
API" column, the ISO-8859-15 row contains a reference to the
"extended encoding set."  i took this to mean that the canonical name
is to be taken from the second table, i.e. the extended encoding set.

there are two possible deductions from this page:

a. "ISO-8859-15" is a MUST encoding in java.nio, as well as in java.io
and java.lang, but in the last two the canonical name is the one given
in the "extended set", i.e. "ISO8859_15" (ISO 8859-15, Latin alphabet
No. 9), and hence the supporting classes are in charsets.jar rather
than in rt.jar.

b. "ISO-8859-15" is only a MUST encoding in java.nio, but not in java.io 
nor java.lang.

i adopted the first.

there is of course a 3rd possibility: that the writer(s) of these
documentation pages contradicted themselves.

the code itself (sun's jdk 1.4.1_01) does support ISO-8859-15,
which can be thought of as the litmus test.
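That litmus test can be run under both names discussed above: the java.nio name "ISO-8859-15" and the java.io/java.lang name "ISO8859_15". A minimal sketch, assuming a JDK that has java.nio.charset (1.4 or later) and java.util.Arrays; the class name Latin9Check is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Illustrative sketch: is Latin alphabet No. 9 reachable under both its
// java.nio name and its java.io/java.lang name? Class name is hypothetical.
class Latin9Check {
    // java.nio view: ask Charset directly
    static boolean nioSupports(String name) {
        return Charset.isSupported(name);
    }

    // java.lang view: String.getBytes(String), as getBytes14 does
    static boolean langSupports(String name) {
        try {
            // plain ASCII letters encode to the same single bytes in ISO 8859-15
            return Arrays.equals("abc".getBytes(name), new byte[] {97, 98, 99});
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.nio  ISO-8859-15: " + nioSupports("ISO-8859-15"));
        System.out.println("java.lang ISO8859_15:  " + langSupports("ISO8859_15"));
    }
}
```

A true/false split between the two lines would correspond to deduction b above; true for both is consistent with deduction a.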


> Is it really required or just nice to have since the Sun
> implementation supports it? (Which might still be a good reason to
> add them to Mauve, but then I would like to label them explicitly as
> such.)

my interpretation was that it is a MUST.


> Also you seem to test (in getBytes13) for the "historical names" for
> which I couldn't find a definition.

the relevant javadoc page in sun's jdk 1.3.1_06
<.../jdk1.3.1/docs/guide/intl/encoding.doc.html> lists the required
encodings:

"...Sun's Java 2 Runtime Environment, Standard Edition, v. 1.3.1 for 
Windows comes in two different versions: US-only and international. The 
US-only version only supports the encodings shown in the first table. 
The international version (which includes the lib\i18n.jar file) 
supports all encodings shown on this page."

it then proceeds to list the "Basic Encoding Set" (contained in rt.jar) 
where those names are defined.

the only difference from the 1.4.1 list is Latin alphabet No. 9 (ISO-8859-15).
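One way to see how the historical names of the Basic Encoding Set relate to the java.nio names is to resolve them through java.nio.charset.Charset and print the canonical name. A minimal sketch, assuming a JDK with java.nio.charset (so 1.4 or later, not 1.3.1); the class name is hypothetical, and Charset.forName throws the unchecked UnsupportedCharsetException for a name it does not know.

```java
import java.nio.charset.Charset;

// Illustrative sketch: map historical java.io/java.lang names to the
// canonical java.nio names they resolve to. Class name is hypothetical.
class HistoricalToCanonical {
    static String canonical(String historical) {
        return Charset.forName(historical).name();
    }

    public static void main(String[] args) {
        String[] historical = {
            "ASCII", "ISO8859_1", "UTF8", "UnicodeBigUnmarked", "UnicodeLittleUnmarked"
        };
        for (String h : historical) {
            System.out.println(h + " -> " + canonical(h));
        }
    }
}
```

On a current JDK one would expect, for example, ASCII to resolve to US-ASCII, ISO8859_1 to ISO-8859-1 and UTF8 to UTF-8.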


>... Do you know where they are
> specified? InputStreamReader and OutputStreamWriter getEncoding() are
> supposed to return them but they don't document what they actually
> look like.

the references sun cites are:

* The Unicode standard 
<http://www.unicode.org/unicode/standard/standard.html>, and
* The Unicode FAQ <http://www.unicode.org/unicode/faq>.


cheers;
rsn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Que du magnifique

iD8DBQE+TYNP+e1AKnsTRiERA/LLAKCrscVg83sy882JsHImp/ybGSipCgCgquAV
4VLd68TlTPVpPV5w296qSCc=
=X4EK
-----END PGP SIGNATURE-----


* Re: new test cases (long)
  2003-02-14 23:58   ` Raif S. Naffah
@ 2003-02-16 11:54     ` Mark Wielaard
  2003-02-16 16:44       ` Raif S. Naffah
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Wielaard @ 2003-02-16 11:54 UTC
  To: raif; +Cc: Mauve

Hi Raif,

Thanks for all the pointers. The character encoding names seem to be
confusing whichever way you look at it. What is and isn't a canonical
name, for which package, what the (historical) alias is, etc., is
difficult to decipher.

Also note that the 1.4 docs and 1.4.1 encoding docs actually list
different canonical names... Duh...

Anyway, I think the best thing to do is to add all the must-support
canonical, historical and alias character set names to the getBytes()
tests, at least for the names that are documented on all these
different versions of the API/spec pages. The distinction between names
used for java.lang/io and java.nio seems only to confuse matters, and
implementations that support some names for only some of the library
classes will probably confuse users enormously.

So getBytes14 now tests US-ASCII, windows-1252, ISO-8859-1, ISO-8859-15,
ISO8859_15, UTF-16BE and UTF-16LE. Together with the getBytes13 tests
this should catch all the encoding names that people will probably always
expect to be available in a normal class library implementation.
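The revised getBytes14 itself is not shown in this thread. As a rough, hypothetical illustration only, the seven names above can be checked outside Mauve with a standalone program that prints PASS/FAIL lines in the same style as the harness output earlier in the thread. It assumes Java 7 or later (diamond operator); the class name and the expected byte arrays are this sketch's own, not Mauve's.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone check of the names listed for getBytes14;
// not the actual Mauve Testlet.
class GetBytes14Names {
    static final byte[] SINGLE = {97, 98, 99};         // one byte per char
    static final byte[] BE = {0, 97, 0, 98, 0, 99};    // UTF-16BE, no mark
    static final byte[] LE = {97, 0, 98, 0, 99, 0};    // UTF-16LE, no mark

    static Map<String, byte[]> expected() {
        Map<String, byte[]> m = new LinkedHashMap<>();
        m.put("US-ASCII", SINGLE);
        m.put("windows-1252", SINGLE);
        m.put("ISO-8859-1", SINGLE);
        m.put("ISO-8859-15", SINGLE);
        m.put("ISO8859_15", SINGLE);
        m.put("UTF-16BE", BE);
        m.put("UTF-16LE", LE);
        return m;
    }

    // Encodes "abc" under each name; returns how many did not match.
    static int failures() {
        int failed = 0;
        for (Map.Entry<String, byte[]> e : expected().entrySet()) {
            boolean ok;
            try {
                ok = Arrays.equals("abc".getBytes(e.getKey()), e.getValue());
            } catch (UnsupportedEncodingException x) {
                ok = false;
            }
            System.out.println((ok ? "PASS: " : "FAIL: ")
                + "String.getBytes(\"" + e.getKey() + "\")");
            if (!ok) failed++;
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(failures() + " of " + expected().size() + " tests failed");
    }
}
```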

Cheers,

Mark


* Re: new test cases (long)
  2003-02-16 11:54     ` Mark Wielaard
@ 2003-02-16 16:44       ` Raif S. Naffah
  2003-02-16 17:21         ` Mark Wielaard
  0 siblings, 1 reply; 6+ messages in thread
From: Raif S. Naffah @ 2003-02-16 16:44 UTC
  To: Mark Wielaard; +Cc: Mauve

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

hello Mark,

On Sunday 16 February 2003 22:53, Mark Wielaard wrote:
> Hi Raif,
>
> Thanks for all the pointers. The character encoding names seem to be
> confusing whatever way you look at it. What is and isn't a canonical
> name, for what package, what the (historical) alias is, etc is
> difficult to decipher.

i agree it's somewhat confusing but we can reduce this by sticking to 
the documentation and the behaviour of the implementation (sun's jdk 
that is).


> Also note that the 1.4 docs and 1.4.1 encoding docs actually list
> different canonical names... Duh...

where exactly do the 1.4 and the 1.4.1 docs differ?


> Anyway I think the best thing todo is to add all canonical,
> historical and/or alias must support character names to the
> getBytes() tests, at least for the names that are documented on all
> these different (versions) of the API/Spec pages. The distinction
> between names used for java.lang/io and java.nio seems to only
> confuse matters and
> implementations that only support some names for some of the library
> classes will probably confuse users enormously.

another alternative is to stick to the distinctions the javadocs make
with respect to the following aspects:

* specific packages use specific, and sometimes different,
encoding/charset names;
* some names are "canonical", others are "aliases";
* some names are a MUST (Basic), others (those of the international
version of the JDK) are a MAY (Extended).

this way, gnu.testlet.java.lang.String.getBytes can be the test point
for java.lang.* API encoding names, and something like a new
gnu.testlet.java.nio.charset.Charset.isSupported test would do the
same for the java.nio.* API encoding names.
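A java.nio-side check along these lines could start from the six charsets that the java.nio.charset.Charset javadoc lists as required on every implementation. A minimal sketch, written as a plain program rather than a Mauve Testlet; the class name is hypothetical and the proposed gnu.testlet.java.nio.charset.Charset.isSupported test does not exist yet.

```java
import java.nio.charset.Charset;

// Illustrative sketch of a java.nio conformance check; not a Mauve Testlet.
// Class name is hypothetical.
class NioRequiredCharsets {
    // Standard charsets the Charset javadoc requires, by canonical name.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Passes if the name is supported and resolves to itself as canonical name.
    static boolean check(String name) {
        return Charset.isSupported(name) && Charset.forName(name).name().equals(name);
    }

    public static void main(String[] args) {
        int failed = 0;
        for (String name : REQUIRED) {
            boolean ok = check(name);
            System.out.println((ok ? "PASS: " : "FAIL: ")
                + "Charset.isSupported(\"" + name + "\")");
            if (!ok) failed++;
        }
        System.out.println(failed + " of " + REQUIRED.length + " tests failed");
    }
}
```

Aliases and the Basic/Extended split could then be layered on top, with the optional names expected to fail on some implementations.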


> So getBytes14 now tests US-ASCII, windows-1252, ISO-8859-1,
> ISO-8859-15, ISO8859_15, UTF-16BE and UTF-16LE. Together with the
> getBytes13 tests this should catch all the encoding names that people
> will probaly always expect to be available in a normal class library
> implementation.

if my comments above are acceptable, i can revise the getBytes classes
to handle the last 2 points (canonical vs. alias, and basic vs.
extended) separately, and write a new test case for java.nio.* API
conformance.  the pass/fail requirements can then be controlled with an
'xfails' file.


cheers;
rsn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Que du magnifique

iD8DBQE+T8C++e1AKnsTRiERA2J5AJ4o4bgsAhoGEBXexLTKsGHeWor+JACg86St
6zTwX4j9FqzGExtFXFu3HDQ=
=n5zY
-----END PGP SIGNATURE-----


* Re: new test cases (long)
  2003-02-16 16:44       ` Raif S. Naffah
@ 2003-02-16 17:21         ` Mark Wielaard
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Wielaard @ 2003-02-16 17:21 UTC
  To: raif; +Cc: Mauve

Hi Raif,

On Sun, 2003-02-16 at 17:47, Raif S. Naffah wrote:
> i agree it's somewhat confusing but we can reduce this by sticking to 
> the documentation and the behaviour of the implementation (sun's jdk 
> that is).

I am not convinced that what the documentation says is always precisely
what the Sun implementation does and/or that what the Sun implementation
does is what the documentation (should) say...

> > Also note that the 1.4 docs and 1.4.1 encoding docs actually list
> > different canonical names... Duh...
> 
> where exactly does the 1.4 and the 1.4.1 differ?

I didn't scan the documents very carefully but immediately noticed that
http://java.sun.com/j2se/1.4/docs/guide/intl/encoding.doc.html
says that the canonical name for what is called "Windows Latin-1" is
Cp1252 for the java.nio API, but that
http://java.sun.com/j2se/1.4.1/docs/guide/intl/encoding.doc.html
says it is "windows-1252" for java.nio and Cp1252 for java.lang/io.
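Whatever the docs call it, an implementation can treat both spellings as names for one charset. A minimal sketch, assuming a current JDK where Cp1252 is accepted as an alias; the class name is hypothetical. It prints the java.nio canonical name each spelling resolves to and checks that both encode the same bytes, using the euro sign, which windows-1252 maps to 0x80 while ISO-8859-1 cannot encode it at all.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Illustrative sketch: do "Cp1252" and "windows-1252" name the same charset?
// Class name is hypothetical.
class WindowsLatin1Names {
    // Canonical java.nio name that a given name resolves to.
    static String nioName(String name) {
        return Charset.forName(name).name();
    }

    // Whether both names encode the sample string to identical bytes.
    static boolean sameBytes(String a, String b, String sample) {
        try {
            return Arrays.equals(sample.getBytes(a), sample.getBytes(b));
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Cp1252       -> " + nioName("Cp1252"));
        System.out.println("windows-1252 -> " + nioName("windows-1252"));
        // euro sign: 0x80 in windows-1252, not representable in ISO-8859-1
        System.out.println("same bytes: "
            + sameBytes("Cp1252", "windows-1252", "abc\u20ac"));
    }
}
```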

> another alternative is to stick to the distinction the javadocs makes 
> wrt. to the following aspects:
> 
> * specific packages use specific, albeit sometimes, different 
> encoding/charset names;
> * some names are "canonical" others are "aliases,"
> * some names are a MUST (Basic), others (the international version of 
> the JDK) are a MAY (Extended).
> 
> this way, gnu.testlet.java.lang.String.getBytes can be the test point 
> for java.lang.* API encoding names, and something like (a new) 
> gnu.testlet.java.nio.charset.Charset.isSupported test would emulate the 
> same for the java.nio.* API encoding names.
> [...]
> if my comments above are acceptable, i can revise the getBytes classes 
> to handle distinctly the last 2 points (canonical v/s alias, and basic 
> v/s extended), and write a new test case for java.nio.* API 
> conformance.  the pass/fail requirements can then be controlled with an 
> 'xfails' file.

Sure. Having more tests so that one can test how/what character set
names are actually supported by the class library implementation would
be very welcome. But I don't know if following the Sun (canonical)
naming convention (especially differences between java.lang/io and
java.nio names) makes much sense here since it looks very confusing for
users.

One of the documents above points to the IANA Charset Registry
http://www.iana.org/assignments/character-sets (RFC 2278).
It defines official names and aliases (and it looks automatically
parsable, which is a plus). I would use this document to create some
automatically generated tests (with a script, so that new revisions of
the registry can be used to regenerate the tests).
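As a rough illustration of the "automatically parsable" part, a generator could pull the "Name:" and "Alias:" entries out of the registry text and check each one with Charset.isSupported. A minimal sketch, assuming the plain-text layout the registry used at the time (one "Name:" line per entry, followed by "Alias:" lines, with "Alias: None" for entries without aliases); newer copies may be published in other formats. The class name is hypothetical, and the sample in main is an abbreviated excerpt in that layout, not a verbatim copy of the registry.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: extract charset names and aliases from registry text
// as input for generated tests. Class name is hypothetical.
class IanaNames {
    static List<String> parse(String registryText) {
        List<String> names = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(registryText))) {
            String line;
            while ((line = in.readLine()) != null) {
                String value;
                if (line.startsWith("Name:")) {
                    value = line.substring(5);
                } else if (line.startsWith("Alias:")) {
                    value = line.substring(6);
                } else {
                    continue;
                }
                // keep the first token; drops "[RFC...]" and "(preferred MIME name)"
                String name = value.trim().split("\\s+")[0];
                if (!name.isEmpty() && !name.equals("None")) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // some registry names use characters Java rejects
        }
    }

    public static void main(String[] args) {
        String sample =
            "Name: ANSI_X3.4-1968                                   [RFC1345,KXS2]\n"
          + "MIBenum: 3\n"
          + "Alias: iso-ir-6\n"
          + "Alias: US-ASCII (preferred MIME name)\n"
          + "Alias: csASCII\n";
        for (String name : parse(sample)) {
            System.out.println((supported(name) ? "PASS: " : "FAIL: ") + name);
        }
    }
}
```

A script along these lines could emit one check per name instead of printing, so that a new registry revision only needs the generator to be rerun.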

It still makes sense to look at the (historical) character set names
that Sun defines but which aren't in the IANA Charset Registry, and to
create tests that at least make sure that an implementation can alias
those names to something in the official IANA Charset Registry (the
names now tested in getBytes13 and getBytes14 look like they must be
supported on all platforms).

Cheers,

Mark


Thread overview: 6+ messages
2003-02-08 16:15 new test cases (long) Raif S. Naffah
2003-02-14 11:57 ` Mark Wielaard
2003-02-14 23:58   ` Raif S. Naffah
2003-02-16 11:54     ` Mark Wielaard
2003-02-16 16:44       ` Raif S. Naffah
2003-02-16 17:21         ` Mark Wielaard
