Character Test

public inbox for mauve-discuss@sourceware.org

* Character Test
@ 1999-01-07 20:18 Aaron M. Renn
  0 siblings, 3 replies; 11+ messages in thread
From: Aaron M. Renn @ 1999-01-07 20:18 UTC
  To: mauve-discuss; +Cc: abies

I converted Artur's Character test to the Mauve framework (a simple task)
and checked it into the gnu/testlet/java/lang/Character directory as
unicode.java.  A word of warning: This test is very thorough and takes a
very long time to run.  It is possible we might not want to run it at all by
default.  Running in debug mode, it printed over 150MB of output before I
killed the job to avoid filling up my filesystem.

For the record, it runs 3,578,944 tests spanning the entire Unicode
database.  The JDK failed 746 of them.  (I might kick off the
Japhar/Classpath combo before bed and hope it finishes by the time I get
home from work tomorrow :-)  I'm using the Unicode 2.1.2 database file,
which I also checked into the archive.

-- 
Aaron M. Renn (arenn@urbanophile.com) http://www.urbanophile.com/arenn/

* File access (was: Character Test)
@ 1999-01-08  7:41   ` Anthony Green
  0 siblings, 0 replies; 11+ messages in thread
From: Anthony Green @ 1999-01-08  7:41 UTC
  To: arenn; +Cc: mauve-discuss, abies

Aaron wrote:
> I converted Artur's Character test to the Mauve framework (a simple task)
> and checked it into the gnu/testlet/java/lang/Character directory as
> unicode.java.

Cool! Will try this today.

> I'm using the Unicode 2.1.2 database file, which I also checked into
> the archive.

I was thinking about File usage the other day.  I'd like to suggest
that we introduce an abstract interface to test input data (like
UnicodeData.txt).  So rather than having testlets open files
themselves, they would pass the file name to the TestHarness, and the
harness would return a Reader.

PersonalJava has optional File support.  JavaCard has no File support.
For systems with no File support you would build the test input data
into the final image.  Their special TestHarness would look up the
filename in some kind of dictionary and return a Reader on the in-core
image.
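
A minimal Java sketch of the proposed interface, for illustration only: the names InputSource, getInputReader, FileInputSource and InCoreInputSource are hypothetical, not existing Mauve API, and the file-based variant assumes the data is UTF-8 (UnicodeData.txt is plain ASCII). The testlet side only ever asks for a Reader by name; whichever harness is plugged in decides whether that name maps to a file or to an in-core image.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class InputDataSketch {
    /** Hypothetical harness hook: a testlet names its input data, the harness supplies a Reader. */
    interface InputSource {
        Reader getInputReader(String name) throws IOException;
    }

    /** Harness for platforms with File support: resolve the name against a base directory. */
    static final class FileInputSource implements InputSource {
        private final File baseDir;

        FileInputSource(File baseDir) { this.baseDir = baseDir; }

        @Override
        public Reader getInputReader(String name) throws IOException {
            return new InputStreamReader(new FileInputStream(new File(baseDir, name)),
                                         StandardCharsets.UTF_8);
        }
    }

    /** Harness for platforms without File support: look the name up in an in-core dictionary. */
    static final class InCoreInputSource implements InputSource {
        private final Map<String, String> images = new HashMap<>();

        void register(String name, String contents) { images.put(name, contents); }

        @Override
        public Reader getInputReader(String name) throws IOException {
            String data = images.get(name);
            if (data == null) {
                throw new FileNotFoundException(name);
            }
            return new StringReader(data);
        }
    }

    /** What a testlet would do: ask for the data by name and never touch File itself. */
    static String firstLine(InputSource harness, String name) throws IOException {
        try (BufferedReader in = new BufferedReader(harness.getInputReader(name))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        InCoreInputSource harness = new InCoreInputSource();
        harness.register("UnicodeData.txt", "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n");
        System.out.println(firstLine(harness, "UnicodeData.txt"));
    }
}
```

On a File-less target, the generated data classes would do the equivalent of register at startup, so the testlet code itself is the same on both kinds of platform.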

This would require:

- Tagging testlets with file info.  Something simple like:
	// Input: UnicodeData.txt

- A script for collecting Input info, and creating Java classes with
	the file contents as static data.

- Other bits of glue for treating these classes like the ones named in
	"Uses:" tags, etc.
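
A rough sketch of the collection script's two jobs, written in Java for consistency with the rest of the suite. The "// Input:" prefix comes from the proposal above; InputTagSketch, inputFiles, generateHolder and the DATA field name are hypothetical. It scans a testlet's source for Input tags, and it emits a class that holds a file's contents as a static String literal, escaping quotes, backslashes, line breaks and non-ASCII characters.

```java
import java.util.ArrayList;
import java.util.List;

public class InputTagSketch {
    /** Tag form proposed above, e.g. "// Input: UnicodeData.txt". */
    static final String TAG = "// Input:";

    /** Collect the file names listed on "// Input:" lines of a testlet's source. */
    static List<String> inputFiles(String testletSource) {
        List<String> names = new ArrayList<>();
        for (String line : testletSource.split("\n")) {
            String t = line.trim();
            if (t.startsWith(TAG)) {
                for (String name : t.substring(TAG.length()).trim().split("\\s+")) {
                    if (!name.isEmpty()) {
                        names.add(name);
                    }
                }
            }
        }
        return names;
    }

    /** Emit Java source for a class that holds the file contents as a static String. */
    static String generateHolder(String className, String contents) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(className).append(" {\n");
        sb.append("  public static final String DATA = \"");
        for (char c : contents.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append("\";\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String testlet = "// Input: UnicodeData.txt\npublic class unicode {}\n";
        for (String file : inputFiles(testlet)) {
            System.out.println(file);
        }
        System.out.print(generateHolder("UnicodeDataTxt", "0041;LATIN CAPITAL LETTER A\n"));
    }
}
```

One practical limit: a single string constant in a class file holds at most 65535 bytes of encoded data, so a real generator would have to split a file the size of UnicodeData.txt across several constants or classes.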

Using this scheme could be a simple configure time option.

The only constraint for testlet authors would be that they must use the
special harness method to access test input data.

Does this sound reasonable?

AG

-- 
Anthony Green                                               Cygnus Solutions
                                                       Sunnyvale, California

* Re: Character Test
  1999-01-07 20:18 Character Test Aaron M. Renn
@ 1999-01-08 10:33 ` Tom Tromey
  1999-01-09  6:06   ` Artur Biesiadowski
  1999-01-09  7:29 ` Artur Biesiadowski
  2 siblings, 1 reply; 11+ messages in thread
From: Tom Tromey @ 1999-01-08 10:33 UTC
  To: Aaron M. Renn; +Cc: mauve-discuss, abies

Aaron> A word of warning: This test is very thorough and takes a very
Aaron> long time to run.

I think this code is also moderately buggy.  For instance, it says
that Character.digit(\u00b2,6) should be 2.  However, that is not the
case.  \u00b2 is not a digit per the table in the JCL book.
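
On a current JDK the point is easy to check directly. The snippet below (class name hypothetical) shows that U+00B2 has general category OTHER_NUMBER rather than DECIMAL_DIGIT_NUMBER, so isDigit is false and digit returns -1 for any radix; the Unicode numeric value the test was keying on is exposed separately through getNumericValue. This reflects today's JDK, not the 1999 implementations discussed here.

```java
public class SuperscriptDigitCheck {
    public static void main(String[] args) {
        char sup2 = '\u00b2'; // SUPERSCRIPT TWO
        System.out.println("OTHER_NUMBER: " + (Character.getType(sup2) == Character.OTHER_NUMBER));
        System.out.println("isDigit: " + Character.isDigit(sup2));
        System.out.println("digit(ch, 6): " + Character.digit(sup2, 6));
        System.out.println("getNumericValue: " + Character.getNumericValue(sup2));
    }
}
```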

The test gives me nearly 1000 errors for my implementation of
Character.  However, I actually think my implementation is correct (or
if it is not, it has very few bugs).

Artur, could you fix the bugs?  Patches against the Mauve version of
your test would be the easiest way for us...

Until the changes are made, I'd advise against changing your
implementation based on the output of this test.

Aaron> I'm using the Unicode 2.1.2 database file, which I also checked
Aaron> into the archive.

Is there a particular reason you didn't use the 2.1.5 version of the
database?  That is the latest version (as of sometime last month; I
haven't looked in a few weeks).

Some care is required here.  JDK conformance actually depends on using
the correct version of the table.  For JDK 1.0, I believe a very old
Unicode data table was used.

Tom

* Re: Character Test
  1999-01-08 10:33 ` Character Test Tom Tromey
@ 1999-01-09  6:06   ` Artur Biesiadowski
  1999-01-14 14:45     ` Tom Tromey
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Artur Biesiadowski @ 1999-01-09  6:06 UTC
  To: tromey, mauve

Tom Tromey wrote:

> I think this code is also moderately buggy.  For instance, it says
> that Character.digit(\u00b2,6) should be 2.  However, that is not the
> case.  \u00b2 is not a digit per the table in the JCL book.
> 
> The test gives me nearly 1000 errors for my implementation of
> Character.  However, I actually think my implementation is correct (or
> if it is not, it has very few bugs).

I'm afraid that we will have to go through these one by one.  The isDigit
problem is caused by my using the N* categories to determine digits,
rather than the dumb Sun rule (the name contains DIGIT).  But I'm afraid
I'll have to change that, so please report every place where the test
reports bugs that aren't there.  At the same time, remember that the JLS
is not perfect: some parts of it contradict each other.  I'll cite them
soon.

> Is there a particular reason you didn't use the 2.1.5 version of the
> database?  That is the latest version (as of sometime last month; I
> haven't looked in a few weeks).
> 
> Some care is required here.  JDK conformance actually depends on using
> the correct version of the table.  For JDK 1.0, I believe a very old
> Unicode data table was used.

I'll say NO. The JDK isn't God's implementation. We are interested in
conformance to the JLS and to common sense, and only then take the JDK into
account. The test is designed to check whether the Character class conforms
to the JLS, not whether it has the same bugs as JDK 1.0.3. There is no
doubt that the latest version of the Unicode spec is best, so we should use
the latest one possible. If this causes some 'good' implementation to fail,
that is OK: it means the implementation is out of date compared to the
latest Unicode standard. JDK versions have nothing to do with it. The only
important part is the functionality exposed by the Character methods; here
we should use different versions of the test. But why should something
implementing the 1.0 methods not recognize the euro sign? Only because there
was a historic version of Sun's product that was released before the euro
was introduced?

Artur

* Re: Character Test
  1999-01-07 20:18 Character Test Aaron M. Renn
  1999-01-08 10:33 ` Character Test Tom Tromey
@ 1999-01-09  7:29 ` Artur Biesiadowski
  1999-01-14 14:20   ` Tom Tromey
  1999-01-14 16:10   ` Tom Tromey
  2 siblings, 2 replies; 11+ messages in thread
From: Artur Biesiadowski @ 1999-01-09  7:29 UTC
  To: mauve

"Aaron M. Renn" wrote:
> 
> I converted Artur's Character test to the Mauve framework (a simple task)
> and checked it into the gnu/testlet/java/lang/Character directory as
> unicode.java.  A word of warning: This test is very thorough and takes a
> very long time to run.  It is possible we might not want to run it at all by
> default.  Running in debug mode, it printed over 150MB of output before I
> killed the job to avoid filling up my filesystem.

Attached is a small diff to SimpleTestHarness which makes the test run
at least 100x faster.
The second file is an updated unicode.java: you hacked out a bit too much
code, so most errors were not reported correctly.

I think that it works well now as far as performance and verbosity are
concerned. Now we can begin to chase JLS incompatibilities.

The single most important one for the JDK is whether SUPERSCRIPT/SUBSCRIPT
numbers should be treated as digits.

The JLS says they should not be, because their names do not contain the
word DIGIT. But they do have a digit value in the Unicode spec. The JLS is
known to be buggy; there is even a site somewhere which tracks all the
errors in it. But on the other hand, it is the only source of 'truth'. So
we can change the rule to ignore the Unicode digit value if the entry does
not have DIGIT in its name.
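
The two rules can be put side by side on records in UnicodeData.txt format. In this sketch the two records are written out by hand to illustrate the file's semicolon-separated layout (name in field 1, category in field 2, digit value in field 7), and the class and method names are hypothetical. The category-based rule accepts SUPERSCRIPT TWO because it is in an N* category and carries a digit value; the name-based rule rejects it because its name does not contain DIGIT.

```java
public class DigitRuleSketch {
    // Two illustrative records in UnicodeData.txt layout.
    static final String DIGIT_TWO =
        "0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;";
    static final String SUPERSCRIPT_TWO =
        "00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;";

    static String[] fields(String record) {
        return record.split(";", -1); // -1 keeps trailing empty fields
    }

    // Category-based rule: any N* category that carries a digit value (field 7).
    static boolean digitByCategory(String record) {
        String[] f = fields(record);
        return f[2].startsWith("N") && !f[7].isEmpty();
    }

    // Name-based rule: the character name (field 1) contains DIGIT.
    // Field 10, the old Unicode 1.0 name, is deliberately not consulted.
    static boolean digitByName(String record) {
        return fields(record)[1].contains("DIGIT");
    }

    public static void main(String[] args) {
        for (String r : new String[] { DIGIT_TWO, SUPERSCRIPT_TWO }) {
            System.out.println(fields(r)[1] + ": category rule=" + digitByCategory(r)
                               + ", name rule=" + digitByName(r));
        }
    }
}
```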

The next problem for the JDK is that it reports roman numerals (category
Nl) as Java and Unicode identifier starts. The API clearly states that the
char needs to be one of:
- a letter
- a currency symbol (such as "$")
- a connecting punctuation character (such as "_")

And there is a link to the isLetter method. JDK isLetter returns false for
roman numerals, but at the same time the JDK allows them to start
identifiers. It seems like different people wrote different methods.
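
The mismatch can be shown in a few lines of Java. documentedStart is a hypothetical helper encoding the three cases listed above; ROMAN NUMERAL ONE (U+2160, category Nl) fails it because isLetter is false. Note that on a current JDK, isJavaIdentifierStart returns true for this character and the Character documentation now lists LETTER_NUMBER explicitly as a valid start, so today the documented rule and the implementation agree on this point.

```java
public class IdentifierStartSketch {
    /** The three cases listed above: a letter, a currency symbol, or connecting punctuation. */
    static boolean documentedStart(char ch) {
        int type = Character.getType(ch);
        return Character.isLetter(ch)
            || type == Character.CURRENCY_SYMBOL
            || type == Character.CONNECTOR_PUNCTUATION;
    }

    public static void main(String[] args) {
        char romanOne = '\u2160'; // ROMAN NUMERAL ONE, general category Nl
        System.out.println("LETTER_NUMBER: " + (Character.getType(romanOne) == Character.LETTER_NUMBER));
        System.out.println("isLetter: " + Character.isLetter(romanOne));
        System.out.println("documented rule: " + documentedStart(romanOne));
        System.out.println("isJavaIdentifierStart: " + Character.isJavaIdentifierStart(romanOne));
    }
}
```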

The third problem is that the TIBETAN DIGIT HALF * characters are not
reported as digits. I don't know what these characters are, but the JLS
rules make them digits.

The first problem is my fault (I used the Unicode spec instead of the JLS);
the second and third are clearly JDK faults (if something looks like a V,
then it is a V, isn't it?).

The rest of my code agrees with the JDK (as long as deprecated methods are
not used). So if anybody disagrees with the test, please include specific
cases, so we can go through them one by one.

BTW, the deprecated methods are quite funny: they are implemented
arbitrarily in terms of the methods that replace them, not according to the
JLS or even the API spec. Sun could have changed the API docs to say that a
method no longer follows the JLS, but they continue to document the 1.0 API
while the method is actually implemented by its replacement. I wonder how
they pass their own JCK. Maybe deprecated methods are not required to pass
the JCK?

Artur
--- OldHarness.java	Sat Jan  9 15:14:49 1999
+++ SimpleTestHarness.java	Sat Jan  9 15:32:31 1999
@@ -39,16 +39,21 @@

   public void check (boolean result)
     {
+      if (! result)
+	{
       String d = (description
 		  + ((last_check == null) ? "" : (": " + last_check))
 		  + " (number " + (count + 1) + ")");
-      if (! result)
-	{
+
 	  System.out.println("FAIL: " + d);
 	  ++failures;
 	}
       else if (verbose)
 	{
+      String d = (description
+		  + ((last_check == null) ? "" : (": " + last_check))
+		  + " (number " + (count + 1) + ")");
+
 	  System.out.println("PASS: " + d);
 	}
       ++count;
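
For readers without the patch at hand, here is roughly what check looks like once it is applied, with the duplicated message construction pulled into one helper. The fields description, last_check, count, failures and verbose come from the diff; the surrounding LazyCheckSketch class, its constructor and getters are a hypothetical stand-in, not the actual SimpleTestHarness (nor the variant later checked in). The speedup comes from never building the description string for a passing check when verbose output is off, which is the common case across millions of checks.

```java
public class LazyCheckSketch {
    // Field names as in the diff; everything else here is a stand-in.
    private final String description;
    private String last_check;
    private int count;
    private int failures;
    private final boolean verbose;

    LazyCheckSketch(String description, boolean verbose) {
        this.description = description;
        this.verbose = verbose;
    }

    /** Build the report text; only called when it is actually printed. */
    private String describe() {
        return description
            + ((last_check == null) ? "" : (": " + last_check))
            + " (number " + (count + 1) + ")";
    }

    public void check(boolean result) {
        if (!result) {
            System.out.println("FAIL: " + describe());
            ++failures;
        } else if (verbose) {
            System.out.println("PASS: " + describe());
        }
        ++count;
    }

    int getFailures() { return failures; }
    int getCount() { return count; }

    public static void main(String[] args) {
        LazyCheckSketch harness = new LazyCheckSketch("unicode", false);
        for (int i = 0; i < 1_000_000; i++) {
            harness.check(true); // no string work at all on this path
        }
        harness.check(false);    // prints "FAIL: unicode (number 1000001)"
        System.out.println(harness.getFailures() + " failure(s) in " + harness.getCount() + " checks");
    }
}
```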

* Re: Character Test
  1999-01-09  7:29 ` Artur Biesiadowski
@ 1999-01-14 14:20   ` Tom Tromey
  1999-01-14 16:10   ` Tom Tromey
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Tromey @ 1999-01-14 14:20 UTC
  To: Artur Biesiadowski; +Cc: mauve

>>>>> "Artur" == Artur Biesiadowski <abies@pg.gda.pl> writes:

Artur> Attached is a small diff to SimpleTestHarness which makes the
Artur> test run at least 100x faster.

I checked in a variant of this change.

Artur> The second file is an updated unicode.java: you hacked out a bit
Artur> too much code, so most errors were not reported correctly.

I checked this in with a few changes to make it compile.

Tom

* Re: Character Test
  1999-01-09  6:06   ` Artur Biesiadowski
@ 1999-01-14 14:45     ` Tom Tromey
  1999-01-14 15:53     ` Tom Tromey
  1999-01-14 18:00     ` Tom Tromey
  2 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 1999-01-14 14:45 UTC
  To: Artur Biesiadowski; +Cc: mauve

>>>>> "Artur" == Artur Biesiadowski <abies@pg.gda.pl> writes:

>> Some care is required here.  JDK conformance actually depends on
>> using the correct version of the table.  For JDK 1.0, I believe a
>> very old Unicode data table was used.

Artur> I'll say NO. The JDK isn't God's implementation. We are interested
Artur> in conformance to the JLS and to common sense, and only then take
Artur> the JDK into account.

As far as I know, there is no JLS corresponding to JDK 1.2.  Instead
we must rely on the JDK 1.2 docs from Sun.  If we just look at JLS
then we have to use the tables specified there -- probably not what
you intend.

If you mean we should rely on documentation and not implementations,
then I agree.  Sun's implementation does have bugs with respect to its
documentation (and there are cases where the docs are clearly right).

Tom

* Re: Character Test
  1999-01-09  6:06   ` Artur Biesiadowski
  1999-01-14 14:45     ` Tom Tromey
@ 1999-01-14 15:53     ` Tom Tromey
  1999-01-14 18:00     ` Tom Tromey
  2 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 1999-01-14 15:53 UTC
  To: Artur Biesiadowski; +Cc: mauve

>> The test gives me nearly 1000 errors for my implementation of
>> Character.  However, I actually think my implementation is correct
>> (or if it is not, it has very few bugs).

Artur> I'm afraid that we will have to go with these one by one.

FYI I've been going through the problems and committing fixes.  And
yes, some of the problems were bugs in my Character implementation
(sigh).

Note that JDK 1.1 -vs- 1.2 does make a difference even if you only
look at docs.  1.2 says that 7f-9f are ignorable control characters,
but 1.1 omits these.

Currently the test is 1.2 based.
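
One way a testlet could make that version dependence explicit is to key the expected result on the spec being tested. This sketch covers only the U+007F..U+009F range mentioned above and encodes the reading of the two documents given here (1.2 treats them as ignorable, 1.1 does not list them); the Spec enum and the method names are hypothetical. On a current JDK, whose documentation keeps the 1.2 wording for this range, the 1.2 expectation should produce no mismatches.

```java
public class IgnorableByVersionSketch {
    /** Hypothetical tags for the documentation a run is checked against. */
    enum Spec { JDK_1_1, JDK_1_2 }

    /**
     * Expected isIdentifierIgnorable result for U+007F..U+009F only:
     * ignorable per the 1.2 docs, not listed in the 1.1 docs.
     */
    static boolean expectedIgnorable(char ch, Spec spec) {
        if (ch < 0x7f || ch > 0x9f) {
            throw new IllegalArgumentException("sketch only covers U+007F..U+009F");
        }
        return spec == Spec.JDK_1_2;
    }

    /** Count the characters in the range where the running JVM disagrees with the expectation. */
    static int mismatches(Spec spec) {
        int bad = 0;
        for (char ch = 0x7f; ch <= 0x9f; ch++) {
            if (Character.isIdentifierIgnorable(ch) != expectedIgnorable(ch, spec)) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        for (Spec spec : Spec.values()) {
            System.out.println(spec + ": " + mismatches(spec) + " mismatches");
        }
    }
}
```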

Tom

* Re: Character Test
  1999-01-09  7:29 ` Artur Biesiadowski
  1999-01-14 14:20   ` Tom Tromey
@ 1999-01-14 16:10   ` Tom Tromey
  1 sibling, 0 replies; 11+ messages in thread
From: Tom Tromey @ 1999-01-14 16:10 UTC
  To: Artur Biesiadowski; +Cc: mauve

>>>>> "Artur" == Artur Biesiadowski <abies@pg.gda.pl> writes:

Artur> The single most important one for the JDK is whether
Artur> SUPERSCRIPT/SUBSCRIPT numbers should be treated as digits.

Artur> The JLS says they should not be, because their names do not contain
Artur> the word DIGIT. But they do have a digit value in the Unicode spec.
Artur> The JLS is known to be buggy; there is even a site somewhere which
Artur> tracks all the errors in it. But on the other hand, it is the only
Artur> source of 'truth'. So we can change the rule to ignore the Unicode
Artur> digit value if the entry does not have DIGIT in its name.

In this case I've decided to go with what the Java docs say.  All the
documentation I have (JLS, JCL book, and JDK 1.2 docs) seem to be
consistent on the treatment of digit/isDigit.

Artur> The next problem for the JDK is that it reports roman numerals
Artur> (category Nl) as Java and Unicode identifier starts.

I agree this must be a bug in their implementation.

Artur> The third problem is that the TIBETAN DIGIT HALF * characters are
Artur> not reported as digits. I don't know what these characters are, but
Artur> the JLS rules make them digits.

The Tibetan half digits are strange fractional characters.  They don't
have integer values.  I don't know why these don't have numeric values
in the Unicode data table (possibly a bug there; I'll see).  In any
case they aren't digits according to JLS rules, and wouldn't be even
if this were fixed (since the values are not integers).
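
The disagreement comes down to which test is applied, and a current JDK can evaluate both. In this sketch (class and method names hypothetical), nameSaysDigit applies the name-based rule as paraphrased in this thread using Character.getName (Java 7 and later), and nameAndIntegerValue adds the condition argued above, that a digit must have a nonnegative integer value. getNumericValue returns -1 when there is no numeric value and -2 when the value cannot be represented as a nonnegative int, so TIBETAN DIGIT HALF ONE (U+0F2A) passes the name test but fails the combined one, and isDigit rejects it too.

```java
public class TibetanHalfDigitSketch {
    /** Name-based rule as paraphrased in the thread: the Unicode name contains "DIGIT". */
    static boolean nameSaysDigit(int codePoint) {
        String name = Character.getName(codePoint); // null for unassigned code points
        return name != null && name.contains("DIGIT");
    }

    /** Adds the condition that a digit must have a nonnegative integer value. */
    static boolean nameAndIntegerValue(char ch) {
        // getNumericValue: -1 means no numeric value, -2 means not a nonnegative integer.
        return nameSaysDigit(ch) && Character.getNumericValue(ch) >= 0;
    }

    public static void main(String[] args) {
        char halfOne = (char) 0x0F2A; // TIBETAN DIGIT HALF ONE
        System.out.println(Character.getName(halfOne));
        System.out.println("name rule: " + nameSaysDigit(halfOne));
        System.out.println("name + integer value: " + nameAndIntegerValue(halfOne));
        System.out.println("Character.isDigit: " + Character.isDigit(halfOne));
    }
}
```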

Tom

* Re: Character Test
  1999-01-09  6:06   ` Artur Biesiadowski
  1999-01-14 14:45     ` Tom Tromey
  1999-01-14 15:53     ` Tom Tromey
@ 1999-01-14 18:00     ` Tom Tromey
  2 siblings, 0 replies; 11+ messages in thread
From: Tom Tromey @ 1999-01-14 18:00 UTC
  To: Artur Biesiadowski; +Cc: mauve

I've finished checking in my changes to the exhaustive Character test.
These changes bring it in line with what I believe is the correct
behavior (based on my reading of JLS, JCL volume 1, the JDK 1.2 docs,
and the Unicode 2.0 book).

If you think I screwed up, feel free to mention it on the list so we
can all talk about it.  No doubt some decisions I made will be
controversial.

Tom

* Re: File access (was: Character Test)
@ 1999-01-08 10:54 Aaron M. Renn
  0 siblings, 0 replies; 11+ messages in thread
From: Aaron M. Renn @ 1999-01-08 10:54 UTC
  To: Anthony Green; +Cc: mauve-discuss

>I was thinking about File usage the other day.  I'd like to suggest
>that we introduce an abstract interface to test input data (like
>UnicodeData.txt).  So rather than having testlets open files
>themselves, they would pass the file name to the TestHarness, and the
>harness would return a Reader.

This is OK by me. You will notice that I converted Artur's test to use
getSourceDirectory, but I appended a Unix-style "path/path/path" filename,
which is really a no-no.  If we can't use getSystemResource to do this, we
should use an equivalent facility in our TestHarness to avoid this
situation.
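
A minimal sketch of the two portable alternatives, with hypothetical names and paths (the thread does not say exactly where the data file lives): build the path one component at a time with java.io.File so the platform separator is used, or look the file up on the class path with ClassLoader.getSystemResource, whose resource names always use "/" regardless of platform.

```java
import java.io.File;
import java.net.URL;

public class PortablePathSketch {
    /** Join path components with the platform's separator instead of a hard-coded "/". */
    static String join(String base, String... parts) {
        File f = new File(base);
        for (String part : parts) {
            f = new File(f, part);
        }
        return f.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical base directory and file location.
        System.out.println(join("srcdir", "gnu", "testlet", "java", "lang", "Character", "UnicodeData.txt"));

        // Class-path lookup: resource names always use '/', on every platform.
        URL url = ClassLoader.getSystemResource("gnu/testlet/java/lang/Character/UnicodeData.txt");
        System.out.println(url == null ? "not on the class path" : url.toString());
    }
}
```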

If we do build the facility into the Harness, we might want to also support
automagic handling of different versions of the file for different
versions/subspecifications of Java.  The recently discussed Unicode file
would be an example of this.  We could have a basename for the requested
resource, and try appending various tags to it to generate the file name
similar to how ResourceBundle does it.
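
A sketch of that lookup order, modeled on the way ResourceBundle drops locale components from the most specific name down to the bare base name. The tag values (JDK1.2, Unicode2.1.5), the underscore separator and the class and method names are all hypothetical; the existence check is passed in so the same code could sit on top of files or in-core images.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class VersionedResourceSketch {
    /**
     * Candidate names from most to least specific, in the spirit of ResourceBundle:
     * base_tag1_tag2.ext, then base_tag1.ext, then base.ext.
     */
    static List<String> candidates(String base, String ext, List<String> tags) {
        List<String> out = new ArrayList<>();
        for (int n = tags.size(); n >= 0; n--) {
            StringBuilder name = new StringBuilder(base);
            for (int i = 0; i < n; i++) {
                name.append('_').append(tags.get(i));
            }
            out.add(name.append(ext).toString());
        }
        return out;
    }

    /** Return the first candidate the harness can supply, or null if none exists. */
    static String resolve(String base, String ext, List<String> tags, Predicate<String> exists) {
        for (String name : candidates(base, ext, tags)) {
            if (exists.test(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("JDK1.2", "Unicode2.1.5"); // hypothetical tags
        System.out.println(candidates("UnicodeData", ".txt", tags));
        // Pretend only the JDK 1.2 variant is present.
        System.out.println(resolve("UnicodeData", ".txt", tags, name -> name.equals("UnicodeData_JDK1.2.txt")));
    }
}
```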

--
Aaron M. Renn (arenn@urbanophile.com) http://www.urbanophile.com/arenn/

Thread overview: 11+ messages
1999-01-07 20:18 Character Test Aaron M. Renn
1999-01-08  7:41   ` File access (was: Character Test) Anthony Green
1999-01-08 10:33 ` Character Test Tom Tromey
1999-01-09  6:06   ` Artur Biesiadowski
1999-01-14 14:45     ` Tom Tromey
1999-01-14 15:53     ` Tom Tromey
1999-01-14 18:00     ` Tom Tromey
1999-01-09  7:29 ` Artur Biesiadowski
1999-01-14 14:20   ` Tom Tromey
1999-01-14 16:10   ` Tom Tromey
1999-01-08 10:54 File access (was: Character Test) Aaron M. Renn
