[Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)

* [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
@ 2015-08-02 13:57 gnugcc at marino dot st
  2015-08-02 14:07 ` [Bug libstdc++/67096] " gnugcc at marino dot st
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: gnugcc at marino dot st @ 2015-08-02 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

            Bug ID: 67096
           Summary: libstdc++ testsuite, codecvt: many UTF-8 tests illegal
                    (testing bytes 5 and 6)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gnugcc at marino dot st
  Target Milestone: ---

Using the pending locale patch set for DragonFly[1], I've been running the
testsuite frequently.  After an update to libc to support LC_CTYPE better, we
suffered a large test regression.

For example, 22_locale/codecvt/length/wchar_t/4.cc started failing.

I modified the test and found that as long as int_type is limited to values
below 0x200000, the test passes.  At 0x200000 or above, it fails.  This test,
and many similar ones, exercise these values:
  0x200000
  0x400000
  0x800000
  0x1000000
  0x2000000
  0x4000000
  0x8000000
  0x10000000
  0x20000000
  0x40000000

The reason for the failure is that libc rejects sequences longer than 4 bytes
as illegal, as it should.
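
To illustrate why every value in that list trips a strict libc, here is a
minimal sketch that computes how many bytes the original 31-bit UTF-8 scheme
would need for a given value. The names legacy_utf8_length, reported_values
and demo_reported_values are hypothetical and are not taken from the
testsuite or from any libc.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed per value under the original 31-bit UTF-8 scheme
// (before RFC 3629); returns 0 for values that do not fit in 31 bits.
constexpr int legacy_utf8_length(std::uint32_t cp) {
    return cp < 0x80u       ? 1
         : cp < 0x800u      ? 2
         : cp < 0x10000u    ? 3
         : cp < 0x200000u   ? 4
         : cp < 0x4000000u  ? 5
         : cp < 0x80000000u ? 6
         : 0;
}

// The ten values listed in the report.
const std::uint32_t reported_values[] = {
    0x200000,  0x400000,  0x800000,   0x1000000,  0x2000000,
    0x4000000, 0x8000000, 0x10000000, 0x20000000, 0x40000000
};

// Usage sketch: every reported value needs a 5- or 6-byte sequence.
void demo_reported_values() {
    for (std::uint32_t cp : reported_values)
        assert(legacy_utf8_length(cp) > 4);
    assert(legacy_utf8_length(0x1FFFFF) == 4);   // last 4-byte value
    assert(legacy_utf8_length(0x200000) == 5);   // first 5-byte value
    assert(legacy_utf8_length(0x4000000) == 6);  // first 6-byte value
}
```

Calling demo_reported_values() from a test driver runs the checks; the
boundary at 0x200000 matches the point where the modified test starts to fail.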

According to Wikipedia[2]:
"The original specification covered numbers up to 31 bits (the original limit
of the Universal Character Set). In November 2003, UTF-8 was restricted by RFC
3629 to end at U+10FFFF, in order to match the constraints of the UTF-16
character encoding. This removed all 5- and 6-byte sequences, and about half of
the 4-byte sequences."

The test sets the locale to "en_US.UTF-8", which by definition is limited to
4-byte sequences.  Thus, testing any value of 0x200000 or greater is illegal
and should not be done.  This probably affects several tests that make the
same mistake.
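
As a rough sketch of what a strict encoder does under RFC 3629, the
hypothetical helper below encodes one code point and refuses anything above
U+10FFFF (and the UTF-16 surrogate range). It is an illustration only, not
the DragonFly libc or libstdc++ implementation, and the names
encode_utf8_strict and demo_strict_encoding are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: encode one code point as UTF-8 under RFC 3629 rules.
// Returns the number of bytes written (1 to 4), or 0 if cp is a surrogate
// or lies above U+10FFFF.
std::size_t encode_utf8_strict(char32_t cp, unsigned char (&out)[4]) {
    typedef unsigned char byte;
    if (cp < 0x80) {
        out[0] = static_cast<byte>(cp);
        return 1;
    }
    if (cp < 0x800) {
        out[0] = static_cast<byte>(0xC0 | (cp >> 6));
        out[1] = static_cast<byte>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // UTF-16 surrogates
        out[0] = static_cast<byte>(0xE0 | (cp >> 12));
        out[1] = static_cast<byte>(0x80 | ((cp >> 6) & 0x3F));
        out[2] = static_cast<byte>(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = static_cast<byte>(0xF0 | (cp >> 18));
        out[1] = static_cast<byte>(0x80 | ((cp >> 12) & 0x3F));
        out[2] = static_cast<byte>(0x80 | ((cp >> 6) & 0x3F));
        out[3] = static_cast<byte>(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;  // above U+10FFFF: no legal encoding
}

// Usage sketch: the boundary cases discussed in this report.
void demo_strict_encoding() {
    unsigned char buf[4];
    assert(encode_utf8_strict(0x10FFFF, buf) == 4);  // highest legal code point
    assert(buf[0] == 0xF4 && buf[1] == 0x8F && buf[2] == 0xBF && buf[3] == 0xBF);
    assert(encode_utf8_strict(0x110000, buf) == 0);  // just past the limit
    assert(encode_utf8_strict(0x200000, buf) == 0);  // first value the tests use
}
```

Note that the legal range ends at U+10FFFF, so values from 0x110000 up to
0x1FFFFF are rejected too even though they would fit in 4 bytes.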

[1] https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02678.html
[2] https://en.wikipedia.org/wiki/UTF-8


* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
@ 2015-08-02 14:07 ` gnugcc at marino dot st
  2015-08-03 15:23 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: gnugcc at marino dot st @ 2015-08-02 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

--- Comment #1 from John Marino <gnugcc at marino dot st> ---
Created attachment 36108
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36108&action=edit
modification to test that makes it legal

As an illustration, I've modified the test to stop before 0x200000, which
results in success for transforms that are strict.



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
  2015-08-02 14:07 ` [Bug libstdc++/67096] " gnugcc at marino dot st
@ 2015-08-03 15:23 ` redi at gcc dot gnu.org
  2015-09-11 12:11 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2015-08-03 15:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-08-03
     Ever confirmed|0                           |1



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
  2015-08-02 14:07 ` [Bug libstdc++/67096] " gnugcc at marino dot st
  2015-08-03 15:23 ` redi at gcc dot gnu.org
@ 2015-09-11 12:11 ` redi at gcc dot gnu.org
  2015-09-11 13:07 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2015-09-11 12:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
                   ` (2 preceding siblings ...)
  2015-09-11 12:11 ` redi at gcc dot gnu.org
@ 2015-09-11 13:07 ` redi at gcc dot gnu.org
  2015-09-11 13:12 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2015-09-11 13:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Author: redi
Date: Fri Sep 11 13:06:42 2015
New Revision: 227686

URL: https://gcc.gnu.org/viewcvs?rev=227686&root=gcc&view=rev
Log:
Fix invalid UTF-8 in wchar_t tests.

2015-09-11  John Marino  <gnugcc@marino.st>
            Jonathan Wakely  <jwakely@redhat.com>

        PR libstdc++/67096
        * testsuite/22_locale/codecvt/in/wchar_t/4.cc: Do not test code points
        above U+10FFFF.
        * testsuite/22_locale/codecvt/in/wchar_t/8.cc: Likewise.
        * testsuite/22_locale/codecvt/in/wchar_t/9.cc: Likewise.
        * testsuite/22_locale/codecvt/length/wchar_t/4.cc: Likewise.
        * testsuite/22_locale/codecvt/out/wchar_t/4.cc: Likewise.
        * testsuite/22_locale/codecvt/unshift/wchar_t/4.cc: Likewise.
        * testsuite/27_io/basic_filebuf/seekoff/wchar_t/1.cc: Likewise.
        * testsuite/27_io/basic_filebuf/seekpos/wchar_t/9874.cc: Likewise.
        * testsuite/27_io/basic_filebuf/underflow/wchar_t/1.cc: Likewise.
        * testsuite/27_io/basic_filebuf/underflow/wchar_t/2.cc: Likewise.
        * testsuite/27_io/basic_filebuf/underflow/wchar_t/3.cc: Likewise.
        * testsuite/27_io/objects/wchar_t/10.cc: Likewise.
        * testsuite/27_io/objects/wchar_t/11.cc: Likewise.
        * testsuite/27_io/objects/wchar_t/12.cc: Likewise.
        * testsuite/27_io/objects/wchar_t/13.cc: Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/4.cc
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/8.cc
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/9.cc
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/length/wchar_t/4.cc
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/4.cc
    trunk/libstdc++-v3/testsuite/22_locale/codecvt/unshift/wchar_t/4.cc
    trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/seekoff/wchar_t/1.cc
    trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/seekpos/wchar_t/9874.cc
    trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/1.cc
    trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/2.cc
    trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/3.cc
    trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/10.cc
    trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/11.cc
    trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/12.cc
    trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/13.cc



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
                   ` (3 preceding siblings ...)
  2015-09-11 13:07 ` redi at gcc dot gnu.org
@ 2015-09-11 13:12 ` redi at gcc dot gnu.org
  2015-09-13 13:49 ` gnugcc at marino dot st
  2015-09-13 16:23 ` gnugcc at marino dot st
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2015-09-11 13:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |6.0

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Should be fixed now, let me know if there are others that need fixing.



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
                   ` (4 preceding siblings ...)
  2015-09-11 13:12 ` redi at gcc dot gnu.org
@ 2015-09-13 13:49 ` gnugcc at marino dot st
  2015-09-13 16:23 ` gnugcc at marino dot st
  6 siblings, 0 replies; 8+ messages in thread
From: gnugcc at marino dot st @ 2015-09-13 13:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

--- Comment #4 from John Marino <gnugcc at marino dot st> ---
Created attachment 36332
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36332&action=edit
codecvt/max_length/wchar/4.cc patch

The codecvt/max_length/wchar/4.cc test assumes that 6 is the maximum number of
bytes per character in UTF-8 and fails if it detects a length of less than 6
bytes.

Of course, the true limit is 4 bytes, so the test needs to be changed from 6 to
4.

It is related to the tests this PR fixed (same problem, different form).
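
For readers who want to see what their own platform reports, the following
sketch queries max_length() from the standard
std::codecvt<wchar_t, char, std::mbstate_t> facet. It is not the testsuite
file or the attached patch; the helper names max_bytes_per_wchar,
max_bytes_in_named_locale and demo_max_length are hypothetical, and the value
returned for "en_US.UTF-8" depends on the C library in use.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical helper: maximum number of external bytes per wchar_t
// reported by the codecvt facet of the given locale.
int max_bytes_per_wchar(const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    return std::use_facet<facet_type>(loc).max_length();
}

// Hypothetical helper: same query for a named locale.
// Returns -1 if that locale is not available on this system.
int max_bytes_in_named_locale(const char* name) {
    try {
        return max_bytes_per_wchar(std::locale(name));
    } catch (const std::runtime_error&) {
        return -1;  // std::locale(name) throws when the name is unknown
    }
}

// Usage sketch: a strict UTF-8 locale would be expected to report at most 4
// here; the actual value is platform-dependent, so it is not asserted.
void demo_max_length() {
    assert(max_bytes_per_wchar(std::locale::classic()) >= 1);
    int n = max_bytes_in_named_locale("en_US.UTF-8");
    assert(n == -1 || n >= 1);
}
```

A portable check would compare the result against an upper bound of 4 rather
than requiring one exact value, but that is a design suggestion and not what
the attached patch necessarily does.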



* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
  2015-08-02 13:57 [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6) gnugcc at marino dot st
                   ` (5 preceding siblings ...)
  2015-09-13 13:49 ` gnugcc at marino dot st
@ 2015-09-13 16:23 ` gnugcc at marino dot st
  6 siblings, 0 replies; 8+ messages in thread
From: gnugcc at marino dot st @ 2015-09-13 16:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096

--- Comment #5 from John Marino <gnugcc at marino dot st> ---
Hmmm, thinking about this, I'd bet Linux would FAIL this test.  It probably
does allow 6-byte sequences (even though it should not), and thus K would
return 6.

I don't really have a recommendation.  The standard is pretty clear that 4
bytes is the maximum for UTF-8, so it probably should fail on Linux.

Or maybe this whole test should just be removed?  What's it really testing that
the other tests haven't already?

