public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/67096] New: libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
@ 2015-08-02 13:57 gnugcc at marino dot st
From: gnugcc at marino dot st @ 2015-08-02 13:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
Bug ID: 67096
Summary: libstdc++ testsuite, codecvt: many UTF-8 tests illegal
(testing bytes 5 and 6)
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: gnugcc at marino dot st
Target Milestone: ---
Using the pending locale patch set for DragonFly[1], I've been running the
testsuite frequently. After an update to libc to support LC_CTYPE better, we
saw a large test regression.
For example, 22_locale/codecvt/length/wchar_t/4.cc started failing.
I modified the test and found that as long as int_type is limited to values
below 0x200000, the test passes. At 0x200000 and above, it fails. This test,
and many tests similar to it, exercise these values:
0x200000
0x400000
0x800000
0x1000000
0x2000000
0x4000000
0x8000000
0x10000000
0x20000000
0x40000000
The reason for the failure is that libc rejects sequences longer than 4 bytes
as illegal, as it should.
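A minimal sketch of that strict behaviour, assuming a decoder that classifies
a sequence by its lead byte alone. The function name
strict_utf8_sequence_length is made up for this illustration and is not taken
from any libc; the byte ranges are the ones permitted by RFC 3629.

```cpp
#include <cassert>

// Hypothetical helper, not part of any libc: the length of the UTF-8
// sequence implied by its lead byte under RFC 3629, or 0 if the byte cannot
// start a valid sequence.
constexpr int strict_utf8_sequence_length(unsigned char lead)
{
  return lead <= 0x7F                   ? 1  // ASCII
       : (lead >= 0xC2 && lead <= 0xDF) ? 2
       : (lead >= 0xE0 && lead <= 0xEF) ? 3
       : (lead >= 0xF0 && lead <= 0xF4) ? 4
       : 0;  // continuation bytes, overlong leads 0xC0/0xC1, and 0xF5..0xFF
}
```

With this rule, 0xF8 and 0xFC (the lead bytes that introduced 5- and 6-byte
sequences in the original scheme) both yield 0, so a strict conversion stops
with an error as soon as it sees them.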
According to Wikipedia[2]:
"The original specification covered numbers up to 31 bits (the original limit
of the Universal Character Set). In November 2003, UTF-8 was restricted by RFC
3629 to end at U+10FFFF, in order to match the constraints of the UTF-16
character encoding. This removed all 5- and 6-byte sequences, and about half of
the 4-byte sequences."
The test sets the locale to "en_US.UTF-8", which by definition is limited to
sequences of at most 4 bytes. Thus, testing any value of 0x200000 or greater
is illegal and should not be done. This probably affects several tests that
make the same mistake.
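As a rough illustration of the arithmetic (a sketch only; the function names
below are made up for this example and do not come from the test or from
libc), the byte count under the original 31-bit scheme can be compared with
the RFC 3629 range check:

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed under the original 31-bit UTF-8 scheme (before RFC 3629).
// Returns 0 for values that do not fit in 31 bits.
constexpr int legacy_utf8_length(std::uint32_t cp)
{
  return cp < 0x80u       ? 1
       : cp < 0x800u      ? 2
       : cp < 0x10000u    ? 3
       : cp < 0x200000u   ? 4
       : cp < 0x4000000u  ? 5
       : cp < 0x80000000u ? 6
       : 0;
}

// True if cp may be encoded under RFC 3629: at most U+10FFFF and not a
// UTF-16 surrogate.
constexpr bool is_rfc3629_scalar(std::uint32_t cp)
{
  return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

// Walks the ten values listed above (0x200000 doubled up to 0x40000000) and
// reports whether every one needs 5 or 6 bytes in the old scheme and lies
// outside the RFC 3629 range.
inline bool listed_values_are_out_of_range()
{
  for (std::uint32_t cp = 0x200000u; cp <= 0x40000000u; cp <<= 1)
    if (legacy_utf8_length(cp) < 5 || is_rfc3629_scalar(cp))
      return false;
  return true;
}
```

The 4-byte form can hold values up to 0x1FFFFF, but RFC 3629 stops at
U+10FFFF, so every value in the list is both longer than 4 bytes in the old
scheme and outside the permitted range.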
[1] https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02678.html
[2] https://en.wikipedia.org/wiki/UTF-8
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: gnugcc at marino dot st @ 2015-08-02 14:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
--- Comment #1 from John Marino <gnugcc at marino dot st> ---
Created attachment 36108
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36108&action=edit
modification to test that makes it legal
As an illustration, I've modified the test to stop before 0x200000, which
results in success for transforms that are strict.
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: redi at gcc dot gnu.org @ 2015-08-03 15:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Status          |UNCONFIRMED |NEW
Last reconfirmed|            |2015-08-03
Ever confirmed  |0           |1
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: redi at gcc dot gnu.org @ 2015-09-11 12:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What    |Removed                      |Added
----------------------------------------------------------------------------
Status  |NEW                          |ASSIGNED
Assignee|unassigned at gcc dot gnu.org|redi at gcc dot gnu.org
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: redi at gcc dot gnu.org @ 2015-09-11 13:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Author: redi
Date: Fri Sep 11 13:06:42 2015
New Revision: 227686
URL: https://gcc.gnu.org/viewcvs?rev=227686&root=gcc&view=rev
Log:
Fix invalid UTF-8 in wchar_t tests.
2015-09-11 John Marino <gnugcc@marino.st>
Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/67096
* testsuite/22_locale/codecvt/in/wchar_t/4.cc: Do not test code points
above U+10FFFF.
* testsuite/22_locale/codecvt/in/wchar_t/8.cc: Likewise.
* testsuite/22_locale/codecvt/in/wchar_t/9.cc: Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/4.cc: Likewise.
* testsuite/22_locale/codecvt/out/wchar_t/4.cc: Likewise.
* testsuite/22_locale/codecvt/unshift/wchar_t/4.cc: Likewise.
* testsuite/27_io/basic_filebuf/seekoff/wchar_t/1.cc: Likewise.
* testsuite/27_io/basic_filebuf/seekpos/wchar_t/9874.cc: Likewise.
* testsuite/27_io/basic_filebuf/underflow/wchar_t/1.cc: Likewise.
* testsuite/27_io/basic_filebuf/underflow/wchar_t/2.cc: Likewise.
* testsuite/27_io/basic_filebuf/underflow/wchar_t/3.cc: Likewise.
* testsuite/27_io/objects/wchar_t/10.cc: Likewise.
* testsuite/27_io/objects/wchar_t/11.cc: Likewise.
* testsuite/27_io/objects/wchar_t/12.cc: Likewise.
* testsuite/27_io/objects/wchar_t/13.cc: Likewise.
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/4.cc
trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/8.cc
trunk/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/9.cc
trunk/libstdc++-v3/testsuite/22_locale/codecvt/length/wchar_t/4.cc
trunk/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/4.cc
trunk/libstdc++-v3/testsuite/22_locale/codecvt/unshift/wchar_t/4.cc
trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/seekoff/wchar_t/1.cc
trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/seekpos/wchar_t/9874.cc
trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/1.cc
trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/2.cc
trunk/libstdc++-v3/testsuite/27_io/basic_filebuf/underflow/wchar_t/3.cc
trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/10.cc
trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/11.cc
trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/12.cc
trunk/libstdc++-v3/testsuite/27_io/objects/wchar_t/13.cc
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: redi at gcc dot gnu.org @ 2015-09-11 13:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What            |Removed  |Added
----------------------------------------------------------------------------
Status          |ASSIGNED |RESOLVED
Resolution      |---      |FIXED
Target Milestone|---      |6.0
--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Should be fixed now, let me know if there are others that need fixing.
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: gnugcc at marino dot st @ 2015-09-13 13:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
--- Comment #4 from John Marino <gnugcc at marino dot st> ---
Created attachment 36332
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36332&action=edit
codecvt/max_length/wchar/4.cc patch
The codecvt/max_length/wchar/4.cc test assumes that 6 is the maximum byte
length for UTF-8 and fails if it detects a length of less than 6 bytes.
Of course, the true limit is 4 bytes, so the test needs to be changed from 6
to 4.
It is related to the tests this PR fixed (same problem, different form).
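For anyone who wants to see what their own platform reports, here is a
minimal sketch of the query involved. It is not the code of the test itself;
the helper name codecvt_max_length and the -1 return convention are
assumptions made for this example. It uses only the standard
std::codecvt<wchar_t, char, std::mbstate_t>::max_length() member.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns codecvt<wchar_t, char, mbstate_t>::max_length()
// for the named locale, or -1 if that locale is not available on this system.
int codecvt_max_length(const char* locale_name)
{
  typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;
  try
    {
      std::locale loc(locale_name);  // throws std::runtime_error if unknown
      return std::use_facet<wide_codecvt>(loc).max_length();
    }
  catch (const std::runtime_error&)
    {
      return -1;
    }
}
```

Calling codecvt_max_length("en_US.UTF-8") would be expected to give 4 on a
libc that enforces the RFC 3629 limit, and may give 6 on one that still
accepts the older sequences. The result depends on the platform, so no output
is shown here.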
* [Bug libstdc++/67096] libstdc++ testsuite, codecvt: many UTF-8 tests illegal (testing bytes 5 and 6)
From: gnugcc at marino dot st @ 2015-09-13 16:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67096
--- Comment #5 from John Marino <gnugcc at marino dot st> ---
Hmmm, thinking about this, I'd bet Linux would FAIL this test. It probably
does allow 6-byte sequences (even though it should not) and thus K would
return 6.
I don't really have a recommendation -- the standard is pretty clear that 4
bytes is the maximum for UTF-8, so it probably should fail on Linux.
Or maybe this whole test should just be removed? What is it really testing
that the other tests haven't already covered?