public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/37475] New: codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
@ 2008-09-11 9:34 tsyvarev at ispras dot ru
2008-09-11 9:36 ` [Bug libstdc++/37475] " tsyvarev at ispras dot ru
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: tsyvarev at ispras dot ru @ 2008-09-11 9:34 UTC (permalink / raw)
To: gcc-bugs
The following member functions of the class codecvt<wchar_t, char, mbstate_t>
result in(stateT& state, const externT* from, const externT* from_end, const
externT*& from_next, internT* to, internT* to_limit, internT*& to_next) const
and
result out(stateT& state, const internT* from, const internT* from_end, const
internT*& from_next, externT* to, externT* to_limit, externT*& to_next) const
return "ok" if (to==to_limit) but (from < from_end), that is, when the output
sequence contains no elements but the input sequence is not empty.
However, as appears from the description of the functions' return values
(22.2.1.5.2 p4), "partial" should be returned instead:
ok - completed the conversion
partial - not all source characters converted
error - encountered a character in [from,from_end) that it could not convert
noconv - internT and externT are the same type, and input sequence is identical
to converted sequence
Note that these functions do return "partial" if the output sequence is not
empty but still not large enough to contain all converted characters from the
input sequence, that is, if
0< (to_limit - to) < (from_end - from).
andrew@Ubuntu:/mnt/hgfs/shared/temp/test$ g++ -Wall test.cpp && ./a.out
Calls do_out() function when size of input sequenceis 2, output - 1:
do_out() returns partial.
Calls do_out() function when size of input sequenceis 2, output - 0:
do_out() returns ok.
Calls do_in() function when size of input sequenceis 2, output - 1:
do_in() returns partial.
Calls do_in() function when size of input sequenceis 2, output - 0:
do_in() returns ok.
andrew@Ubuntu:/mnt/hgfs/shared/temp/test$ g++ --version
g++ (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--
Summary: codecvt::do_in/do_out functions return "ok" when the
output sequence has zero length
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tsyvarev at ispras dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <bug-37475-4@http.gcc.gnu.org/bugzilla/>]
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
@ 2024-06-11 15:09 ` redi at gcc dot gnu.org
2024-06-11 15:12 ` redi at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-11 15:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Kristian Spangsege from comment #7)
> Curiously, this bug does not occur when using the Cygwin or MinGW versions
> of GCC. In these cases, the result is `partial` as it should be. I assume
> this is because on Cygwin and MinGW, libstdc++ uses a generic locale
> implementation that is different from the one ordinarily used on Linux.
It has an explicit check for this condition:
// It is not clear that __from < __from_end implies __ret != ok
// (see DR 382).
if (__ret == ok && __from < __from_end)
__ret = partial;
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
2024-06-11 15:09 ` [Bug libstdc++/37475] " redi at gcc dot gnu.org
@ 2024-06-11 15:12 ` redi at gcc dot gnu.org
2024-06-11 15:22 ` redi at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-11 15:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Whereas the GNU locale has that check inside a loop, which is never entered
when the destination buffer is zero sized, i.e. __to == __to_end
if (__from_next < __from_end && __ret == ok)
{
if (__to_next < __to_end)
{
// XXX Probably wrong for stateful encodings
__tmp_state = __state;
++__from_next;
*__to_next++ = L'\0';
}
else
__ret = partial;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
2024-06-11 15:09 ` [Bug libstdc++/37475] " redi at gcc dot gnu.org
2024-06-11 15:12 ` redi at gcc dot gnu.org
@ 2024-06-11 15:22 ` redi at gcc dot gnu.org
2024-06-11 18:45 ` kristian.spangsege at gmail dot com
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-11 15:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This changes the loop to always run if the input is non-empty, and so return
partial if the destination is empty.
--- a/libstdc++-v3/config/locale/gnu/codecvt_members.cc
+++ b/libstdc++-v3/config/locale/gnu/codecvt_members.cc
@@ -131,10 +131,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// mbsnrtowcs is *very* fast but stops if encounters NUL characters:
// in case we store a L'\0' and then continue, in a loop.
// NB: mbsnrtowcs is a GNU extension
- for (__from_next = __from, __to_next = __to;
- __from_next < __from_end && __to_next < __to_end
- && __ret == ok;)
+ __from_next = __from;
+ __to_next = __to;
+ while (__from_next < __from_end)
{
+ if (__to_next >= __to_end)
+ {
+ __ret = partial;
+ break;
+ }
+
const extern_type* __from_chunk_end;
__from_chunk_end = static_cast<const extern_type*>(memchr(__from_next,
'\0',
__from_end
@@ -162,12 +168,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__from_next = __from;
__state = __tmp_state;
__ret = error;
+ break;
}
else if (__from_next && __from_next < __from_chunk_end)
{
// It is unclear what to return in this case (see DR 382).
__to_next += __conv;
__ret = partial;
+ break;
}
else
{
@@ -175,7 +183,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__to_next += __conv;
}
- if (__from_next < __from_end && __ret == ok)
+ if (__from_next < __from_end)
{
if (__to_next < __to_end)
{
@@ -185,7 +193,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*__to_next++ = L'\0';
}
else
- __ret = partial;
+ {
+ __ret = partial;
+ break;
+ }
}
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2024-06-11 15:22 ` redi at gcc dot gnu.org
@ 2024-06-11 18:45 ` kristian.spangsege at gmail dot com
2024-06-12 20:35 ` redi at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: kristian.spangsege at gmail dot com @ 2024-06-11 18:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #12 from Kristian Spangsege <kristian.spangsege at gmail dot com> ---
These changes look good to me.
A couple of points:
* Should the `else`s still be there now that the preceding branches terminate
with `break`?
* `do_out()` has the same problem as `do_in()`.
* If there is already testing of `codecvt`, should a test be added for the
"empty output" case?
* This could break code that expects the incorrect behavior.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2024-06-11 18:45 ` kristian.spangsege at gmail dot com
@ 2024-06-12 20:35 ` redi at gcc dot gnu.org
2024-06-17 12:44 ` kristian.spangsege at gmail dot com
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-12 20:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org
URL| |https://gcc.gnu.org/piperma
| |il/gcc-patches/2024-June/65
| |4410.html
Keywords| |patch
--- Comment #13 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Full patch: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654410.html
(In reply to Kristian Spangsege from comment #12)
> These changes look good to me.
>
> A couple of points:
> * Should the `else`s still be there now that the preceding branches
> terminate with `break`?
Doesn't really matter either way.
> * `do_out()` has the same problem as `do_in()`.
Yes, addressed in the posted patch.
> * If there is already testing of `codecvt`, should a test be added for the
> "empty output" case?
Yes, addressed in the posted patch.
> * This could break code that expects the incorrect behavior.
That's true for every bug fix. We should conform to the standard though.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2024-06-12 20:35 ` redi at gcc dot gnu.org
@ 2024-06-17 12:44 ` kristian.spangsege at gmail dot com
2024-06-17 15:55 ` redi at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: kristian.spangsege at gmail dot com @ 2024-06-17 12:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #14 from Kristian Spangsege <kristian.spangsege at gmail dot com> ---
Is the underscore intended before `Guard` in `explicit _Guard(__c_locale) { }`
for the GLIBC <= 2.2 case?
Besides that, the full patch looks correct to me.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2024-06-17 12:44 ` kristian.spangsege at gmail dot com
@ 2024-06-17 15:55 ` redi at gcc dot gnu.org
2024-06-27 11:07 ` cvs-commit at gcc dot gnu.org
2024-06-27 11:10 ` redi at gcc dot gnu.org
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-17 15:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #15 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Oops, that's definitely not intended! Good catch, thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2024-06-17 15:55 ` redi at gcc dot gnu.org
@ 2024-06-27 11:07 ` cvs-commit at gcc dot gnu.org
2024-06-27 11:10 ` redi at gcc dot gnu.org
8 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-27 11:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
--- Comment #16 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:
https://gcc.gnu.org/g:73ad57c244c283bf6da0c16630212f11b945eda5
commit r15-1693-g73ad57c244c283bf6da0c16630212f11b945eda5
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Tue Jun 11 16:45:43 2024 +0100
libstdc++: Fix std::codecvt<wchar_t, char, mbstate_t> for empty dest
[PR37475]
For the GNU locale model, codecvt::do_out and codecvt::do_in incorrectly
return 'ok' when the destination range is empty. That happens because
detecting incomplete output is done in the loop body, and the loop is
never even entered if to == to_end.
By restructuring the loop condition so that we check the output range
separately, we can ensure that for a non-empty source range, we always
enter the loop at least once, and detect if the destination range is too
small.
The loops also seem easier to reason about if we return immediately on
any error, instead of checking the result twice on every iteration. We
can use an RAII type to restore the locale before returning, which also
simplifies all the other member functions.
libstdc++-v3/ChangeLog:
PR libstdc++/37475
* config/locale/gnu/codecvt_members.cc (Guard): New RAII type.
(do_out, do_in): Return partial if the destination is empty but
the source is not. Use Guard to restore locale on scope exit.
Return immediately on any conversion error.
(do_encoding, do_max_length, do_length): Use Guard.
* testsuite/22_locale/codecvt/in/char/37475.cc: New test.
* testsuite/22_locale/codecvt/in/wchar_t/37475.cc: New test.
* testsuite/22_locale/codecvt/out/char/37475.cc: New test.
* testsuite/22_locale/codecvt/out/wchar_t/37475.cc: New test.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug libstdc++/37475] codecvt::do_in/do_out functions return "ok" when the output sequence has zero length
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2024-06-27 11:07 ` cvs-commit at gcc dot gnu.org
@ 2024-06-27 11:10 ` redi at gcc dot gnu.org
8 siblings, 0 replies; 14+ messages in thread
From: redi at gcc dot gnu.org @ 2024-06-27 11:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37475
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Target Milestone|--- |15.0
Resolution|--- |FIXED
--- Comment #17 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Fixed on trunk (with the typo fixed).
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-06-27 11:10 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-11 9:34 [Bug libstdc++/37475] New: codecvt::do_in/do_out functions return "ok" when the output sequence has zero length tsyvarev at ispras dot ru
2008-09-11 9:36 ` [Bug libstdc++/37475] " tsyvarev at ispras dot ru
2008-09-11 10:32 ` paolo dot carlini at oracle dot com
2008-09-11 17:40 ` [Bug libstdc++/37475] [DR 382] " paolo dot carlini at oracle dot com
2008-09-11 17:40 ` paolo dot carlini at oracle dot com
[not found] <bug-37475-4@http.gcc.gnu.org/bugzilla/>
2024-06-11 15:09 ` [Bug libstdc++/37475] " redi at gcc dot gnu.org
2024-06-11 15:12 ` redi at gcc dot gnu.org
2024-06-11 15:22 ` redi at gcc dot gnu.org
2024-06-11 18:45 ` kristian.spangsege at gmail dot com
2024-06-12 20:35 ` redi at gcc dot gnu.org
2024-06-17 12:44 ` kristian.spangsege at gmail dot com
2024-06-17 15:55 ` redi at gcc dot gnu.org
2024-06-27 11:07 ` cvs-commit at gcc dot gnu.org
2024-06-27 11:10 ` redi at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).