public inbox for glibc-bugs@sourceware.org
* [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char
@ 2009-01-27 19:02 keithw at mit dot edu
2009-01-27 19:08 ` [Bug libc/9793] " keithw at mit dot edu
2009-02-03 1:31 ` drepper at redhat dot com
0 siblings, 2 replies; 4+ messages in thread
From: keithw at mit dot edu @ 2009-01-27 19:02 UTC (permalink / raw)
To: glibc-bugs
Hello,
POSIX requires that iconv() stop conversion if the output buffer isn't large
enough to hold the entire converted input, returning (size_t)-1 with errno set
to E2BIG. iconv() should stop "just prior to the input bytes that would cause
the output buffer to overflow."
Please see http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html .
This is helpful behavior, since it allows the application to lengthen the
output buffer and then resume processing from where iconv() left off.
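The resume-after-E2BIG pattern described above could look roughly like the following sketch. The helper name convert_resuming and its parameters are hypothetical (they do not come from the report), and the sketch assumes a stateless target encoding, so it does not flush any shift state at the end.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the report): convert in_len bytes of `in`
   with descriptor `cd`. Whenever iconv() fails with E2BIG, enlarge the
   output buffer and resume from the pointers iconv() left behind.
   Returns a malloc()ed buffer and stores its length in *out_len,
   or returns NULL on any other error. */
static char *convert_resuming(iconv_t cd, const char *in, size_t in_len,
                              size_t initial_cap, size_t *out_len)
{
    assert(initial_cap > 0);
    size_t cap = initial_cap;
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    char *inp = (char *)in;   /* iconv() takes char **; the input bytes are only read */
    size_t in_left = in_len;
    char *outp = out;
    size_t out_left = cap;

    while (in_left > 0) {
        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1)
            break;            /* all remaining input was converted */
        if (errno != E2BIG) { /* EILSEQ, EINVAL, ...: give up */
            free(out);
            return NULL;
        }
        /* Output buffer full: double it and continue where iconv() stopped. */
        size_t used = (size_t)(outp - out);
        char *grown = realloc(out, cap * 2);
        if (grown == NULL) {
            free(out);
            return NULL;
        }
        out = grown;
        cap *= 2;
        outp = out + used;    /* re-base the output pointer on the new block */
        out_left = cap - used;
    }
    *out_len = (size_t)(outp - out);
    return out;
}
```

This only produces correct output if iconv() leaves both pointers at a character boundary on E2BIG, which is exactly the property this report is about.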
GNU libiconv's iconv() seems to handle the E2BIG case correctly.
But glibc's iconv() does not let an application gracefully restart from E2BIG,
because it partially converts as much of an output sequence as it can, then
leaves the input and output pointers in an inconsistent state. In some cases
(such as with a TRANSLIT conversion), iconv() partially advances the output
pointer to reflect the portion of the incomplete multibyte sequence it output,
but does not advance the input pointer.
When an application restarts conversion with a larger buffer, this leads to
garbage in the output.
For example, when converting the UTF-8 registered trademark sign to
ASCII//TRANSLIT, iconv() wants to write out the three-byte sequence "(R)". If
it does not have room at the end of the buffer for this three-byte sequence, it
should not convert the character at all. It should leave the output pointer at
the end of the successfully converted output, and the input pointer at the
start of the registered trademark character.
Instead, what iconv() actually does is output as much of "(R)" as it can (for
example, "(R"), update the output pointer to reflect this partial output (e.g.,
by two bytes), but then NOT update the input pointer.
I have attached a code sample that demonstrates this behavior.
If iconv() is resumed after E2BIG, it converts the registered trademark sign
again, leading to output like "(R(R)". The application has no way of knowing
how many bytes prior to the output pointer are actually the partial output of
an unsuccessfully converted multibyte sequence.
The only workaround I have found is to keep increasing the output buffer size
and restarting the conversion from scratch until the entire conversion works in
one go. This is not very efficient and is not what POSIX seems to have
intended.
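As a rough sketch of that workaround (not the code actually used by the reporter), the conversion can be retried from the first input byte with a doubled buffer each time E2BIG is seen. The function name convert_from_scratch and the initial_cap parameter are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert in_len bytes of `in` from encoding `from`
   to encoding `to`. On E2BIG, discard everything written so far, double
   the buffer, and restart from the first input byte.
   Returns a malloc()ed buffer (length in *out_len) or NULL on error. */
static char *convert_from_scratch(const char *to, const char *from,
                                  const char *in, size_t in_len,
                                  size_t initial_cap, size_t *out_len)
{
    assert(initial_cap > 0);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    char *out = NULL;
    for (size_t cap = initial_cap; ; cap *= 2) {
        char *grown = realloc(out, cap);
        if (grown == NULL)
            break;
        out = grown;

        iconv(cd, NULL, NULL, NULL, NULL);  /* reset the conversion state */
        char *inp = (char *)in;             /* start over at the first byte */
        size_t in_left = in_len;
        char *outp = out;                   /* overwrite any partial output */
        size_t out_left = cap;

        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1) {
            *out_len = (size_t)(outp - out);
            iconv_close(cd);
            return out;                     /* whole input fit in one go */
        }
        if (errno != E2BIG)
            break;                          /* EILSEQ, EINVAL, ...: give up */
    }
    free(out);
    iconv_close(cd);
    return NULL;
}
```

Because every retry reconverts the input from the beginning, the total work can be several times that of a single pass, which is the inefficiency noted above.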
I'm using the latest CVS glibc (gnu_get_libc_version() reports 2.9.90). I
configured with --enable-add-ons=nptl --enable-kernel=2.6.24 and then added
"CPPFLAGS += -fno-stack-protector" to configparms. This is on Linux 2.6.24 and
other versions. I compiled with gcc 4.2.4 and ld 2.18.0.20080103 from Ubuntu.
Thanks for your attention to this,
Keith Winstein
keithw@mit.edu
--
Summary: iconv() incorrectly handles E2BIG condition by partially
processing output char
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: keithw at mit dot edu
CC: glibc-bugs at sources dot redhat dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://sourceware.org/bugzilla/show_bug.cgi?id=9793
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug libc/9793] iconv() incorrectly handles E2BIG condition by partially processing output char
2009-01-27 19:02 [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char keithw at mit dot edu
@ 2009-01-27 19:08 ` keithw at mit dot edu
2009-02-03 1:31 ` drepper at redhat dot com
1 sibling, 0 replies; 4+ messages in thread
From: keithw at mit dot edu @ 2009-01-27 19:08 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From keithw at mit dot edu 2009-01-27 19:08 -------
Created an attachment (id=3691)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=3691&action=view)
Test case for iconv() E2BIG partial transliteration
Here is a test case that demonstrates the E2BIG case. Converting the UTF-8
"registered trademark" symbol into ASCII//TRANSLIT, iconv() wants to write out
"(R)". But here the output buffer has room for only two bytes. The POSIX and
GNU libiconv behavior is to advance inbuf by zero, advance outbuf by zero, and
return E2BIG -- stopping the conversion prior to the overflow.
But the glibc iconv() behavior is to advance inbuf by 0, advance outbuf by 2,
write "(R", and return E2BIG. This is an incomplete conversion that the
application has no way of correcting, because of the inconsistent state of the
pointers. If the application restarts iconv() from the current location of
inbuf and outbuf with a larger output buffer, it will get garbage -- like
"(R(R)", since the registered trademark symbol will be converted again,
appended to the original incomplete transliteration.
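
The attachment itself is not reproduced on this page. The following is an independent minimal sketch of such a probe, not the attached test case: it makes one iconv() call into a two-byte output buffer and records how far each pointer moved. The struct and function names are hypothetical, and whether U+00AE transliterates to "(R)" at all depends on the iconv implementation and locale.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Result of one iconv() call into a deliberately undersized buffer. */
struct e2big_probe {
    int opened;          /* 0 if iconv_open() failed */
    size_t ret;          /* return value of iconv() */
    int err;             /* errno after the call (meaningful if ret == (size_t)-1) */
    size_t in_advance;   /* bytes by which the input pointer moved */
    size_t out_advance;  /* bytes by which the output pointer moved */
};

static struct e2big_probe probe_e2big(void)
{
    struct e2big_probe p = {0, 0, 0, 0, 0};
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return p;
    p.opened = 1;

    char in[] = "\xc2\xae";   /* U+00AE REGISTERED SIGN encoded as UTF-8 */
    char out[2];              /* one byte too small for "(R)" */
    char *inp = in;
    char *outp = out;
    size_t in_left = 2;
    size_t out_left = sizeof out;

    errno = 0;
    p.ret = iconv(cd, &inp, &in_left, &outp, &out_left);
    p.err = errno;            /* save before iconv_close() can change it */

    assert(inp >= in && outp >= out);   /* pointers only ever move forward */
    p.in_advance = (size_t)(inp - in);
    p.out_advance = (size_t)(outp - out);

    iconv_close(cd);
    return p;
}
```

Going by the description above, the POSIX behavior would show up as in_advance == 0, out_advance == 0 and err == E2BIG, while the behavior reported for glibc would show up as in_advance == 0 with out_advance == 2.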
--
http://sourceware.org/bugzilla/show_bug.cgi?id=9793
* [Bug libc/9793] iconv() incorrectly handles E2BIG condition by partially processing output char
2009-01-27 19:02 [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char keithw at mit dot edu
2009-01-27 19:08 ` [Bug libc/9793] " keithw at mit dot edu
@ 2009-02-03 1:31 ` drepper at redhat dot com
1 sibling, 0 replies; 4+ messages in thread
From: drepper at redhat dot com @ 2009-02-03 1:31 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From drepper at redhat dot com 2009-02-03 01:31 -------
Fixed in cvs.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=9793
[parent not found: <bug-9793-131@http.sourceware.org/bugzilla/>]
end of thread, other threads:[~2014-07-01 21:01 UTC | newest]
Thread overview: 4+ messages
2009-01-27 19:02 [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char keithw at mit dot edu
2009-01-27 19:08 ` [Bug libc/9793] " keithw at mit dot edu
2009-02-03 1:31 ` drepper at redhat dot com
[not found] <bug-9793-131@http.sourceware.org/bugzilla/>
2014-07-01 21:01 ` fweimer at redhat dot com