public inbox for glibc-bugs@sourceware.org
* [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char
@ 2009-01-27 19:02 keithw at mit dot edu
  2009-01-27 19:08 ` [Bug libc/9793] " keithw at mit dot edu
  2009-02-03  1:31 ` drepper at redhat dot com
  0 siblings, 2 replies; 3+ messages in thread
From: keithw at mit dot edu @ 2009-01-27 19:02 UTC (permalink / raw)
  To: glibc-bugs

Hello,

POSIX requires that iconv() stop conversion if the output buffer isn't large 
enough to hold the entire converted input, returning (size_t)-1 with errno set 
to E2BIG. iconv() should stop "just prior to the input bytes that would cause 
the output buffer to overflow." Please see 
http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html .

This is helpful behavior, since it allows the application to lengthen the 
output buffer and then resume processing from where iconv() left off.
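
A minimal sketch of that restart pattern, assuming the POSIX semantics quoted 
above (on E2BIG, both pointers stop at a clean character boundary). The helper 
name convert_resuming and its parameters are hypothetical, and the final flush 
call needed for stateful encodings is left out for brevity:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert in[0..in_len) with an already-open descriptor,
 * doubling the output buffer on E2BIG and resuming where iconv() stopped.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) or NULL on error. */
char *convert_resuming(iconv_t cd, const char *in, size_t in_len,
                       size_t initial_cap, size_t *out_len)
{
    assert(initial_cap > 0 && out_len != NULL);

    size_t cap = initial_cap;
    char *out = malloc(cap + 1);          /* +1 for the trailing NUL */
    if (out == NULL)
        return NULL;

    char *inp = (char *)in;               /* iconv() takes char **, not const */
    size_t in_left = in_len;
    size_t used = 0;

    while (in_left > 0) {
        char *outp = out + used;
        size_t out_left = cap - used;

        size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
        used = (size_t)(outp - out);      /* trust the pointers iconv() left */

        if (rc != (size_t)-1)
            break;                        /* all input consumed */
        if (errno != E2BIG) {             /* EILSEQ, EINVAL, ... */
            free(out);
            return NULL;
        }

        cap *= 2;                         /* lengthen the buffer and resume */
        char *bigger = realloc(out, cap + 1);
        if (bigger == NULL) {
            free(out);
            return NULL;
        }
        out = bigger;
    }

    out[used] = '\0';
    *out_len = used;
    return out;
}
```

The loop never rewinds inp or discards output, so it is only correct if 
iconv() leaves the input and output pointers consistent with each other when 
it reports E2BIG.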

GNU libiconv's iconv() seems to handle the E2BIG case correctly.

But glibc's iconv() does not let an application gracefully restart from E2BIG, 
because it partially converts as much of an output sequence as it can, then 
leaves the input and output pointers in an inconsistent state. In some cases 
(such as with a TRANSLIT conversion), iconv() partially advances the output 
pointer to reflect the portion of the incomplete multibyte sequence it output, 
but does not advance the input pointer.

When an application restarts conversion with a larger buffer, this leads to 
garbage in the output.

For example, when converting the UTF-8 registered trademark sign to 
ASCII//TRANSLIT, iconv() wants to write out the three-byte sequence "(R)". If 
it does not have room at the end of the buffer for this three-byte sequence, it 
should not convert the character at all, leaving the output pointer at the end 
of the successfully converted output and the input pointer just before the 
start of the registered trademark character.

Instead, what iconv() actually does is output as much of "(R)" as it can (for 
example, "(R"), update the output pointer to reflect this partial output (e.g., 
by two bytes), but then NOT update the input pointer.

I have attached a code sample that demonstrates this behavior.

If iconv() is resumed after E2BIG, it converts the registered trademark sign 
again, leading to output like "(R(R)". The application has no way of knowing 
how many bytes prior to the output pointer are actually the partial output of 
an unsuccessfully converted multibyte sequence.

The only workaround I have found is to keep increasing the output buffer size 
and restarting the conversion from scratch until the entire conversion works in 
one go. This is not very efficient and is not what POSIX seems to have 
intended.
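
A rough sketch of that workaround, for illustration only. The function name 
convert_from_scratch, the starting capacity, and the doubling policy are 
assumptions, not taken from any attached code:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical workaround: on E2BIG, throw away everything converted so far,
 * double the buffer, and redo the whole conversion from the first input byte.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) or NULL on error. */
char *convert_from_scratch(const char *to, const char *from,
                           const char *in, size_t in_len, size_t *out_len)
{
    assert(to != NULL && from != NULL && in != NULL && out_len != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = in_len > 0 ? in_len : 1; /* arbitrary starting guess */
    char *out = NULL;

    for (;;) {
        char *bigger = realloc(out, cap + 1);   /* +1 for the trailing NUL */
        if (bigger == NULL) {
            free(out);
            out = NULL;
            break;
        }
        out = bigger;

        iconv(cd, NULL, NULL, NULL, NULL);      /* reset conversion state */

        char *inp = (char *)in;                 /* restart at the beginning */
        size_t in_left = in_len;
        char *outp = out;
        size_t out_left = cap;

        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1) {
            *outp = '\0';
            *out_len = (size_t)(outp - out);
            break;                              /* converted in one go */
        }
        if (errno != E2BIG) {                   /* EILSEQ, EINVAL, ... */
            free(out);
            out = NULL;
            break;
        }
        cap *= 2;                               /* too small: retry bigger */
    }

    iconv_close(cd);
    return out;
}
```

Every retry reconverts the input from its first byte, which is where the 
inefficiency comes from: the work done before each E2BIG is discarded.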

I'm using the latest CVS glibc (gnu_get_libc_version() reports 2.9.90). I 
configured with --enable-add-ons=nptl --enable-kernel=2.6.24 and then added 
"CPPFLAGS += -fno-stack-protector" to configparms. This is on Linux 2.6.24 and 
other versions. I compiled with gcc 4.2.4 and ld 2.18.0.20080103 from Ubuntu.

Thanks for your attention to this,
Keith Winstein
keithw@mit.edu

-- 
           Summary: iconv() incorrectly handles E2BIG condition by partially
                    processing output char
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: keithw at mit dot edu
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=9793

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/9793] iconv() incorrectly handles E2BIG condition by partially processing output char
  2009-01-27 19:02 [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char keithw at mit dot edu
@ 2009-01-27 19:08 ` keithw at mit dot edu
  2009-02-03  1:31 ` drepper at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: keithw at mit dot edu @ 2009-01-27 19:08 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From keithw at mit dot edu  2009-01-27 19:08 -------
Created an attachment (id=3691)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=3691&action=view)
Test case for iconv() E2BIG partial transliteration

Here is a test case that demonstrates the E2BIG case. Converting the UTF-8
"registered trademark" symbol into ASCII//TRANSLIT, iconv() wants to write out
"(R)". But here the output buffer has room for only two bytes. The POSIX and
GNU libiconv behavior is to advance inbuf by zero, advance outbuf by zero, and
fail with E2BIG -- stopping the conversion prior to the overflow.

But the glibc iconv() behavior is to advance inbuf by 0, advance outbuf by 2,
and write "(R", and return E2BIG. This is an incomplete conversion that the
application has no way of correcting, because of the inconsistent state of the
pointers. If the application restarts iconv() from the current location of
inbuf and outbuf with a larger output buffer, it will get garbage -- like
"(R(R)", since the registered trademark symbol will be converted again,
appended to the original incomplete transliteration.
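
The attachment itself is not reproduced in this thread. As an independent 
sketch (not the attached test case), a single iconv() call can be probed as 
follows. The struct advance type and the probe_one_call name are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result type: how far one iconv() call moved each pointer. */
struct advance {
    size_t in_bytes;    /* bytes consumed from the input */
    size_t out_bytes;   /* bytes written to the output */
    int err;            /* errno from iconv()/iconv_open(), or 0 on success */
};

/* Run exactly one iconv() call with an output buffer of out_cap bytes and
 * report how far the input and output pointers advanced. */
struct advance probe_one_call(const char *to, const char *from,
                              const char *in, size_t in_len, size_t out_cap)
{
    char buf[16];
    struct advance a = {0, 0, 0};

    assert(out_cap <= sizeof buf);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        a.err = errno;                    /* conversion not supported here */
        return a;
    }

    char *inp = (char *)in;               /* iconv() takes char **, not const */
    size_t in_left = in_len;
    char *outp = buf;
    size_t out_left = out_cap;

    errno = 0;
    if (iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1)
        a.err = errno;

    a.in_bytes = (size_t)(inp - in);
    a.out_bytes = (size_t)(outp - buf);

    iconv_close(cd);
    return a;
}
```

Calling probe_one_call("ASCII//TRANSLIT", "UTF-8", "\xC2\xAE", 2, 2) exercises 
the case described above: per this report, a conforming implementation leaves 
both counts at zero with E2BIG, while the affected glibc reports two output 
bytes and zero input bytes.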


* [Bug libc/9793] iconv() incorrectly handles E2BIG condition by partially processing output char
  2009-01-27 19:02 [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char keithw at mit dot edu
  2009-01-27 19:08 ` [Bug libc/9793] " keithw at mit dot edu
@ 2009-02-03  1:31 ` drepper at redhat dot com
  1 sibling, 0 replies; 3+ messages in thread
From: drepper at redhat dot com @ 2009-02-03  1:31 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2009-02-03 01:31 -------
Fixed in cvs.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

