From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 12224 invoked by alias); 27 Jan 2009 19:02:32 -0000
Received: (qmail 10052 invoked by uid 48); 27 Jan 2009 19:02:17 -0000
Date: Tue, 27 Jan 2009 19:02:00 -0000
From: "keithw at mit dot edu"
To: glibc-bugs@sources.redhat.com
Message-ID: <20090127190216.9793.keithw@mit.edu>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char
X-Bugzilla-Reason: CC
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: glibc-bugs-owner@sourceware.org
X-SW-Source: 2009-01/txt/msg00125.txt.bz2

Hello,

POSIX requires that iconv() stop conversion if the output buffer isn't large enough to hold the entire converted input, and fail with errno set to E2BIG. iconv() should stop "just prior to the input bytes that would cause the output buffer to overflow." Please see http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html . This is helpful behavior, since it allows the application to lengthen the output buffer and then resume processing from where iconv() left off.

GNU libiconv's iconv() seems to handle the E2BIG case correctly. But glibc's iconv() does not let an application gracefully restart from E2BIG, because it partially converts as much of an output sequence as it can, then leaves the input and output pointers in an inconsistent state.

In some cases (such as with a TRANSLIT conversion), iconv() partially advances the output pointer to reflect the portion of the incomplete multibyte sequence it output, but does not advance the input pointer. When an application restarts conversion with a larger buffer, this leads to garbage in the output.

For example, when converting the UTF-8 registered trademark sign to ASCII//TRANSLIT, iconv() wants to write out the three-byte sequence "(R)".
If it does not have room at the end of the buffer for this three-byte sequence, it should not convert the character at all. It should leave the output pointer at the end of the successfully converted output, and the input pointer just before the start of the registered trademark character. Instead, what iconv() actually does is output as much of "(R)" as it can (for example, "(R"), update the output pointer to reflect this partial output (e.g., by two bytes), but then NOT update the input pointer. I have attached a code sample that demonstrates this behavior.

If iconv() is resumed after E2BIG, it converts the registered trademark sign again, leading to output like "(R(R)". The application has no way of knowing how many bytes prior to the output pointer are actually the partial output of an unsuccessfully converted multibyte sequence.

The only workaround I have found is to keep increasing the output buffer size and restarting the conversion from scratch until the entire conversion works in one go. This is not very efficient and is not what POSIX seems to have intended.

I'm using the latest CVS glibc (gnu_get_libc_version() reports 2.9.90). I configured with --enable-add-ons=nptl --enable-kernel=2.6.24 and then added "CPPFLAGS += -fno-stack-protector" to configparms. This is on Linux 2.6.24 and other versions. I compiled with gcc 4.2.4 and ld 2.18.0.20080103 from Ubuntu.
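To make the two strategies above concrete, here is a minimal sketch in C. It is not the attached sample. The function names convert_resuming() and convert_restarting() are hypothetical helpers invented for this illustration, and the doubling growth policy is an arbitrary assumption. The first function is the grow-and-resume loop that the POSIX wording seems to permit; the second is the restart-from-scratch workaround. Both assume a stateless target encoding and omit the final flush call that a stateful encoding would need.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert inlen bytes at `in`, enlarging the output
 * buffer on E2BIG and resuming from the pointers iconv() left behind.
 * Returns a malloc'd buffer (caller frees) with its length in *outlen,
 * or NULL on any error other than E2BIG. */
static char *convert_resuming(iconv_t cd, const char *in, size_t inlen,
                              size_t initial_cap, size_t *outlen)
{
    assert(outlen != NULL);
    size_t cap = initial_cap ? initial_cap : 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    char *inp = (char *)in;   /* iconv() takes char **, but does not modify the input bytes */
    size_t inleft = inlen;
    char *outp = buf;
    size_t outleft = cap;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                      /* all input consumed */
        if (errno != E2BIG) {           /* EILSEQ, EINVAL, ...: give up */
            free(buf);
            return NULL;
        }
        /* Keep what was written so far, double the buffer, and continue. */
        size_t used = (size_t)(outp - buf);
        cap *= 2;                       /* assumed growth policy */
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        outp = buf + used;
        outleft = cap - used;
    }
    *outlen = (size_t)(outp - buf);
    return buf;
}

/* Hypothetical helper: the workaround. On E2BIG, discard all output,
 * double the buffer, and convert the whole input again from the start. */
static char *convert_restarting(iconv_t cd, const char *in, size_t inlen,
                                size_t initial_cap, size_t *outlen)
{
    assert(outlen != NULL);
    size_t cap = initial_cap ? initial_cap : 1;
    for (;;) {
        char *buf = malloc(cap);
        if (buf == NULL)
            return NULL;
        char *inp = (char *)in;
        size_t inleft = inlen;
        char *outp = buf;
        size_t outleft = cap;

        iconv(cd, NULL, NULL, NULL, NULL);   /* reset to the initial state */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1) {
            *outlen = (size_t)(outp - buf);
            return buf;
        }
        int err = errno;                     /* save before free() */
        free(buf);
        if (err != E2BIG)
            return NULL;
        cap *= 2;
    }
}
```

For the case described in this report, one would open the descriptor with iconv_open("ASCII//TRANSLIT", "UTF-8") and pass the two input bytes "\xc2\xae" with a small initial capacity. If iconv() behaves as described above, convert_resuming() would be expected to return the duplicated "(R(R)" output, while convert_restarting() should avoid it at the cost of redoing the whole conversion on every retry.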

Thanks for your attention to this,
Keith Winstein
keithw@mit.edu

-- 
Summary: iconv() incorrectly handles E2BIG condition by partially processing output char
Product: glibc
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: keithw at mit dot edu
CC: glibc-bugs at sources dot redhat dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://sourceware.org/bugzilla/show_bug.cgi?id=9793

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.