From: "ezyang at mit dot edu"
To: glibc-bugs@sources.redhat.com
Subject: [Bug libc/13541] New: iconv //IGNORE charsets are inconsistent about INBUF* state after EILSEQ
Date: Fri, 23 Dec 2011 01:59:00 -0000

http://sourceware.org/bugzilla/show_bug.cgi?id=13541

             Bug #: 13541
           Summary: iconv //IGNORE charsets are inconsistent about INBUF*
                    state after EILSEQ
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: ezyang@mit.edu
    Classification: Unclassified


The iconv infopage says the following:

    `EILSEQ'
         The conversion stopped because of an invalid byte sequence in
         the input.  After the call, `*INBUF' points at the first byte
         of the invalid byte sequence.
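Under the documented contract, a caller can locate an offending sequence just by subtracting pointers after EILSEQ. A minimal sketch, assuming a descriptor opened without //IGNORE; the helper name first_bad_offset and its fixed 256-byte scratch buffer are illustrative choices, not part of glibc:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not a glibc API: run CD over LEN bytes of SRC and
   return the byte offset of the first sequence iconv() rejects with EILSEQ.
   Returns -1 if the whole input converts, or on any other error. */
long first_bad_offset(iconv_t cd, const char *src, size_t len)
{
    char scratch[256];          /* converted output is simply discarded */
    char *in = (char *) src;    /* iconv() takes a non-const char ** */
    size_t inleft = len;

    iconv(cd, NULL, NULL, NULL, NULL);  /* reset conversion state */

    while (inleft > 0) {
        char *out = scratch;
        size_t outleft = sizeof scratch;

        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t) -1)
            break;                      /* the rest of the input converted */
        if (errno == E2BIG)
            continue;                   /* scratch full: reuse it and go on */
        if (errno == EILSEQ)
            return (long) (in - src);   /* documented: *INBUF is at the bad sequence */
        return -1;                      /* e.g. EINVAL: input ends mid-sequence */
    }
    return -1;
}
```

For example, with cd = iconv_open("ascii", "utf-8") and the input "ab" followed by the UTF-8 bytes C2 A2, the helper should report offset 2.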
However, this guarantee clearly does not hold when an //IGNORE target charset is specified:

    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        iconv_t i = iconv_open("ascii//IGNORE", "utf-8");
        char inbuf[10000];
        char outbuf[10000];
        char *in = inbuf;
        char *out = outbuf;
        size_t inleft = sizeof inbuf;    /* iconv() takes size_t *, not int * */
        size_t outleft = sizeof outbuf;
        size_t s;

        if (i == (iconv_t) -1) {
            perror("iconv_open");
            return 1;
        }

        memset(inbuf, 0x77, sizeof inbuf);  /* fill with 'w' */
        inbuf[0] = (char) 0xC2;             /* UTF-8 for U+00A2 CENT SIGN, */
        inbuf[1] = (char) 0xA2;             /* which ASCII cannot represent */

        s = iconv(i, &in, &inleft, &out, &outleft);
        printf("s = %d, errno = %d, in[0] = %x, inleft = %zu\n",
               (int) s, errno, (unsigned char) *in, inleft);

        iconv_close(i);
        return 0;
    }

This outputs the following:

    s = -1, errno = 84, in[0] = 77, inleft = 1839

iconv appears to have gobbled up another ~8000 bytes past the invalid byte sequence (inleft dropped from 10000 to 1839) before returning EILSEQ (84). The documentation here cannot possibly be correct if we want //IGNORE to actually do anything. So we have two options:

 1. Claim that the semantics of EILSEQ change when the magic //IGNORE flag is specified, and require user code to work around it properly. This is what the '-c' flag in iconv_prog.c does, by magically "converting" these errors into E2BIG errors and re-running iconv appropriately.

 2. Claim that this API is wrong, and modify it so that an iconv operating on an //IGNORE character set *never* returns EILSEQ (which is what one might expect, since IGNORE is supposed to let us ignore sequences that are illegal in the target). This would make glibc's iconv implementation consistent with libiconv's.

I favor (2), since it makes client code considerably simpler and easier to implement correctly.
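Under option (1), every caller has to carry a workaround of its own. A minimal sketch, assuming glibc behaves as in the reproducer (an //IGNORE descriptor returns -1 with EILSEQ after skipping input, but still advances *INBUF): keep calling iconv() for as long as it makes progress. The helper convert_all_ignoring is hypothetical and is not the code in iconv_prog.c:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not a glibc API: convert all of *IN with a descriptor
   opened on an "...//IGNORE" target.  glibc may return -1/EILSEQ after it has
   skipped unconvertible input and stopped early; as long as *IN advanced,
   this loop treats that as "keep going" rather than as a hard error.
   Returns 0 on success, -1 with errno set otherwise. */
int convert_all_ignoring(iconv_t cd, char **in, size_t *inleft,
                         char **out, size_t *outleft)
{
    while (*inleft > 0) {
        char *before = *in;

        if (iconv(cd, in, inleft, out, outleft) != (size_t) -1)
            break;                /* remaining input converted */
        if (errno == EILSEQ && *in != before)
            continue;             /* input was skipped; resume where iconv stopped */
        return -1;                /* E2BIG, EINVAL, or EILSEQ with no progress */
    }
    /* Emit any closing shift sequence (a no-op for stateless targets). */
    if (iconv(cd, NULL, NULL, out, outleft) == (size_t) -1)
        return -1;
    return 0;
}
```

Requiring progress before retrying is what guarantees termination: if a call returns EILSEQ without moving *in, the helper gives up instead of spinning. E2BIG is handed back to the caller, who would need to drain or enlarge the output buffer and call again.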