From: "rrt at sc3d dot org"
To: glibc-bugs@sourceware.org
Subject: [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
Date: Sat, 18 Feb 2023 20:48:11 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: libc
X-Bugzilla-Version: 2.36
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

Reuben Thomas changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rrt at sc3d dot org

--- Comment #1 from Reuben Thomas ---
I'm the maintainer of Recode (formerly GNU Recode), the widely-used character conversion utility.

I came across this odd behaviour some years ago, but I only just realised that it is in fact a bug in glibc.

My analysis is the same as the reporter's: the POSIX standard says unambiguously that EILSEQ is only returned for invalid input, and when an exact match to the output character set is not possible, an implementation-dependent conversion is performed.

A very simple example using the iconv(1) program:

$ hd foo.data
00000000  c2 b4                                             |..|
00000002
# This is ACUTE ACCENT U+00B4
$ iconv -f UTF-8 -t ISO-8859-15 foo.data
iconv: illegal input sequence at position 0
# This is wrong! The input is valid UTF-8
$ iconv -f UTF-8 -t ISO-8859-15//TRANSLIT foo.data
'
# This is the output one might expect in the previous case
$ iconv -f UTF-8 -t ISO-8859-1 ~/Downloads/foo.data | hd
00000000  b4                                                |.|
00000001
# As we'd expect, as ACUTE ACCENT exists in ISO-8859-1

As far as I can see from looking at the code, the conversion code from Unicode to ISO-8859-15 is handled by iconvdata/8bit-gap.c. When it cannot find an ISO-8859-15 equivalent for the given UCS4 character, it calls STANDARD_TO_LOOP_ERR_HANDLER. This sets the error to __GCONV_ILLEGAL_INPUT, which is eventually converted to EILSEQ. This is wrong!

STANDARD_TO_LOOP_ERR_HANDLER should use some other error code. I cannot see a suitable one in the present set (enum of __GCONV_* in iconv/gconv.h).
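
The same check can be made against the iconv(3) library call rather than the iconv(1) program. What follows is only a minimal sketch: convert_buffer is a hypothetical helper written for this illustration (it is not part of glibc or Recode), and it relies solely on the standard iconv_open(), iconv() and iconv_close() calls. It returns 0 on success, or the errno value that the conversion reported.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper for illustration only: convert IN_LEN bytes at IN
   from encoding FROM to encoding TO, writing at most OUT_CAP bytes to OUT.
   Stores the number of bytes written in *OUT_LEN.  Returns 0 on success,
   otherwise the errno value set by iconv_open() or iconv().  */
int
convert_buffer (const char *to, const char *from,
                const char *in, size_t in_len,
                char *out, size_t out_cap, size_t *out_len)
{
  assert (in != NULL && out != NULL && out_len != NULL);
  *out_len = 0;

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return errno;               /* typically EINVAL: pair not supported */

  /* iconv() takes char ** for the input, but does not write to the bytes.  */
  char *in_ptr = (char *) in;
  char *out_ptr = out;
  size_t in_left = in_len;
  size_t out_left = out_cap;
  int err = 0;

  if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1)
    err = errno;                /* EILSEQ, EINVAL or E2BIG */
  else if (iconv (cd, NULL, NULL, &out_ptr, &out_left) == (size_t) -1)
    err = errno;                /* flushing any shift state failed */

  *out_len = out_cap - out_left;
  iconv_close (cd);
  return err;
}
```

Calling convert_buffer ("ISO-8859-15", "UTF-8", "\xc2\xb4", 2, ...) corresponds to the failing iconv(1) invocation above. If the library behaves as the command-line tool does, the return value is EILSEQ, the same value reported for genuinely invalid input such as a lone 0xff byte, so a caller cannot tell the two cases apart.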