From: "rrt at sc3d dot org" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
Date: Sat, 18 Feb 2023 20:48:11 +0000
Message-ID: <bug-29913-131-puZJsvMnjP@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-29913-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

Reuben Thomas <rrt at sc3d dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rrt at sc3d dot org

--- Comment #1 from Reuben Thomas <rrt at sc3d dot org> ---
I'm the maintainer of Recode (formerly GNU Recode), the widely-used character
conversion utility. I came across this odd behaviour some years ago, but I only
just realised that it is in fact a bug in glibc.

My analysis is the same as the reporter's: the POSIX standard says
unambiguously that EILSEQ is only returned for invalid input, and when an exact
match to the output character set is not possible, an implementation-dependent
conversion is performed.

A very simple example using the iconv(1) program:

$ hd foo.data
00000000  c2 b4                                             |..|
00000002
# This is ACUTE ACCENT U+00B4
$ iconv -f UTF-8 -t ISO-8859-15 foo.data
iconv: illegal input sequence at position 0
# This is wrong! The input is valid UTF-8
$ iconv -f UTF-8 -t ISO-8859-15//TRANSLIT foo.data
'
# This is the output one might expect in the previous case
$ iconv -f UTF-8 -t ISO-8859-1 ~/Downloads/foo.data | hd
00000000  b4                                                |.|
00000001
# As we'd expect, as ACUTE ACCENT exists in ISO-8859-1

As far as I can see from looking at the code, the conversion code from Unicode
to ISO-8859-15 is handled by iconvdata/8bit-gap.c. When it cannot find an
ISO-8859-15 equivalent for the given UCS4 character, it calls
STANDARD_TO_LOOP_ERR_HANDLER. This sets the error to __GCONV_ILLEGAL_INPUT,
which is eventually converted to EILSEQ.

This is wrong! STANDARD_TO_LOOP_ERR_HANDLER should use some other error code. I
cannot see a suitable one in the present set (enum of __GCONV_* in
iconv/gconv.h).

-- 
You are receiving this mail because:
You are on the CC list for the bug.
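
The behaviour shown with iconv(1) above can also be observed through the iconv(3) API directly. The following is a minimal sketch, not code from glibc or Recode: the helper name convert_from_utf8 and its return convention are assumptions made for illustration. It relies only on the standard iconv_open(3), iconv(3) and iconv_close(3) calls, and it handles only stateless target charsets such as the ISO-8859 family.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): convert the NUL-terminated UTF-8
   string `in` to the charset `tocode`, writing at most `outcap` bytes to
   `out`. Returns the number of bytes written, or -1 with errno left as set
   by iconv_open(3) or iconv(3). No shift-state flush is done, so this is
   only suitable for stateless target charsets. */
static long convert_from_utf8(const char *tocode, const char *in,
                              char *out, size_t outcap)
{
    assert(tocode != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    /* iconv(3) takes char ** for the input but does not modify the bytes. */
    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int saved = errno;          /* iconv_close(3) may clobber errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved;
        return -1;
    }
    return (long)(outcap - outleft);
}
```

Calling convert_from_utf8("ISO-8859-1", "\xc2\xb4", buf, sizeof buf) should store the single byte 0xb4. With "ISO-8859-15" as the target, a glibc affected by this bug would be expected to return -1 with errno set to EILSEQ, matching the iconv(1) output above, even though the input is valid UTF-8. Other iconv implementations may instead substitute a replacement character.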