public inbox for glibc-bugs@sourceware.org
* [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
@ 2022-12-16 23:03 steffen at sdaoden dot eu
  2023-02-18 20:48 ` [Bug libc/29913] " rrt at sc3d dot org
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2022-12-16 23:03 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

            Bug ID: 29913
           Summary: iconv(3) is not POSIX compliant, and does not conform
                    to linux man-pages manual
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: steffen at sdaoden dot eu
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Disclaimer: i have reported this in the past but the issue was closed.

The problem is that without //TRANSLIT GNU iconv(3) fails to perform the

  If iconv( ) encounters a character in the input buffer that is valid, but for
which an identical character does not exist in the target codeset, iconv( )
shall perform an implementation-defined conversion on this character.

POSIX iconv(3) (Vol. 2: System Interfaces, Issue 7) requirement.
Instead GNU libc returns EILSEQ which is wrong as POSIX defined EILSEQ only for

  [EILSEQ] Input conversion stopped due to an input byte that does not belong
to the input codeset.

The Linux man-pages 6.01 manual (2022‐10‐09) says the same.  But GNU libc
_does_ fail with EILSEQ without //TRANSLIT even if the input is valid UTF-8,
as can be seen by running this (shortened variant of a config test program).
I say "Bye!" already here, and hope it gets fixed!

#include <string.h>
#include <errno.h>
#include <stdio.h>
#include <iconv.h>

int main(void){
        char inb[16], oub[16], *inbp, *oubp;
        iconv_t id;
        size_t inl, oul;
        int rv;

        /* U+10FE as UTF-8: valid input that has no US-ASCII equivalent */
        memcpy(inbp = inb, "\341\203\276", sizeof("\341\203\276"));
        inl = sizeof("\341\203\276") -1;
        oul = sizeof oub;
        oubp = oub;

        rv = 1;
        if((id = iconv_open("us-ascii"/*//TRANSLIT"*/, "utf-8")) == (iconv_t)-1)
                goto jleave;

        rv = 14;
        if(iconv(id, &inbp, &inl, &oubp, &oul) == (size_t)-1){
                /* On glibc this branch is taken, with errno == EILSEQ */
                fprintf(stderr, "error %s %d==%d\n",
                        strerror(errno), errno, errno == EILSEQ);
                goto jleave;
        }

        fprintf(stderr, "bummer\n");
jleave:
        if(id != (iconv_t)-1)
                iconv_close(id);

        return rv;
}

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
@ 2023-02-18 20:48 ` rrt at sc3d dot org
  2023-02-18 21:20 ` rrt at sc3d dot org
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rrt at sc3d dot org @ 2023-02-18 20:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

Reuben Thomas <rrt at sc3d dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rrt at sc3d dot org

--- Comment #1 from Reuben Thomas <rrt at sc3d dot org> ---
I'm the maintainer of Recode (formerly GNU Recode), the widely-used character
conversion utility.

I came across this odd behaviour some years ago, but I only just realised that
it is in fact a bug in glibc. My analysis is the same as the reporter's: the
POSIX standard says unambiguously that EILSEQ is only returned for invalid
input, and when an exact match to the output character set is not possible, an
implementation-defined conversion is performed.

A very simple example using the iconv(1) program:

$ hd foo.data
00000000  c2 b4                                             |..|
00000002
# This is ACUTE ACCENT U+00B4
$ iconv -f UTF-8 -t ISO-8859-15 foo.data
iconv: illegal input sequence at position 0
# This is wrong! The input is valid UTF-8
$ iconv -f UTF-8 -t ISO-8859-15//TRANSLIT foo.data
' # This is the output one might expect in the previous case
$ iconv -f UTF-8 -t ISO-8859-1 foo.data | hd
00000000  b4                                                |.|
00000001
# As we'd expect, as ACUTE ACCENT exists in ISO-8859-1

As far as I can see from looking at the code, the conversion code from Unicode
to ISO-8859-15 is handled by iconvdata/8bit-gap.c. When it cannot find an
ISO-8859-15 equivalent for the given UCS4 character, it calls
STANDARD_TO_LOOP_ERR_HANDLER. This sets the error to __GCONV_ILLEGAL_INPUT,
which is eventually converted to EILSEQ.  This is wrong!

STANDARD_TO_LOOP_ERR_HANDLER should use some other error code. I cannot see a
suitable one in the present set (enum of __GCONV_* in iconv/gconv.h).


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
  2023-02-18 20:48 ` [Bug libc/29913] " rrt at sc3d dot org
@ 2023-02-18 21:20 ` rrt at sc3d dot org
  2023-02-18 22:43 ` steffen at sdaoden dot eu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rrt at sc3d dot org @ 2023-02-18 21:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #2 from Reuben Thomas <rrt at sc3d dot org> ---
Some thoughts about remedying the defect:

1. I guess that the current behaviour needs to be retained in some form,
because clients will rely on it. In particular, it gives a way to detect when
precise conversion is not possible, which iconv's spec does not.

2. However, the current behaviour is a problem for portable programs like
Recode, that need to work with multiple iconv implementations. And, it's a bug!

3. The simplest "implementation-defined conversion" would be to act as if
either //IGNORE or //TRANSLIT behaviour had been requested.
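
Until a library does this itself, a caller can approximate point 3 on its own side. The following is only a minimal sketch, not glibc code: `open_lenient` is a hypothetical helper name, and it assumes that an implementation which does not know the GNU `//TRANSLIT` suffix rejects it in `iconv_open`, in which case the plain target name is tried.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: prefer a transliterating converter, and fall back
 * to the plain target charset when "//TRANSLIT" is not accepted. */
iconv_t open_lenient(const char *to, const char *from)
{
    char name[128];
    int n = snprintf(name, sizeof name, "%s//TRANSLIT", to);

    if (n > 0 && (size_t)n < sizeof name) {
        iconv_t cd = iconv_open(name, from);
        if (cd != (iconv_t)-1)
            return cd;
    }
    /* Suffix unknown or name too long: use the target as given. */
    return iconv_open(to, from);
}
```

A caller would use `open_lenient("ISO-8859-15", "UTF-8")` in place of `iconv_open`. What the transliteration produces remains up to the implementation.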


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
  2023-02-18 20:48 ` [Bug libc/29913] " rrt at sc3d dot org
  2023-02-18 21:20 ` rrt at sc3d dot org
@ 2023-02-18 22:43 ` steffen at sdaoden dot eu
  2023-02-19  0:40 ` bruno at clisp dot org
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-18 22:43 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #3 from Steffen (Daode) Nurpmeso <steffen at sdaoden dot eu> ---
It shall simply put a ? (musl uses *), or maybe a configurable character.
Some libraries then put a ? for each byte, others one for the complete sequence
that is skipped over.  ("Normally" the converter "knows" about the character
well enough that the latter strikes me as a good thing.  Like //TRANSLIT does.)
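
The "one replacement per complete sequence" variant can be sketched on the caller side as below. This is an illustration under stated assumptions, not code from any of the libraries mentioned: the input is UTF-8, the output charset is ASCII-compatible, iconv() leaves *in at the first byte of the character it rejected, and `utf8_seq_len` and `iconv_replace` are hypothetical names.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Bytes in the UTF-8 sequence introduced by `lead`; 1 for a stray
 * continuation byte or an invalid lead byte, so only that byte is skipped. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Convert all input, writing a single '?' for each rejected character.
 * Returns 0 on success, -1 with errno set on any error other than EILSEQ. */
int iconv_replace(iconv_t cd, char **in, size_t *inleft,
                  char **out, size_t *outleft)
{
    while (*inleft > 0) {
        size_t skip;

        if (iconv(cd, in, inleft, out, outleft) != (size_t)-1)
            continue;               /* all input consumed, loop ends */
        if (errno != EILSEQ)
            return -1;              /* E2BIG, EINVAL: left to the caller */
        if (*outleft == 0) {
            errno = E2BIG;
            return -1;
        }
        skip = utf8_seq_len((unsigned char)**in);
        if (skip > *inleft)
            skip = *inleft;
        *in += skip;                /* step over the whole character */
        *inleft -= skip;
        *(*out)++ = '?';            /* one replacement per character */
        --*outleft;
    }
    return 0;
}
```

With an implementation that substitutes on its own (musl's * for example), the EILSEQ branch is simply never reached for valid input.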

Yes.  I guess the problem is that in "real life" the problem likely does not
occur in that form.
Or people work around it somehow.
For example, in "my" Linux distribution, they changed their pkg like

-               bsdtar -c $COMPRESSION -f $TARGET *  &&  bsdtar -t -v -f $TARGET
+               bsdtar --format=gnutar -c $COMPRESSION -f $TARGET *  &&  bsdtar -t -v -f $TARGET

because some release tarballs seem to contain falsely encoded paths.
(So that the -- correct! and _very_ complicated!! -- libarchive character
conversion correctly bails.  But the above is easier to handle than doing
upstream reports, and gives immediate success.  The bogus path on the disc ..
i do not know.  I did not use those packages once the problem was
circumvented.)


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (2 preceding siblings ...)
  2023-02-18 22:43 ` steffen at sdaoden dot eu
@ 2023-02-19  0:40 ` bruno at clisp dot org
  2023-02-19  0:51 ` bruno at clisp dot org
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: bruno at clisp dot org @ 2023-02-19  0:40 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

Bruno Haible <bruno at clisp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org

--- Comment #4 from Bruno Haible <bruno at clisp dot org> ---
> i have reported this in the past but the issue was closed.

This was in https://sourceware.org/bugzilla/show_bug.cgi?id=22908 . Please mark
this bug as related to #22908.

> POSIX defined EILSEQ only for
>
>  [EILSEQ] Input conversion stopped due to an input byte that does not belong to the input codeset.

This sentence only means that when /input conversion stopped due to an input
byte that does not belong to the input codeset/, the function shall fail with
error EILSEQ. It does *not* forbid the function to fail with error EILSEQ for
other reasons. It also does *not* forbid the function to fail with other error
values for other reasons.

This is not specific to iconv; it holds for all functions specified by POSIX.
See
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/V2_chap01.html
section 1.2.

> The Linux man-pages 6.01 manual (2022‐10‐09) says the same.

Nope, it does not say so. According to your interpretation, where this man page
says "The conversion can stop for four reasons" you would like to add a 5th
case.

According to my interpretation of the man page (and I wrote that man page
originally), "An invalid multibyte sequence is encountered in the input" may
also - depending on the implementation - include the case of input that cannot
be meaningfully converted, neither in a reversible nor in a nonreversible way.

In summary: Please close this ticket as INVALID.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (3 preceding siblings ...)
  2023-02-19  0:40 ` bruno at clisp dot org
@ 2023-02-19  0:51 ` bruno at clisp dot org
  2023-02-19  1:58 ` steffen at sdaoden dot eu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: bruno at clisp dot org @ 2023-02-19  0:51 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #5 from Bruno Haible <bruno at clisp dot org> ---
(In reply to Reuben Thomas from comment #2)
> 1. I guess that the current behaviour needs to be retained in some form,
> because clients will rely on it.

Correct. And GNU libiconv (a different implementation of iconv, for systems
that have a deficient iconv implementation) implements the same behaviour.

> 2. However, the current behaviour is a problem for portable programs like
> Recode, that need to work with multiple iconv implementations.

If you need code that works with multiple iconv implementations, take a look at
gnulib/lib/unicodeio.c lines 137..154 or gnulib/lib/striconveh.c lines
950..962. You see that the problem is that replacing unknown or inconvertible
inputs with '?' or '*' or NUL is
- just not yielding practically useful behaviour (especially because the caller
then cannot transform a buffer all at once, a purpose for which the iconv
function was initially designed),
- requiring platform dependent recognition heuristics.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (4 preceding siblings ...)
  2023-02-19  0:51 ` bruno at clisp dot org
@ 2023-02-19  1:58 ` steffen at sdaoden dot eu
  2023-02-19 10:06 ` rrt at sc3d dot org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-19  1:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #6 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
I do not know how to mark the bug as related (unless you already did so by noting it).

Linux man says

  The conversion can stop for four reasons

then the only thing that may match is

  An invalid multibyte sequence is encountered in the input

and that is not what is going on.
It is not an invalid input.

And no, iconv users surely always have to be prepared for a loop i would say,
just in case the input has a problem and needs to be replaced with a
replacement character.

That gnulib snippet is terrible.  I have such a thing also in order to be able
to perform an iconv test (we pass through what the lib does).
For example, this snippet was in the program i took maintainership over before
2004:


/*
 * Fault-tolerant iconv() function.
 */
static size_t
iconv_ft(iconv_t cd, char **inb, size_t *inbleft, char **outb,
                size_t *outbleft)
{
        size_t sz = 0;

        while ((sz = iconv(cd, inb, inbleft, outb, outbleft)) == (size_t)-1
                        && (errno == EILSEQ || errno == EINVAL)) {
                if (*inbleft > 0) {
                        (*inb)++;
                        (*inbleft)--;
                } else {
                        **outb = '\0';
                        break;
                }
                if (*outbleft > 0) {
                        *(*outb)++ = '?';
                        (*outbleft)--;
                } else {
                        **outb = '\0';
                        break;
                }
        }
        return sz;
}

Instead GNU should have reused the EINVAL error for this case.  Or IO, NODATA,
NOENT, NOMSG, NOTSUP, NOSYS, NOTOBACCO.

Anyhow, that gnulib snippet was a shock.  What a mess.

The problem with the GNU approach is that portable software that sticks to the
POSIX standard and/or reads the Linux manual has to perform a lot of checks in
order to find out whether the native iconv supports / wants //TRANSLIT to get
the behaviour that the standard describes.

At least in my opinion.
And, as you say, all others but GNU follow this.
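
One such check can be done at run time by feeding the converter a character that is valid in the source charset but absent from the target, and looking at what comes back. A minimal sketch, assuming the charset names "UTF-8" and "ASCII" are accepted by the iconv in use; `iconv_rejects_unconvertible` is a hypothetical name:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if iconv() fails with EILSEQ on valid but unconvertible input,
 * 0 if it performs some substitution instead, -1 if the probe cannot tell. */
int iconv_rejects_unconvertible(void)
{
    char in[] = "\341\203\276";     /* U+10FE: valid UTF-8, not in ASCII */
    char out[16];
    char *ip = in, *op = out;
    size_t il = sizeof in - 1, ol = sizeof out;
    iconv_t cd = iconv_open("ASCII", "UTF-8");
    int rv;

    if (cd == (iconv_t)-1)
        return -1;
    errno = 0;
    if (iconv(cd, &ip, &il, &op, &ol) != (size_t)-1)
        rv = 0;                     /* converted: a replacement was written */
    else
        rv = (errno == EILSEQ) ? 1 : -1;
    iconv_close(cd);
    return rv;
}
```

The result could be cached once at program start and used to decide whether a replacement loop or a //TRANSLIT retry is needed at all.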


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (5 preceding siblings ...)
  2023-02-19  1:58 ` steffen at sdaoden dot eu
@ 2023-02-19 10:06 ` rrt at sc3d dot org
  2023-02-19 10:15 ` rrt at sc3d dot org
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rrt at sc3d dot org @ 2023-02-19 10:06 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #7 from Reuben Thomas <rrt at sc3d dot org> ---
(In reply to Bruno Haible from comment #4)
> >
> >  [EILSEQ] Input conversion stopped due to an input byte that does not belong to the input codeset.
> 
> This sentence only means that when /input conversion stopped due to an input
> byte that does not belong to the input codeset/, the function shall fail
> with error EILSEQ. It does *not* forbid the function to fail with error
> EILSEQ for other reasons. It also does *not* forbid the function to fail
> with other error values for other reasons.
> 
> This is not specific to iconv; it holds for all functions specified by
> POSIX. See
> https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/
> V2_chap01.html section 1.2.

I have read this section through several times, in particular the sections on
"ERRORS" and "RETURN VALUE" and I can't see anything relevant, sorry; please
could you elaborate?


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (6 preceding siblings ...)
  2023-02-19 10:06 ` rrt at sc3d dot org
@ 2023-02-19 10:15 ` rrt at sc3d dot org
  2023-02-19 10:22 ` rrt at sc3d dot org
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rrt at sc3d dot org @ 2023-02-19 10:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #8 from Reuben Thomas <rrt at sc3d dot org> ---
(In reply to Bruno Haible from comment #4)
> 
> According to my interpretation of the man page (and I wrote that man page
> originally), "An invalid multibyte sequence is encountered in the input" may
> also - depending on the implementation - include the case of input that
> cannot be meaningfully converted, neither in a reversible nor in a
> nonreversible way.

Sorry, but this is an unwarranted interpretation. It's unreasonable without
extra explanation to expect the reader to recognize that "invalid" refers to
the wider context of the conversion. The fact that it says "invalid multibyte
sequence" reinforces this impression: if your interpretation were correct, then
iconv would not be expected to return EILSEQ when a single-byte sequence was
not translatable, only when a multibyte sequence is untranslatable.

I'll file a separate bug about the documentation. The glibc manual also, as far
as I can see, does not document the actual (useful!) behaviour.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (7 preceding siblings ...)
  2023-02-19 10:15 ` rrt at sc3d dot org
@ 2023-02-19 10:22 ` rrt at sc3d dot org
  2023-02-19 22:57 ` steffen at sdaoden dot eu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: rrt at sc3d dot org @ 2023-02-19 10:22 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #9 from Reuben Thomas <rrt at sc3d dot org> ---
(In reply to Bruno Haible from comment #5)
> 
> If you need code that works with multiple iconv implementations, take a look
> at gnulib/lib/unicodeio.c lines 137..154 or gnulib/lib/striconveh.c lines
> 950..962. You see that the problem is that replacing unknown or
> inconvertible inputs with '?' or '*' or NUL is
> - just not yielding practically useful behaviour (especially because the
> caller then cannot transform a buffer all at once, a purpose for which the
> iconv function was initially designed),
> - requiring platform dependent recognition heuristics.

For those who need to work with multiple implementations, it looks like this
code could usefully be exposed in its own gnulib API.

Since most of the problems I've had with Recode since taking it over have
arisen from iconv, and coping with different implementations just makes it
worse, I think I will retreat to using GNU libiconv (which Recode used to use)
where at least I only have one implementation to deal with.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (8 preceding siblings ...)
  2023-02-19 10:22 ` rrt at sc3d dot org
@ 2023-02-19 22:57 ` steffen at sdaoden dot eu
  2023-02-19 23:02 ` steffen at sdaoden dot eu
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-19 22:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #10 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
I mean the GNU approach definitely has merits.
If only it were not automatic, but required //OUCNVERR
or some other hypothetical explicit configuration.

As it stands GNU stands out with its behaviour, and i as
a programmer do not know how to differentiate between an input
ILSEQ (dramatic!) and an output ILSEQ (the email use case might try
a different character set).  I can maybe a bit -- if i know for sure
that the iconv i use is the GNU one, which might not be true in
practice (though i know of no other dynamic library that can
replace it, only of libc-built-in and GNU iconv lib choices).
If only it were a dedicated errno value.
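
Lacking a dedicated errno value, one way to tell the two cases apart is to validate the input separately: decode it into an encoding that can represent every Unicode character, so that any failure must come from the input side. The sketch below assumes the iconv in use knows "UTF-32LE" and that the source charset maps into Unicode; `input_is_valid` is a hypothetical name.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if `buf` decodes cleanly from charset `from`, 0 if it is
 * malformed or truncated, -1 if the probe converter is unavailable. */
int input_is_valid(const char *from, const char *buf, size_t len)
{
    iconv_t probe = iconv_open("UTF-32LE", from);
    char *in = (char *)buf;         /* iconv() does not modify the input */
    int ok = 1;

    if (probe == (iconv_t)-1)
        return -1;
    while (len > 0) {
        char scratch[256], *out = scratch;
        size_t outleft = sizeof scratch;

        /* E2BIG only means scratch is full; its contents are discarded. */
        if (iconv(probe, &in, &len, &out, &outleft) == (size_t)-1
                && errno != E2BIG) {
            ok = 0;                 /* EILSEQ or EINVAL: bad input */
            break;
        }
    }
    iconv_close(probe);
    return ok;
}
```

If the real conversion fails with EILSEQ while `input_is_valid` returns 1, the failure is on the output side, and trying a different target character set is reasonable.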

For me the need to go //TRANSLIT is a, well, hm, painful GNU-specific
need and way, and it shall be noted it is "transliteration":
something entirely different than "an implementation-defined
conversion on this character" that in reality is either * or ?.
It could do whatever, say replacing a hypothetical calligraphic "tiger
protects the house" with a download link for a book of Dostojewski
or something.

How can i test this??
How can i as a programmer write a test that checks my program works
correctly regarding iconv if i have to use //TRANSLIT that may
change behind the scenes and "improve" the transliteration because
someone spent time on some character set and found a better one?
I currently use "U+1FA78/f0 9f a9 b9/;DROP OF BLOOD" which right
now works everywhere, but //TRANSLIT may turn it into an embedded
picture of Bela Lugosi?  Nosferatu?


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (9 preceding siblings ...)
  2023-02-19 22:57 ` steffen at sdaoden dot eu
@ 2023-02-19 23:02 ` steffen at sdaoden dot eu
  2023-02-20 20:09 ` steffen at sdaoden dot eu
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-19 23:02 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #11 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
iconv could do much more for programmers anyway.
For example email software has to know whether an actual character set is, in
fact, US-ASCII, and the iconv implementation surely knows.
Yet it does not expose an API for this particular thing ("official name").
Like normalize_name(), and i have a dedicated is_ascii like

        /* In reversed MIME preference order */
        static char const * const names[] = {"csASCII", "cp367", "IBM367", "us",
                        "ISO646-US", "ISO_646.irv:1991", "ANSI_X3.4-1986", "iso-ir-6",
                        "ANSI_X3.4-1968", "ASCII", "US-ASCII"};

I am pretty sure GNU iconv will map all those names to the thing.
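
A lookup over that table might look like the sketch below. It is a guess at the shape of such a helper, not the actual is_ascii from the program mentioned: `is_ascii_name` is a hypothetical name, and the comparison assumes charset names are matched case-insensitively.

```c
#include <stddef.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/* Returns 1 if `name` is one of the listed aliases of US-ASCII, else 0. */
int is_ascii_name(const char *name)
{
    /* In reversed MIME preference order, as in the list above */
    static const char *const names[] = {
        "csASCII", "cp367", "IBM367", "us",
        "ISO646-US", "ISO_646.irv:1991", "ANSI_X3.4-1986", "iso-ir-6",
        "ANSI_X3.4-1968", "ASCII", "US-ASCII"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcasecmp(name, names[i]) == 0)
            return 1;
    return 0;
}
```

The table has to be maintained by hand, which is the complaint: the iconv implementation already has this alias information but offers no standard call to query it.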


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (10 preceding siblings ...)
  2023-02-19 23:02 ` steffen at sdaoden dot eu
@ 2023-02-20 20:09 ` steffen at sdaoden dot eu
  2023-02-20 20:54 ` steffen at sdaoden dot eu
  2023-02-20 21:52 ` steffen at sdaoden dot eu
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-20 20:09 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

Steffen Nurpmeso <steffen at sdaoden dot eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #12 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
Actually i have forgotten about
https://austingroupbugs.net/view.php?id=1007
because the behaviour bugs me.
Sorry.


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (11 preceding siblings ...)
  2023-02-20 20:09 ` steffen at sdaoden dot eu
@ 2023-02-20 20:54 ` steffen at sdaoden dot eu
  2023-02-20 21:52 ` steffen at sdaoden dot eu
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-20 20:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #13 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
P.S.:
glibc is wrong wrong wrong!
It should NOT NOT NOT give an ILSEQ for output conversion!

I know mbrtowc does, and that is surely where this comes from; but that sits
upon a valid input character!

Having invalid, broken, illegal input is a dramatic failure!
Not being able to convert valid input to another character set is entirely
different.
(Sebor said de facto the same for the POSIX standard issue, in 2016.)


* [Bug libc/29913] iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual
  2022-12-16 23:03 [Bug libc/29913] New: iconv(3) is not POSIX compliant, and does not conform to linux man-pages manual steffen at sdaoden dot eu
                   ` (12 preceding siblings ...)
  2023-02-20 20:54 ` steffen at sdaoden dot eu
@ 2023-02-20 21:52 ` steffen at sdaoden dot eu
  13 siblings, 0 replies; 15+ messages in thread
From: steffen at sdaoden dot eu @ 2023-02-20 21:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=29913

--- Comment #14 from Steffen Nurpmeso <steffen at sdaoden dot eu> ---
P.P.S.: sorry for the noise!
But now, in order to deal with that (as the GNU approach has its merits,
really), i downloaded GNU libiconv, and in wchar_to_loop_convert() i see

     size_t res = unicode_loop_convert(&wcd->parent,
                                        &inptr,&inleft,
                                        &bufptr,&bufleft);
      if (res == (size_t)(-1)) {
        if (errno == EILSEQ)
          /* Invalid input. */

And so i stop because i wholeheartedly agree.

I hope it is ok to assume that matching __GNU_LIBRARY__ and _LIBICONV_VERSION
(unfortunately this is all compile-time only) is the way to go to detect
implementations that give EILSEQ upon output conversion error?
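
Spelled out, the compile-time check asked about would be something like the sketch below. It only restates the assumption in the question and does not confirm it: the macro ICONV_EILSEQ_ON_UNCONVERTIBLE and the function name are hypothetical, and other vendors may ship an <iconv.h> that also defines _LIBICONV_VERSION without behaving the same way.

```c
#include <iconv.h>  /* GNU libiconv's header defines _LIBICONV_VERSION;
                       glibc's headers define __GNU_LIBRARY__ */

/* Assumption under test: both GNU implementations report valid but
 * unconvertible input as EILSEQ unless //TRANSLIT or //IGNORE is given. */
#if defined(__GNU_LIBRARY__) || defined(_LIBICONV_VERSION)
# define ICONV_EILSEQ_ON_UNCONVERTIBLE 1
#else
# define ICONV_EILSEQ_ON_UNCONVERTIBLE 0
#endif

/* Function form of the same compile-time answer, for use in expressions. */
int iconv_eilseq_on_unconvertible(void)
{
    return ICONV_EILSEQ_ON_UNCONVERTIBLE;
}
```

Since the macros describe the headers seen at build time, not the library loaded at run time, the answer can be wrong when the two differ.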

Thank you.

