From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Bug in libiconv?
Date: Tue, 25 Jan 2011 06:36:00 -0000
Message-ID: <20110124154158.GA15279@calimero.vinschen.de>
Hi Chuck,
hi everyone else,
In a twisted turn of events, I'm trying to get the orphaned catgets
package to work correctly on Cygwin 1.7. As you might know, the package
is derived from the glibc package. Apart from other portability issues
of this *very* glibc-centric piece of code, I found a problem which
appears to point to two bugs in Cygwin's libiconv2.
For some reason, the iconv conversion seems to be overly dependent on
the usage of setlocale, and the value returned through the last
parameter (outbytesleft) appears to be incorrect if the output codeset
is "WCHAR_T".
Here's a simple testcase:
==== SNIP ====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <wchar.h>

iconv_t
open_iconv ()
{
  iconv_t cd_towcp = iconv_open ("WCHAR_T", "UTF-8");
  if (cd_towcp == (iconv_t) -1)
    {
      fprintf (stderr, "iconv_open: %d <%s>\n", errno, strerror (errno));
      exit (1);
    }
  return cd_towcp;
}

void
run_iconv (iconv_t cd_towcp, char *input)
{
  wchar_t out[256];
  char *inbuf = input;
  size_t inbytesleft = strlen (inbuf);
  char *outbuf = (char *) out;
  size_t outbytesleft = sizeof (out);
  size_t ret = iconv (cd_towcp, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
  if (ret == (size_t) -1)
    fprintf (stderr, "iconv: %d <%s>\n", errno, strerror (errno));
  printf ("in = <%s>, inbuf = <%s>, inbytesleft = %zd, outbytesleft = %zd\n",
          input, inbuf, inbytesleft, outbytesleft);
}

int
main ()
{
  iconv_t cd_towcp;
  char *finnish = "Liian pitk\303\244 sana"; // Umlaut-a

  setlocale (LC_ALL, "C");
  cd_towcp = open_iconv ();
  setlocale (LC_ALL, "C");
  run_iconv (cd_towcp, finnish);
  setlocale (LC_ALL, "C.UTF-8");
  run_iconv (cd_towcp, finnish);
  iconv_close (cd_towcp);

  setlocale (LC_ALL, "C.UTF-8");
  cd_towcp = open_iconv ();
  setlocale (LC_ALL, "C");
  run_iconv (cd_towcp, finnish);
  setlocale (LC_ALL, "C.UTF-8");
  run_iconv (cd_towcp, finnish);
  iconv_close (cd_towcp);
  return 0;
}
==== SNAP ====
Here are the important details:
- The input string is a fixed Finnish UTF-8 sentence containing a
  single non-ASCII char.
- The testcase always calls setlocale before calling iconv_open(),
  and then calls setlocale again before each call to iconv().
- So the application tests converting a UTF-8 string to WCHAR_T in four
  combinations of the current locale, in this order:
  - iconv_open "C", iconv "C"
  - iconv_open "C", iconv "C.UTF-8"
  - iconv_open "C.UTF-8", iconv "C"
  - iconv_open "C.UTF-8", iconv "C.UTF-8"
Here's what happens on Linux:
$ gcc -g -o ic ic.c
$ ./ic
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
Here's what happens on Cygwin:
$ gcc -g -o ic ic.c -liconv
$ ./ic
iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
So, AFAICS, there are two problems:
- Even though iconv_open has been called explicitly with "UTF-8" as
  the input codeset, the conversion still depends on the current
  application codeset. That doesn't make sense.
- Even though the last parameter to iconv is defined in bytes, the
  value of outbytesleft after the conversion is the number of remaining
  wchar_t's, not the number of remaining bytes. That's contrary to what
  POSIX defines, see
  http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
Is this analysis correct? Is there by any chance a newer version of
libiconv2 which does not have these problems?
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat