Re: Bug in libiconv?

public inbox for cygwin@cygwin.com

* Re: Bug in libiconv?
@ 2011-02-02 18:58 Bruno Haible
  2011-02-02 21:20 ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: Bruno Haible @ 2011-02-02 18:58 UTC (permalink / raw)
  To: cygwin

[resent to the cygwin list; please add bug-gnu-libiconv to your replies]

Hi Corinna,

Thanks for your reply <http://cygwin.com/ml/cygwin/2011-01/msg00410.html>

> > Please CC the bug-gnu-libiconv mailing list when discussing possible
> > bugs in GNU libiconv.
>
> Ok

Thanks for giving it a try. But although you CCed bug-gnu-libiconv, your message
did not reach the list (but Charles' one and Eric's one did). I guess this is
because the cygwin.com mail server refuses to deliver to corinna-cygwin,
therefore the spam detection at gnu.org recognized your sending address as a
spammer's one. This makes it hard for me to detect that you replied to me,
since I'm not reading the cygwin mailing list on a regular basis.

> > I don't think defining __STDC_ISO_10646__
> > is compliant with ISO C 99 in this situation.
> > ...
> I don't read that from your above quote.  The core is that the *type*
> wchar_t is a *coded* *representation* of the characters defined in
> 10646.

OK.

> > What is the Cygwin wchar_t[] encoding? Is it UTF-16, like on Win32?
> Yes.
> ...
> yes, for the foreseeable future, Cygwin will define wchar_t == UTF-16.

Thanks for confirming it. I've started thinking about how gnulib can
cope with it, now.

> I've put a lot of effort in 2009 and early 2010 to make the wchar_t
> representation in Cygwin and newlib as much Unicode 5.2 compatible as
> possible.  Even the wcrtomb and mbrtowc functions in newlib are capable
> of dealing with UTF-16 surrogates.

I appreciate your effort on internationalization of Cygwin. You went as
far as you could get with the given choice of wchar_t. It's just a fact
that the <wctype.h> functions and wcwidth() cannot work right when wchar_t[]
is UTF-16. And these functions are the only reasons why gnulib and coreutils
code uses wide characters strings at all.

I'm not criticizing the Cygwin choice. Even if Cygwin had chosen to define
'wchar_t' to a 32-bit type, the same problem would have remained for mingw
programs running in UTF-8 or GB18030 locales. (I understand that such
locales exist in Windows 7.)

> I don't quite grok the code at this point:
> 
>   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
>       if (sizeof(wchar_t) == 4) {
>         index = ei_ucs4internal;
>         break;
>       }
>       if (sizeof(wchar_t) == 2) {
>         index = ei_ucs2internal;
>         break;
>       }
>       if (sizeof(wchar_t) == 1) {
>         index = ei_iso8859_1;
>         break;
>       }
>   #endif
> ...
> I *don't* understand that you do the same for Win32.  Old
> Windows versions are using the basic UCS-2 character plane, but newer
> versions, at least since Windows XP are using UTF-16.

Thank you for this remark. I have corrected this in libiconv, and also
added support for Cygwin >= 1.7 at the same place.

> > > the application tests to convert a UTF-8 to WCHAR_T string in four
> > >   combinations of the current locale, in this order:
> > > 
> > >   - iconv_open "C",       iconv "C"
> > >   - iconv_open "C",       iconv "C.UTF-8"
> > >   - iconv_open "C.UTF-8", iconv "C"
> > >   - iconv_open "C.UTF-8", iconv "C.UTF-8"
> ...
> My testcase is a result of trying
> to build a real-life application, gencat from glibc.  For some reason
> gencat thinks it has to set the locale back to "C" in a hardcoded manner.
> 
> This works fine for glibc systems, but the invisible and, IMHO,
> intransparent behaviour of libiconv on other systems makes it pretty
> hard to understand the behaviour of an application when porting it.

I don't see this as a particular "intransparent behaviour of libiconv".
When taking code that was tested only in a single environment (glibc in this
case), you always have to make some effort to make it portable.

> > Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
> > some kind?
>
> Like the underlying Windows functions, Cygwin 1.7 now supports paths of
> up to 32K chars.  The old cygwin_conv_to_posix_path function and its
> friends are written with the Windows ANSI API in mind, so they only
> support paths of up to MAX_PATH == 260 chars.

Thanks for explaining. I'll try to avoid this function.

> > > The usage of a fixed table instead of the charset.alias file in
> > > libcharset/lib/localcharset.c, function get_charset_aliases() is
> > > not good, not good at all.
> > 
> > The alternative is to have this table stored in a file charset.alias;
> > but then every package that includes the module 'localcharset' from
> > gnulib (that is, libiconv, gettext, coreutils, and many others) will
> > want to modify this file during "make install". And this causes a lot of
> > headaches to packaging systems. Therefore, on platforms which have
> > widely used packaging systems (Linux, MacOS X, Cygwin), it's better to
> > avoid the need for this file.
> 
> Now I'm puzzled.  If that's the case, why does libiconv request the
> charset.alias file on *any* other system than DARWIN7, VMS, and Windows?
> Especially on Linux?

I "optimized" only the MacOS X, VMS, and Windows OSes. It would have been
more work to optimize all versions of Solaris, FreeBSD, AIX, etc. in the
same way.

charset.alias is requested on Linux, even though it normally does not exist,
so that packagers and users have a chance to modify the behaviour.

> Additionally, the fixed, Windows-centric table in libiconv removes the
> ability of a system to define their own set of aliases.  Also,
> Cygwin/newlib already handles the Windows codepages by itself.

There are a couple of places in gnulib, coreutils, and gettext that make
decisions based on the encoding of the current locale. In these places, I want
to use a single name for each encoding and not have to list all possible
aliases that any system in the world can use for it.

If a system adds new aliases, such as e.g. Solaris uses "PCK" when it means
"Shift_JIS", this needs to be handled in localcharset.c. There is no
system defined API for resolving these aliases.
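
A minimal sketch of what such an alias resolution can look like, assuming a
small fixed table. The names charset_alias, alias_table and canonical_charset
are hypothetical and are not taken from localcharset.c; only the "PCK" entry
comes from this thread, the second entry is an assumed example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One system-specific alias and the single canonical name used for it. */
struct charset_alias {
    const char *alias;
    const char *canonical;
};

/* Illustrative entries only; this is not the table from localcharset.c. */
static const struct charset_alias alias_table[] = {
    { "PCK", "SHIFT_JIS" },  /* Solaris name mentioned in the thread */
    { "646", "ASCII" },      /* assumed example of another system alias */
};

/* Return the canonical name for `name`, or `name` itself if no alias matches. */
const char *canonical_charset(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++) {
        if (strcmp(name, alias_table[i].alias) == 0)
            return alias_table[i].canonical;
    }
    return name;
}
```

Callers can then compare against one spelling per encoding, for example
strcmp(canonical_charset(codeset), "SHIFT_JIS") == 0, instead of enumerating
every alias themselves.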

Even if Cygwin/newlib handles Windows codepage aliases in all places where
it matters for Cygwin, there are still places where it matters for gnulib,
coreutils, gettext.

> > Neither libiconv nor gettext defines or undefines _WIN32 or __WIN32__.
> > But they are prepared to either setting.
>
> Isn't that just covering a PEBKAC?  I mean, there's no good reason to
> define -mwin32 on the command line and the libiconv configure certainly
> doesn't add it.  Whoever squeezed a -mwin32 onto the GCC command line,
> or even defined -D__WIN32__ manually, deserves the result.

But such a user will then write a mail to a mailing list, and it will take
time for me (or someone else) to investigate and answer it. By writing
  #if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
I avoid this potential problem.

Thanks again for your reply and for the hint to the bug in libiconv's code.

Bruno
-- 
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Bug in libiconv?
  2011-02-02 18:58 Bug in libiconv? Bruno Haible
@ 2011-02-02 21:20 ` Corinna Vinschen
  2011-02-02 21:35   ` bug#7971: Acknowledgement (Bug in libiconv?) GNU bug Tracking System
  2011-02-02 22:57   ` Bug in libiconv? Charles Wilson
  0 siblings, 2 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-02-02 21:20 UTC (permalink / raw)
  To: cygwin, bug-gnulib, bug-coreutils

Hi Bruno,

On Feb  2 19:58, Bruno Haible wrote:
> [resent to the cygwin list; please add bug-gnu-libiconv to your replies]

Done.

> Hi Corinna,
> 
> Thanks for your reply <http://cygwin.com/ml/cygwin/2011-01/msg00410.html>
> 
> > > Please CC the bug-gnu-libiconv mailing list when discussing possible
> > > bugs in GNU libiconv.
> >
> > Ok
> 
> Thanks for giving it a try. But although you CCed bug-gnu-libiconv, your message
> did not reach the list (but Charles' one and Eric's one did). I guess this is
> because the cygwin.com mail server refuses to deliver to corinna-cygwin,
> therefore the spam detection at gnu.org recognized your sending address as a
> spammer's one. This makes it hard for me to detect that you replied to me,
> since I'm not reading the cygwin mailing list on a regular basis.

Uh, too bad.  Sorry about that.  I changed to my Red Hat email address
for this discussion.

> > I've put a lot of effort in 2009 and early 2010 to make the wchar_t
> > representation in Cygwin and newlib as much Unicode 5.2 compatible as
> > possible.  Even the wcrtomb and mbrtowc functions in newlib are capable
> > of dealing with UTF-16 surrogates.
> 
> I appreciate your effort on internationalization of Cygwin. You went as
> far as you could get with the given choice of wchar_t. It's just a fact
> that the <wctype.h> functions and wcwidth() cannot work right when wchar_t[]
> is UTF-16. And these functions are the only reasons why gnulib and coreutils
> code uses wide characters strings at all.

Well, as for the wctype functions you see how easy it is to convert
to wint_t and use that as input.  As for wcwidth, you're right.  However,
in Cygwin/newlib there's the wcswidth function which actually converts the
input string to wint_t type characters including surrogate handling and
then calls an internal __wcwidth function which works on wint_t types.
So there is a way to handle this stuff by just using standard functions,
and it isn't even overly complicated.
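
A rough sketch of the surrogate handling described above, assuming the input
is a plain array of 16-bit UTF-16 units. This is not the newlib code;
utf16_decode and utf16_count_code_points are hypothetical helpers that only
show how two units are folded into one 32-bit value before any per-character
function (wctype, width) would be applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s[0..n-1].  Returns the number of 16-bit
   units consumed (1 or 2), or 0 if n == 0.  An unpaired surrogate is
   passed through unchanged as a single unit. */
size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    uint16_t hi = s[0];
    if (hi >= 0xD800 && hi <= 0xDBFF && n >= 2) {
        uint16_t lo = s[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            /* Combine the two 10-bit halves and add the plane offset. */
            *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                             | (uint32_t)(lo - 0xDC00));
            return 2;
        }
    }
    *cp = hi;
    return 1;
}

/* Count code points (not 16-bit units) in s[0..n-1]. */
size_t utf16_count_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;
    while (n > 0) {
        uint32_t cp;
        size_t used = utf16_decode(s, n, &cp);
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

A width function built this way would call its per-code-point lookup on each
decoded value instead of on each individual unit.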

> I'm not criticizing the Cygwin choice. Even if Cygwin had chosen to define
> 'wchar_t' to a 32-bit type, the same problem would have remained for mingw
> programs running in UTF-8 or GB18030 locales. (I understand that such
> locales exist in Windows 7.)

Right.  However, GB18030 is not supported by Cygwin.

> > ...
> > I *don't* understand that you do the same for Win32.  Old
> > Windows versions are using the basic UCS-2 character plane, but newer
> > versions, at least since Windows XP are using UTF-16.
> 
> Thank you for this remark. I have corrected this in libiconv, and also
> added support for Cygwin >= 1.7 at the same place.

Thanks!

> > > > the application tests to convert a UTF-8 to WCHAR_T string in four
> > > >   combinations of the current locale, in this order:
> > > > 
> > > >   - iconv_open "C",       iconv "C"
> > > >   - iconv_open "C",       iconv "C.UTF-8"
> > > >   - iconv_open "C.UTF-8", iconv "C"
> > > >   - iconv_open "C.UTF-8", iconv "C.UTF-8"
> > ...
> > My testcase is a result of trying
> > to build a real-life application, gencat from glibc.  For some reason
> > gencat thinks it has to set the locale back to "C" in a hardcoded manner.
> > 
> > This works fine for glibc systems, but the invisible and, IMHO,
> > intransparent behaviour of libiconv on other systems makes it pretty
> > hard to understand the behaviour of an application when porting it.
> 
> I don't see this as a particular "intransparent behaviour of libiconv".
> When taking code that was tested only in a single environment (glibc in this
> case), you always have to make some effort to make it portable.

Oh, I meant my gencat experience just as an example.  IMHO this behaviour
is intransparent, no matter what you're trying to port, and where from
you're taking it.

I mean, if you're trying to call iconv for a conversion from some
codeset A to a codeset B, which are both explicitly mentioned when
calling iconv_open, then it is intransparent behaviour that the
conversion fails because you called setlocale with a codeset C.  There
is no apparent connection between the two actions.  The conversion from
A to B could be required for a file operation, while C is the CLI or GUI
charset.  Do you see what I mean?
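
To illustrate the point, here is a small sketch of a conversion where both
codesets are named explicitly and setlocale is never called. It uses only
iconv_open, iconv and iconv_close; the helper name utf8_to_utf16le is
hypothetical, and "UTF-16LE" is an assumed stand-in for a locale-independent
target name. Whether a locale-dependent name such as WCHAR_T should behave the
same way is exactly what is being discussed here.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated UTF-8 string `in` to UTF-16LE in `out`.
   Returns the number of bytes written, or (size_t)-1 on any failure. */
size_t utf8_to_utf16le(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);

    /* Both codesets are given by name; the current locale is not consulted
       by this code. */
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return outsize - outleft;
}
```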

> > > Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
> > > some kind?
> >
> > Like the underlying Windows functions, Cygwin 1.7 now supports paths of
> > up to 32K chars.  The old cygwin_conv_to_posix_path function and its
> > friends are written with the Windows ANSI API in mind, so they only
> > support paths of up to MAX_PATH == 260 chars.
> 
> Thanks for explaining. I'll try to avoid this function.

There should be no reason to call cygwin_conv_path functions, unless you
have a direct interaction with native Win32 functions.  So you can most
easily avoid using them at all by using the relocation technique from
Linux, utilizing /proc/self/maps, which in turn drops the requirement for
the DLLMain function.
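
A rough sketch of the /proc/self/maps idea, assuming the usual line layout
"start-end perms offset dev inode path". This is not the gnulib relocation
code; find_mapping_path and mapping_probe are hypothetical names, and the
sketch only shows how the file backing a given address can be looked up
without any Win32 call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Any object with static storage can serve as a probe address. */
const int mapping_probe = 0;

/* Find the file-backed mapping that contains `addr` and copy its path
   into `buf`.  Returns 0 on success, -1 if nothing suitable is found. */
int find_mapping_path(const void *addr, char *buf, size_t bufsize)
{
    assert(addr != NULL && buf != NULL && bufsize > 0);
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;

    char line[4096];
    int result = -1;
    while (result != 0 && fgets(line, sizeof line, fp) != NULL) {
        unsigned long start, end;
        int pos = 0;
        /* Skip perms, offset, dev and inode; %n records where the path starts. */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %n", &start, &end, &pos) < 2
            || pos == 0)
            continue;
        uintptr_t a = (uintptr_t)addr;
        if (a < start || a >= end)
            continue;
        char *path = line + pos;
        path[strcspn(path, "\n")] = '\0';
        /* Anonymous mappings have no path; give up on those. */
        if (path[0] != '/' || strlen(path) >= bufsize)
            break;
        strcpy(buf, path);
        result = 0;
    }
    fclose(fp);
    return result;
}
```

Passing the address of an object inside the library (here mapping_probe)
yields the path of the file it was loaded from, from which an installation
prefix could then be derived.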

> > > > The usage of a fixed table instead of the charset.alias file in
> > > > libcharset/lib/localcharset.c, function get_charset_aliases() is
> > > > not good, not good at all.
> > > 
> > > The alternative is to have this table stored in a file charset.alias;
> > > but then every package that includes the module 'localcharset' from
> > > gnulib (that is, libiconv, gettext, coreutils, and many others) will
> > > want to modify this file during "make install". And this causes a lot of
> > > headaches to packaging systems. Therefore, on platforms which have
> > > widely used packaging systems (Linux, MacOS X, Cygwin), it's better to
> > > avoid the need for this file.
> > 
> > Now I'm puzzled.  If that's the case, why does libiconv request the
> > charset.alias file on *any* other system than DARWIN7, VMS, and Windows?
> > Especially on Linux?
> 
> I "optimized" only the MacOS X, VMS, and Windows OSes. It would have been
> more work to optimize all versions of Solaris, FreeBSD, AIX, etc. in the
> same way.
> 
> charset.alias is requested on Linux, even though it normally does not exist,
> so that packagers and users have a chance to modify the behaviour.

I beg to keep this choice to Cygwin users as well.  It will be empty by
default as well.  The supported codesets are documented in
http://cygwin.com/cygwin-ug-net/setup-locale.html#setup-locale-charsetlist
If some weird alias is required, the user can add it to charset.alias.
That's the optimal solution.

> Even if Cygwin/newlib handles Windows codepage aliases in all places where
> it matters for Cygwin, there are still places where it matters for gnulib,
> coreutils, gettext.

Since gnulib, coreutils and gettext are ported to Cygwin anyway, the
ported versions should live happily in the Cygwin world.  They get what
the system defines, and the system is Cygwin, not Windows.  Everything
else can be added to charset.alias, if required.

> > > Neither libiconv nor gettext defines or undefines _WIN32 or __WIN32__.
> > > But they are prepared to either setting.
> >
> > Isn't that just covering a PEBKAC?  I mean, there's no good reason to
> > define -mwin32 on the command line and the libiconv configure certainly
> > doesn't add it.  Whoever squeezed a -mwin32 onto the GCC command line,
> > or even defined -D__WIN32__ manually, deserves the result.
> 
> But such a user will then write a mail to a mailing list, and it will take
> time for me (or someone else) to investigate and answer it. By writing
>   #if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
> I avoid this potential problem.

Ok.  However, the other variation

   #if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__

should be only used in very rare circumstances.  Usually it just means
that some unnecessary Windowism is used on Cygwin, and that there's
probably a POSIXy equivalent.  If not, kick us here on the list and
we can discuss it.

> Thanks again for your reply and for the hint to the bug in libiconv's code.

You're welcome and thanks for this fruitful discussion.  I'm glad if we
can find a well-working compromise for some of the problems, especially
in the unfortunate UTF-16 case.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* bug#7971: Acknowledgement (Bug in libiconv?)
  2011-02-02 21:20 ` Corinna Vinschen
@ 2011-02-02 21:35   ` GNU bug Tracking System
  2011-02-02 22:57   ` Bug in libiconv? Charles Wilson
  1 sibling, 0 replies; 33+ messages in thread
From: GNU bug Tracking System @ 2011-02-02 21:35 UTC (permalink / raw)
  To: cygwin, bug-gnulib, bug-coreutils

Thank you for filing a new bug report with GNU.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-coreutils@gnu.org

If you wish to submit further information on this problem, please
send it to 7971@debbugs.gnu.org.

Please do not send mail to help-debbugs@gnu.org unless you wish
to report a problem with the Bug-tracking system.

-- 
7971: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7971
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems


* Re: Bug in libiconv?
  2011-02-02 21:20 ` Corinna Vinschen
  2011-02-02 21:35   ` bug#7971: Acknowledgement (Bug in libiconv?) GNU bug Tracking System
@ 2011-02-02 22:57   ` Charles Wilson
  1 sibling, 0 replies; 33+ messages in thread
From: Charles Wilson @ 2011-02-02 22:57 UTC (permalink / raw)
  To: cygwin, bug-gnulib

On 2/2/2011 4:19 PM, Corinna Vinschen wrote:
> On Feb  2 19:58, Bruno Haible wrote:
>> charset.alias is requested on Linux, even though it normally does not exist,
>> so that packagers and users have a chance to modify the behaviour.
> 
> I beg to keep this choice to Cygwin users as well.  It will be empty by
> default as well.  The supported codesets are documented in
> http://cygwin.com/cygwin-ug-net/setup-locale.html#setup-locale-charsetlist
> If some weird alias is required, the user can add it to charset.alias.
> That's the optimal solution.

FWIW, using a fresh git clone of libiconv

	3cdff14a3cc549dc4ccfe02dca46e73b1e7a68c6
	Sat Jan 29 18:34:14 2011 +0100

bootstrapped using a fresh gnulib

	a036b7684f9671ee53999773785d1865603c3849
	Tue Feb 1 10:04:17 2011 -0800

and no other patches, libiconv + cygwin-1.7.7 [note: NOT 1.7.8pre]
works, passes its own self-tests, and passes Corinna's original test
case that spawned this thread.

Bruno's change in libiconv was:

-         This is also the case on native Woe32 systems.  */
-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
+         This is also the case on native Woe32 systems and Cygwin >= 1.7, where
+         we know that it is UTF-16.  */
+#if ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__) || (defined __CYGWIN__ && CYGWIN_VERSION_DLL_MAJOR >= 1007)
...some code...
+#elif __STDC_ISO_10646__
...other code...
#endif

repeated at various places. Obviously the use of
CYGWIN_VERSION_DLL_MAJOR means there is a

+#ifdef __CYGWIN__
+#include <cygwin/version.h>
+#endif

in there, too.


Now, this configuration does NOT include:

  1) Corinna's suggested change to localcharset.c that modified
     get_charset_alias() to use charset.alias on cygwin instead of
     hardcoding the alias list, NOR the change in that file to
     locale_charset() to deal with copying the value returned by
     nl_langinfo() and remove some special cygwin workarounds involving
     GetACP().

  2) the relocation changes to avoid deprecated path conversion
     functions and to do things on cygwin "the linux way".
     http://lists.gnu.org/archive/html/bug-gnulib/2011-01/msg00522.html

I tested both with and without --enable-relocatable...


>> But such a user will then write a mail to a mailing list, and it will take
>> time for me (or someone else) to investigate and answer it. By writing
>>   #if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
>> I avoid this potential problem.
> 
> Ok.  However, the other variation
> 
>    #if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
> 
> should be only used in very rare circumstances.  Usually it just means
> that some unnecessary Windowism is used on Cygwin, and that there's
> probably a POSIXy equivalent.  If not, kick us here on the list and
> we can discuss it.

See above, with the

#if ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__) || (defined __CYGWIN__ && CYGWIN_VERSION_DLL_MAJOR >= 1007)

formulation.  It's not an erroneous use of a windowism, it just reflects
that cygwin's unicode impl shares characteristics with the underlying
win32 unicode support.
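
For illustration, a self-contained sketch of how that guard can drive the
choice of wchar_t encoding, modelled on the code fragments quoted in this
thread. The function name wchar_encoding_guess and the returned strings are
hypothetical; libiconv itself selects internal encoding indices, not names.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef __CYGWIN__
# include <cygwin/version.h>   /* provides CYGWIN_VERSION_DLL_MAJOR */
#endif

/* Return a name for the encoding that wchar_t[] is assumed to use on this
   platform, or NULL if nothing is known about it. */
const char *wchar_encoding_guess(void)
{
#if ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__) \
    || (defined __CYGWIN__ && CYGWIN_VERSION_DLL_MAJOR >= 1007)
    /* Native Windows and Cygwin >= 1.7: wchar_t[] is UTF-16. */
    return "UTF-16";
#elif defined __STDC_ISO_10646__
    /* wchar_t values are ISO 10646 code points; pick by size. */
    if (sizeof(wchar_t) == 4)
        return "UCS-4";
    if (sizeof(wchar_t) == 2)
        return "UCS-2";
    if (sizeof(wchar_t) == 1)
        return "ISO-8859-1";
    return NULL;
#else
    return NULL;
#endif
}
```

The ordering matters: the Windows/Cygwin branch has to come before the
__STDC_ISO_10646__ test, since Cygwin defines that macro as well.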


--
Chuck


* Re: Bug in libiconv?
  2011-01-30 11:34         ` Corinna Vinschen
@ 2011-01-30 11:43           ` Corinna Vinschen
  0 siblings, 0 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-30 11:43 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

On Jan 29 19:12, Corinna Vinschen wrote:
> On Jan 29 10:21, Eric Blake wrote:
> > In other words, cygwin IS being POSIX-compliant by advertising only the
> > Unicode 4.0 character set in the __STDC_ISO_10646__, while still

Btw., you are aware that Unicode 4.0 already defines more characters than
fit into the base plane, aren't you?

I chose the value in an attempt to be carefully conservative.  I'm not
100% sure if that's the right thing to do...

> > supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an

Yeah, how could we live without emoticons all the time ;-)

But, sure, it would be nice to get contributions to support 6.0
in the wctype functions.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-29 18:28       ` Eric Blake
@ 2011-01-30 11:34         ` Corinna Vinschen
  2011-01-30 11:43           ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-30 11:34 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

On Jan 29 10:21, Eric Blake wrote:
> On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
> >> So, using UTF-16 surrogate encodings for characters outside the basic
> >> plane violates POSIX, but it's the best we can do for those characters.
> > 
> > Right, and we discussed this already on this list.  Or the developer
> > list, I don't remember.  Maybe we should have stuck to the base plane
> > and only use UCS-2 to be more POSIX compatible.
> 
> The burden is on the application, not on cygwin.  If the application
> wants POSIX behavior, then they obey __STDC_ISO_10646__ and use ONLY
> characters from the basic plane (no surrogates), at which point their
> use of wchar_t fits the POSIX definition (one wchar_t per character).
> The moment they pass a surrogate, they are no longer honoring the
> restriction documented by __STDC_ISO_10646__ so they are no longer under
> the rules of POSIX, and then cygwin can do whatever it wants (and in

Erm... hang on.  __STDC_ISO_10646__ and the POSIX requirement are two
different beasts.  I still think that __STDC_ISO_10646__ does not
restrict a 2 byte wchar_t to UCS-2.  Per the definition UTF-16 is a
valid coded representation of characters from ISO/IEC 10646.

So, to say it with your words, the moment applications pass a surrogate,
they are no longer under the rules of POSIX, but they still honor the
restriction documented by __STDC_ISO_10646__.

However, *usually* an application shouldn't really notice that a
surrogate has been used, at least as long as they only manipulate entire
strings.
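
As a concrete illustration of what "a surrogate has been used" means, here is
a minimal sketch of the standard UTF-16 encoding step. The helper name
utf16_encode is hypothetical; the arithmetic is the usual surrogate-pair
formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode the code point `cp` as UTF-16 into out[0..1].  Returns the number
   of 16-bit units written (1 or 2), or 0 if `cp` is not encodable. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                      /* out of range or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* basic plane: one unit */
        return 1;
    }
    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+12345, the character mentioned earlier in this thread, this yields the
two units 0xD808 0xDF45, so the string length in wchar_t units is one more
than the number of characters.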

> this case, QoI demands that we honor surrogates to the best of our
> ability for full UTF-16 support, and you can have multi-wchar_t
> characters just as you already have multi-byte UTF-8 char characters).
> In other words, cygwin IS being POSIX-compliant by advertising only the
> Unicode 4.0 character set in the __STDC_ISO_10646__, while still
> supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an
> extension when you no longer care about POSIX.
> 
> > However, the POSIX definition doesn't contradict what I said about the
> > definition of __STDC_ISO_10646__ as far as I'm concerned.
> 
> Yep - I think we're in violent agreement :)

Hmm, I'm not quite sure, see above.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-29 17:51   ` Eric Blake
  2011-01-29 18:12     ` Corinna Vinschen
@ 2011-01-30  2:40     ` Corinna Vinschen
  1 sibling, 0 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-30  2:40 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

[Duplicate message to honor the missing CC of bug-gnu-libiconv@gnu.org]

On Jan 29 08:10, Eric Blake wrote:
> On 01/29/2011 05:30 AM, Corinna Vinschen wrote:
> >> But when characters outside the basic plane, such as
> >> U+12345 (CUNEIFORM SIGN URU TIMES KI), are encoded by 2 consecutive wchar_t
> >> values, values of type wchar_t don't correspond to ISO/IEC 10646 characters.
> >> (Or maybe I'm underestimating what "coded representations" means...?)
> > 
> > I don't read that from your above quote.  The core is that the *type*
> > wchar_t is a *coded* *representation* of the characters defined in
> > 10646.  At no point it says that a single wchar_t value must represent a
> > single character from 10646.  So I take it that UTF-16 is a valid, coded
> > representation of the characters from 10646.
> 
> POSIX is clear that wchar_t must be wide enough so that 1 wchar_t is one
> character.  Which limits a 2-byte wchar_t to just the Unicode basic
> plane.  There's nothing cygwin can do about this other than break LOTS
> of ABI to support a 4-byte wchar_t to supply all of Unicode.
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_03
> 
> "All wide-character codes in a given process consist of an equal number
> of bits. This is in contrast to characters, which can consist of a
> variable number of bytes. The byte or byte sequence that represents a
> character can also be represented as a wide-character code.
> Wide-character codes thus provide a uniform size for manipulating text
> data."
> 
> So, using UTF-16 surrogate encodings for characters outside the basic
> plane violates POSIX, but it's the best we can do for those characters.

Right, and we discussed this already on this list.  Or the developer
list, I don't remember.  Maybe we should have stuck to the base plane
and only use UCS-2 to be more POSIX compatible.  I have to admit that
I was more interested in getting all (or as much as possible) of Unicode
working than in following POSIX to the last word in this regard.  And I
wanted to make sure that East Asian users would get all of the
characters used, and there *are* CJK ideographs in the 0x2xxxx plane.

However, the POSIX definition doesn't contradict what I said about the
definition of __STDC_ISO_10646__ as far as I'm concerned.

> Someday when gcc has better support for C+1x 16- and 32-bit characters
> (regardless of the sizing of wchar_t), then we can add all the new
> 32-bit character APIs that use Unicode unimpeded, without breaking
> existing ones that use wchar_t.

Yeah, that's what I'm waiting for as well.  But for the time being,
I'm confident that we have the best compromise possible at the time.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-29 18:12     ` Corinna Vinschen
@ 2011-01-29 18:28       ` Eric Blake
  2011-01-30 11:34         ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: Eric Blake @ 2011-01-29 18:28 UTC (permalink / raw)
  To: cygwin

On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
>> So, using UTF-16 surrogate encodings for characters outside the basic
>> plane violates POSIX, but it's the best we can do for those characters.
> 
> Right, and we discussed this already on this list.  Or the developer
> list, I don't remember.  Maybe we should have stuck to the base plane
> and only use UCS-2 to be more POSIX compatible.

The burden is on the application, not on cygwin.  If the application
wants POSIX behavior, then they obey __STDC_ISO_10646__ and use ONLY
characters from the basic plane (no surrogates), at which point their
use of wchar_t fits the POSIX definition (one wchar_t per character).
The moment they pass a surrogate, they are no longer honoring the
restriction documented by __STDC_ISO_10646__ so they are no longer under
the rules of POSIX, and then cygwin can do whatever it wants (and in
this case, QoI demands that we honor surrogates to the best of our
ability for full UTF-16 support, and you can have multi-wchar_t
characters just as you already have multi-byte UTF-8 char characters).
In other words, cygwin IS being POSIX-compliant by advertising only the
Unicode 4.0 character set in the __STDC_ISO_10646__, while still
supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an
extension when you no longer care about POSIX.

> However, the POSIX definition doesn't contradict what I said about the
> definition of __STDC_ISO_10646__ as far as I'm concerned.

Yep - I think we're in violent agreement :)

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


* Re: Bug in libiconv?
  2011-01-29 17:51   ` Eric Blake
@ 2011-01-29 18:12     ` Corinna Vinschen
  2011-01-29 18:28       ` Eric Blake
  2011-01-30  2:40     ` Corinna Vinschen
  1 sibling, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-29 18:12 UTC (permalink / raw)
  To: cygwin

On Jan 29 08:10, Eric Blake wrote:
> On 01/29/2011 05:30 AM, Corinna Vinschen wrote:
> >> But when characters outside the basic plane, such as
> >> U+12345 (CUNEIFORM SIGN URU TIMES KI), are encoded by 2 consecutive wchar_t
> >> values, values of type wchar_t don't correspond to ISO/IEC 10646 characters.
> >> (Or maybe I'm underestimating what "coded representations" means...?)
> > 
> > I don't read that from your above quote.  The core is that the *type*
> > wchar_t is a *coded* *representation* of the characters defined in
> > 10646.  At no point does it say that a single wchar_t value must represent a
> > single character from 10646.  So I take it that UTF-16 is a valid, coded
> > representation of the characters from 10646.
> 
> POSIX is clear that wchar_t must be wide enough so that 1 wchar_t is one
> character.  Which limits a 2-byte wchar_t to just the Unicode basic
> plane.  There's nothing cygwin can do about this other than break LOTS
> of ABI to support a 4-byte wchar_t to supply all of Unicode.
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_03
> 
> "All wide-character codes in a given process consist of an equal number
> of bits. This is in contrast to characters, which can consist of a
> variable number of bytes. The byte or byte sequence that represents a
> character can also be represented as a wide-character code.
> Wide-character codes thus provide a uniform size for manipulating text
> data."
> 
> So, using UTF-16 surrogate encodings for characters outside the basic
> plane violates POSIX, but it's the best we can do for those characters.

Right, and we discussed this already on this list.  Or the developer
list, I don't remember.  Maybe we should have stuck to the base plane
and only use UCS-2 to be more POSIX compatible.  I have to admit that
I was more interested to get all (or as much as possible) of Unicode
working than to follow POSIX to the last word in this regard.  And I
was interested to make sure that East Asian users would get all of the
characters they use, and there *are* CJK ideographs in the 0x2xxxx plane.

However, the POSIX definition doesn't contradict what I said about the
definition of __STDC_ISO_10646__ as far as I'm concerned.

> Someday when gcc has better support for C+1x 16- and 32-bit characters
> (regardless of the sizing of wchar_t), then we can add all the new
> 32-bit character APIs that use Unicode unimpeded, without breaking
> existing ones that use wchar_t.

Yeah, that's what I'm waiting for as well.  But for the time being,
I'm confident that we have the best compromise possible at the time.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-29 16:02 ` Corinna Vinschen
@ 2011-01-29 17:51   ` Eric Blake
  2011-01-29 18:12     ` Corinna Vinschen
  2011-01-30  2:40     ` Corinna Vinschen
  0 siblings, 2 replies; 33+ messages in thread
From: Eric Blake @ 2011-01-29 17:51 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

[-- Attachment #1: Type: text/plain, Size: 2788 bytes --]

On 01/29/2011 05:30 AM, Corinna Vinschen wrote:
>> But when characters outside the basic plane, such as
>> U+12345 (CUNEIFORM SIGN URU TIMES KI), are encoded by 2 consecutive wchar_t
>> values, values of type wchar_t don't correspond to ISO/IEC 10646 characters.
>> (Or maybe I'm underestimating what "coded representations" means...?)
> 
> I don't read that from your above quote.  The core is that the *type*
> wchar_t is a *coded* *representation* of the characters defined in
> 10646.  At no point does it say that a single wchar_t value must represent a
> single character from 10646.  So I take it that UTF-16 is a valid, coded
> representation of the characters from 10646.

POSIX is clear that wchar_t must be wide enough so that 1 wchar_t is one
character.  Which limits a 2-byte wchar_t to just the Unicode basic
plane.  There's nothing cygwin can do about this other than break LOTS
of ABI to support a 4-byte wchar_t to supply all of Unicode.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_03

"All wide-character codes in a given process consist of an equal number
of bits. This is in contrast to characters, which can consist of a
variable number of bytes. The byte or byte sequence that represents a
character can also be represented as a wide-character code.
Wide-character codes thus provide a uniform size for manipulating text
data."

So, using UTF-16 surrogate encodings for characters outside the basic
plane violates POSIX, but it's the best we can do for those characters.

> I've put a lot of effort in 2009 and early 2010 to make the wchar_t
> representation in Cygwin and newlib as much Unicode 5.2 compatible as
> possible.  Even the wcrtomb and mbrtowc functions in newlib are capable
> of dealing with UTF-16 surrogates.

And I appreciate that effort - even though it means wchar_t is just as
painful as multi-byte char characters in that an array of wchar_t is not
necessarily that many characters long, but only when surrogates are
involved.
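
To make the point concrete, here is a rough sketch of counting
characters in a UTF-16 array, assuming 16-bit units.  The function name
utf16_count_chars is hypothetical, uint16_t stands in for a 2-byte
wchar_t, and unpaired surrogates are simply counted as one character
each:

```c
#include <stddef.h>
#include <stdint.h>

/* Count the characters in n UTF-16 code units.  A high surrogate
   followed by a low surrogate counts as one character; anything else,
   including an unpaired surrogate, counts as one character per unit. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF          /* high surrogate */
            && i + 1 < n
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;                        /* skip the low half of the pair */
        count++;
    }
    return count;
}
```

An array holding 'A', U+12345 and 'B' has four units but only three
characters, so the array length alone is not the character count.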

> 
> However, given that Windows XP basically only supports the charset from
> Unicode 4.0, and given that Cygwin's support for east-asian double and
> triple byte codesets (Big5, GBK, eucKR, eucJP, and a SJIS/CP932 bastard)
> still requires the underlying Windows conversion functions, I've set
> __STDC_ISO_10646__ to a value which reflects Unicode 4.0 (200305L) for
> Cygwin 1.7.8.

Someday when gcc has better support for C+1x 16- and 32-bit characters
(regardless of the sizing of wchar_t), then we can add all the new
32-bit character APIs that use Unicode unimpeded, without breaking
existing ones that use wchar_t.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-29 13:20 ` Charles Wilson
@ 2011-01-29 17:15   ` Corinna Vinschen
  0 siblings, 0 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-29 17:15 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

On Jan 28 22:06, Charles Wilson wrote:
> On 1/28/2011 5:12 PM, Bruno Haible wrote:
> >> the old cygwin_conv_to_posix_path function as well.
> > 
> > Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
> > some kind?
> 
> Yes, and (and because:) yes.
> 
> The limitation is, the old functions:
> 
> extern int cygwin_win32_to_posix_path_list (const char *, char *)
> extern int cygwin_win32_to_posix_path_list_buf_size (const char *)
> extern int cygwin_posix_to_win32_path_list (const char *, char *)
> extern int cygwin_posix_to_win32_path_list_buf_size (const char *)
> extern int cygwin_conv_to_win32_path (const char *, char *)
> extern int cygwin_conv_to_full_win32_path (const char *, char *)
> extern int cygwin_conv_to_posix_path (const char *, char *)
> extern int cygwin_conv_to_full_posix_path (const char *, char *)
> 
> are all deprecated, because (a) they don't handle wide chars, and (b)
> they are limited to only 254 char path lengths.  The replacement functions
> 
> extern ssize_t cygwin_conv_path (cygwin_conv_path_t what,
>                                  const void *from,
>                                  void *to, size_t size);
> 
> extern ssize_t cygwin_conv_path_list (cygwin_conv_path_t what,
>                                       const void *from,
>                                       void *to, size_t size);
> 
> extern void *cygwin_create_path (cygwin_conv_path_t what,
>                                  const void *from);
> 
> do not have these limitations (well, 4Kbytes/2k wchars for a single
> filename; 32K? for pathlists).  cygwin_conv_path_t controls the
> behavior, and can accept the following values:

Cygwin defines PATH_MAX == 4096 since we don't guarantee that an
incoming filename of more than 4K is handled.  However, the conversion
itself does not restrict what we get from Windows, which is 32K
pathnames.

> However, by using the linux-ish facilities throughout and avoiding the
> win32 stuff, you can ALSO avoid the necessity of calling any path
> conversion functions at all -- and eliminate a lot of platform-specific
> code.  (e.g. let the cygwin dll do ALL the work)

Exactly.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-29  2:15 Bruno Haible
  2011-01-29 12:34 ` Charles Wilson
  2011-01-29 13:20 ` Charles Wilson
@ 2011-01-29 16:02 ` Corinna Vinschen
  2011-01-29 17:51   ` Eric Blake
  2 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-29 16:02 UTC (permalink / raw)
  To: cygwin; +Cc: bug-gnu-libiconv

On Jan 28 23:12, Bruno Haible wrote:
> Hi Corinna and Chuck,
> 
> Please CC the bug-gnu-libiconv mailing list when discussing possible
> bugs in GNU libiconv.

Ok, no worries.  However, please remove my mail account from the CC.
I'm reading the cygwin ML anyway, so I don't need dups to my private
email account.

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00292.html>:
> 
> > the application tests to convert a UTF-8 to WCHAR_T string in four
> >   combinations of the current locale, in this order:
> > 
> >   - iconv_open "C",       iconv "C"
> >   - iconv_open "C",       iconv "C.UTF-8"
> >   - iconv_open "C.UTF-8", iconv "C"
> >   - iconv_open "C.UTF-8", iconv "C.UTF-8"
> > [...]
> > Here's what happens on Cygwin:
> > 
> >   $ gcc -g -o ic ic.c -liconv
> >   $ ./ic
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
> 
> On glibc systems, the encoding "WCHAR_T" is equivalent to "UCS-4" with machine
> dependent endianness and alignment. In particular it is independent of the
> locale. That explains the first set of results.
> 
> In libiconv, on systems which don't define __STDC_ISO_10646__, the encoding
> "WCHAR_T" is equivalent to wchar_t[], that is, dependent on the locale.
> Changing the locale encoding after allocating an iconv_t from or to "WCHAR_T"
> yields undefined behaviour. That explains the second set of results.

IMHO this is undesired behaviour.  My testcase is a result of trying
to build a real-life application, gencat from glibc.  For some reason
gencat thinks it has to set the locale back to "C" in a hardcoded manner.

This works fine for glibc systems, but the invisible and, IMHO,
opaque behaviour of libiconv on other systems makes it pretty
hard to understand the behaviour of an application when porting it.

So it's good that Cygwin will define __STDC_ISO_10646__ in future.

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00299.html>:
> 
> > I defined __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.
> 
> What is the Cygwin wchar_t[] encoding? Is it UTF-16, like on Win32? The

Yes.  It makes a lot of sense, doesn't it?  Otherwise we would constantly
have to convert twice, from MB to wchar_t and then to UTF-16 for system
access.

> documentation is silent about it. I had expected to find some word about it
> in <http://cygwin.com/cygwin-api/compatibility.html#std-susv4>
> or <http://cygwin.com/cygwin-api/std-notes.html>.

Hmm, I thought this would go without saying, given that sizeof(wchar_t)
is 2 for as long as wchar_t is a supported type in Cygwin at all.  Maybe
a hint in the docs wouldn't be too bad an idea...

> In any case, sizeof (wchar_t) == 2. I don't think defining __STDC_ISO_10646__
> is compliant with ISO C 99 in this situation. ISO C 99 section 6.10.8.(2) says:
> 
>   __STDC_ISO_10646__
>           An integer constant of the form yyyymmL (for example,
>           199712L), intended to indicate that values of type wchar_t are the
>           coded representations of the characters defined by ISO/IEC 10646,
>           along with all amendments and technical corrigenda as of the
>           specified year and month.
> 
> But when characters outside the basic plane, such as
> U+12345 (CUNEIFORM SIGN URU TIMES KI), are encoded by 2 consecutive wchar_t
> values, values of type wchar_t don't correspond to ISO/IEC 10646 characters.
> (Or maybe I'm underestimating what "coded representations" means...?)

I don't read that from your above quote.  The core is that the *type*
wchar_t is a *coded* *representation* of the characters defined in
10646.  At no point does it say that a single wchar_t value must represent a
single character from 10646.  So I take it that UTF-16 is a valid, coded
representation of the characters from 10646.

I've put a lot of effort in 2009 and early 2010 to make the wchar_t
representation in Cygwin and newlib as much Unicode 5.2 compatible as
possible.  Even the wcrtomb and mbrtowc functions in newlib are capable
of dealing with UTF-16 surrogates.

However, given that Windows XP basically only supports the charset from
Unicode 4.0, and given that Cygwin's support for east-asian double and
triple byte codesets (Big5, GBK, eucKR, eucJP, and a SJIS/CP932 bastard)
still requires the underlying Windows conversion functions, I've set
__STDC_ISO_10646__ to a value which reflects Unicode 4.0 (200305L) for
Cygwin 1.7.8.
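
A small sketch of how an application could inspect what a given
toolchain advertises.  The function names are hypothetical, and the
split of the yyyymmL value into year and month follows the C99 wording
quoted above:

```c
#include <stdio.h>
#include <wchar.h>

/* Return the value of __STDC_ISO_10646__, or 0 if the implementation
   does not define it. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long) __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Print the advertised ISO/IEC 10646 revision and the wchar_t size. */
void report_wchar(FILE *out)
{
    long v = iso10646_version();

    if (v != 0)
        fprintf(out, "__STDC_ISO_10646__ = %ldL (year %ld, month %ld)\n",
                v, v / 100, v % 100);
    else
        fprintf(out, "__STDC_ISO_10646__ is not defined\n");
    fprintf(out, "sizeof (wchar_t) = %zu\n", sizeof (wchar_t));
}
```

On a system set up as described here, one would expect the first line to
show 200305L and the second to show 2, but that depends on the installed
headers.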

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00357.html>:
> 
> >   #if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
> > This should be
> > ...
> >   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
> 
> That makes sense if Cygwin guarantees that from now on and in the future,
> the wchar_t encoding will always be UTF-16. Is this the case?

We have no reason to change that.  We could have done that when
introducing Cygwin 1.7, but it would not only have broken backward
compatibility with existing applications, it would also have required
changing GCC in a backward-incompatible way.  And given that the
underlying real OS is using UTF-16 for wchar_t anyway, it was the
natural choice to keep up with sizeof(wchar_t) == 2.

So, yes, for the foreseeable future, Cygwin will define wchar_t == UTF-16.

But note Charles' email.  I don't think it's correct to use the above

  #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__

*if* you want to keep backward compatibility with Cygwin 1.5.

Btw., I don't quite grok the code at this point:

  #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
      if (sizeof(wchar_t) == 4) {
        index = ei_ucs4internal;
        break;
      }
      if (sizeof(wchar_t) == 2) {
        index = ei_ucs2internal;
        break;
      }
      if (sizeof(wchar_t) == 1) {
        index = ei_iso8859_1;
        break;
      }
  #endif

Given your interpretation of the definition of __STDC_ISO_10646__, I
understand why you use UCS-2 as wchar_t representation if
sizeof(wchar_t) == 2.

However, I *don't* understand why you do the same for Win32.  Old
Windows versions are using the basic UCS-2 character plane, but newer
versions, at least since Windows XP, are using UTF-16.

So, here's the question.  Is ei_ucs2internal really UCS-2, or is it
UTF-16?  If the first, isn't it a bug to treat Windows as a UCS-2
system?

In general, shouldn't there be another choice to distinguish wchar_t ==
UCS-2 systems from wchar_t == UTF-16 systems?  I don't see that at all
in libiconv.
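
A sketch of the kind of distinction I mean.  This is not libiconv code:
the function name and the second parameter are hypothetical, and the
returned strings are just encoding names, not libiconv's internal ei_*
indices:

```c
#include <stddef.h>

/* Hypothetical selector: pick an encoding name for wchar_t[] from the
   size of wchar_t and from whether the platform stores surrogate pairs
   in it.  Returns NULL for sizes this sketch does not know about. */
const char *wchar_encoding_for(size_t wchar_size, int uses_surrogates)
{
    switch (wchar_size) {
    case 4:
        return "UCS-4";
    case 2:
        /* The case in question: a 2-byte wchar_t is only UCS-2 if
           the system never puts surrogate pairs into it. */
        return uses_surrogates ? "UTF-16" : "UCS-2";
    case 1:
        return "ISO-8859-1";
    default:
        return NULL;
    }
}
```

Under that scheme, Cygwin 1.7 and recent Windows would pass a nonzero
second argument and get "UTF-16" rather than "UCS-2".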

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00299.html>:
> 
> > Why on earth is libiconv on Cygwin using Windows functions in some
> > places?
> 
> So that I could reuse the essentially same code on Cygwin as on native Win32.
> 
> Charles has submitted a patch on this topic to bug-gnulib; I will handle it.

Thanks.  From my POV it makes sense to get rid of as many Windowisms as
possible on Cygwin.

> > the old cygwin_conv_to_posix_path function as well.
> 
> Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
> some kind?

Like the underlying Windows functions, Cygwin 1.7 now supports paths of
up to 32K chars.  The old cygwin_conv_to_posix_path function and its
friends are written with the Windows ANSI API in mind, so they only
support paths of up to MAX_PATH == 260 chars.

However, given that you can use /proc/self/maps for the same task,
there's no reason to use these functions at all.  Just let Cygwin do
its stuff internally.
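
A rough sketch of the /proc/self/maps approach, assuming the usual
Linux-style line layout in which the mapped file's path is the last
field and the earlier fields contain no '/'.  The function name
find_mapped_path and the 4096-byte line buffer are choices made for
this sketch only:

```c
#include <stdio.h>
#include <string.h>

/* Search /proc/self/maps for the first mapped file whose path contains
   needle and copy that path to out.  Returns 0 on success, -1 if the
   file cannot be read, nothing matches, or out is too small. */
int find_mapped_path(const char *needle, char *out, size_t out_size)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[4096];
    int found = -1;

    if (fp == NULL)
        return -1;
    while (found != 0 && fgets(line, sizeof line, fp) != NULL) {
        char *path = strchr(line, '/');     /* start of the path field */
        if (path == NULL)
            continue;                       /* anonymous mapping */
        path[strcspn(path, "\n")] = '\0';   /* drop the trailing newline */
        if (strstr(path, needle) != NULL && strlen(path) < out_size) {
            strcpy(out, path);
            found = 0;
        }
    }
    fclose(fp);
    return found;
}
```

A library could call this with part of its own DLL name to learn where
it is installed, without any path conversion call.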

> > The usage of a fixed table instead of the charset.alias file in
> > libcharset/lib/localcharset.c, function get_charset_aliases() is
> > not good, not good at all.
> 
> The alternative is to have this table stored in a file charset.alias;
> but then every package that includes the module 'localcharset' from
> gnulib (that is, libiconv, gettext, coreutils, and many others) will
> want to modify this file during "make install". And this causes a lot of
> headaches to packaging systems. Therefore, on platforms which have
> widely used packaging systems (Linux, MacOS X, Cygwin), it's better to
> avoid the need for this file.

Now I'm puzzled.  If that's the case, why does libiconv request the
charset.alias file on *any* other system than DARWIN7, VMS, and Windows?
Especially on Linux?

Additionally, the fixed, Windows-centric table in libiconv removes the
ability of a system to define their own set of aliases.  Also,
Cygwin/newlib already handles the Windows codepages by itself.

> Additionally, on Win32 systems relocatability
> is a must, and the code to compute the location of charset.alias from
> the location of libiconv.dll would be overkill.

Why?  Why isn't it overkill to do it on Linux and others, but why is it
overkill on Windows or, FWIW, Darwin7 and VMS?  Sure, it's faster to use
a fixed alias table than to read a file, but neither is the mechanism to
fetch the file location so much slower on these systems, nor is there a
reason that other systems get the additional flexibility which you deny
those three systems.

As one of the core Cygwin maintainers I prefer that external libs from
the POSIX world are built with as few Windowisms, and with as few #ifdef
__CYGWIN__ tweaks as possible.  That's the goal we're working for.  If
some POSIXism doesn't work, start by complaining here.  Either we can
make it work, or we can discuss the least intrusive workaround.

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00303.html>:
> 
> > It looks like there's been some bitrot with respect
> > to some of the "&& !CYGWIN" guards on WIN32.  Both libiconv and gettext,
> > IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
> > "regular" win32 and for cygwin...which means defined(CYGWIN) guards are
> > necessary.
> 
> The reason for these "&& !defined __CYGWIN__" clauses is that - at least
> in Cygwin 1.5.x - gcc has an option that will define _WIN32 or __WIN32__.

That has nothing to do with Cygwin 1.5.  That's still an option in 
more recent GCC versions.  I have not the faintest idea why.

> So, when _WIN32 || __WIN32__ may evaluate to true on Cygwin, or it may
> evaluate to false on Cygwin. Since I don't want libiconv or gettext
> to be compiled in two possible ways on Cygwin, I add
> "&& !defined __CYGWIN__".
> 
> Neither libiconv nor gettext defines or undefines _WIN32 or __WIN32__.
> But they are prepared to either setting.

Isn't that just covering a PEBKAC?  I mean, there's no good reason to
specify -mwin32 on the command line and the libiconv configure certainly
doesn't add it.  Whoever squeezed a -mwin32 onto the GCC command line,
or even added -D__WIN32__ manually, deserves the result.

> Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00332.html>:
> 
> > there ARE still bugs in libiconv on Cygwin -- specifically:
> >  - Even though iconv_open has been opened explicitly with "UTF-8" as
> >    input string, the conversion still depends on the current application
> >    codeset.  That doesn't make sense.
> 
> If the other argument to iconv_open is "CHAR" or "WCHAR_T", hence locale
> dependent, and you change the locale in between, the result is undefined
> behaviour.

Why in libiconv?  Why not in glibc?  As I wrote above, this behaviour
is most unexpected and most surprising.  It also makes porting glibc
applications so much harder for no good reason.
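
Until that is settled, one application-side way around the problem is
to name an explicit encoding instead of "WCHAR_T", so the descriptor
cannot depend on the locale.  A minimal sketch, assuming "UTF-16LE"
matches the wchar_t layout the caller wants (true for a 2-byte UTF-16
wchar_t on little-endian machines); the function name is hypothetical:

```c
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of UTF-8 to UTF-16LE using an explicitly named
   target encoding.  Returns the number of bytes written to out, or
   (size_t) -1 on any error. */
size_t utf8_to_utf16le(const char *in, size_t in_len,
                       char *out, size_t out_size)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    char *inbuf = (char *) in;      /* iconv() takes a non-const pointer */
    size_t inleft = in_len;
    char *outbuf = out;
    size_t outleft = out_size;
    size_t rc;

    if (cd == (iconv_t) -1)
        return (size_t) -1;         /* conversion not supported */
    rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return (size_t) -1;         /* invalid input or buffer too small */
    return out_size - outleft;
}
```

Since neither name passed to iconv_open() refers to the locale, calling
setlocale() between iconv_open() and iconv() should no longer matter.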

> >  - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.
> 
> It's not a bug. From POSIX:2008
> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html>
> you can infer that (iconv_t) -1 is not a "conversion descriptor". It's a
> return value used from iconv_open(), nothing more. From
> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_close.html>
> you can see that the argument of iconv_close() has to be a conversion
> descriptor. From the ERRORS section in the same page you can see that
> iconv_close() is not required to catch a faulty argument. Note the word
> "may", not "shall".

Ok, so it's not a bug but blessed behaviour per POSIX.  However, is it
really necessary?  An extra check doesn't cost you anything, makes
libiconv more user-friendly, and aligns the behaviour with glibc, which,
again, makes porting applications easier.  Glibc's gencat is such an
application which crashes under libiconv due to that.
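
The kind of check I have in mind, sketched here as an application-side
wrapper rather than as a change to libiconv itself.  The wrapper name
is hypothetical; EBADF is the error POSIX lists as optional for an
invalid conversion descriptor:

```c
#include <errno.h>
#include <iconv.h>

/* Like iconv_close(), but reject the failure value returned by
   iconv_open() instead of handing it on to the library. */
int safe_iconv_close(iconv_t cd)
{
    if (cd == (iconv_t) -1) {
        errno = EBADF;          /* not a valid conversion descriptor */
        return -1;
    }
    return iconv_close(cd);
}
```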

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-29  2:15 Bruno Haible
  2011-01-29 12:34 ` Charles Wilson
@ 2011-01-29 13:20 ` Charles Wilson
  2011-01-29 17:15   ` Corinna Vinschen
  2011-01-29 16:02 ` Corinna Vinschen
  2 siblings, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-29 13:20 UTC (permalink / raw)
  Cc: cygwin, bug-gnu-libiconv

On 1/28/2011 5:12 PM, Bruno Haible wrote:
>> the old cygwin_conv_to_posix_path function as well.
> 
> Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
> some kind?

Yes, and (and because:) yes.

The limitation is, the old functions:

extern int cygwin_win32_to_posix_path_list (const char *, char *)
extern int cygwin_win32_to_posix_path_list_buf_size (const char *)
extern int cygwin_posix_to_win32_path_list (const char *, char *)
extern int cygwin_posix_to_win32_path_list_buf_size (const char *)
extern int cygwin_conv_to_win32_path (const char *, char *)
extern int cygwin_conv_to_full_win32_path (const char *, char *)
extern int cygwin_conv_to_posix_path (const char *, char *)
extern int cygwin_conv_to_full_posix_path (const char *, char *)

are all deprecated, because (a) they don't handle wide chars, and (b)
they are limited to only 254 char path lengths.  The replacement functions

extern ssize_t cygwin_conv_path (cygwin_conv_path_t what,
                                 const void *from,
                                 void *to, size_t size);

extern ssize_t cygwin_conv_path_list (cygwin_conv_path_t what,
                                      const void *from,
                                      void *to, size_t size);

extern void *cygwin_create_path (cygwin_conv_path_t what,
                                 const void *from);

do not have these limitations (well, 4Kbytes/2k wchars for a single
filename; 32K? for pathlists).  cygwin_conv_path_t controls the
behavior, and can accept the following values:

enum
{
  CCP_POSIX_TO_WIN_A = 0, /* from is char*, to is char*       */
  CCP_POSIX_TO_WIN_W,     /* from is char*, to is wchar_t*    */
  CCP_WIN_A_TO_POSIX,     /* from is char*, to is char*       */
  CCP_WIN_W_TO_POSIX,     /* from is wchar_t*, to is char*    */

  /* Or these values to the above as needed. */
  CCP_ABSOLUTE = 0,       /* Request absolute path (default). */
  CCP_RELATIVE = 0x100    /* Request to keep path relative.   */
};
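
A minimal usage sketch, assuming the documented convention that calling
cygwin_conv_path with a size of 0 returns the required buffer size in
bytes.  The wrapper name posix_to_win32_path is hypothetical, and the
non-Cygwin branch only exists so the sketch compiles elsewhere:

```c
#include <stdlib.h>
#include <string.h>
#ifdef __CYGWIN__
# include <sys/cygwin.h>
#endif

/* Return a malloc'ed Windows (ANSI) form of a POSIX path, or NULL on
   failure.  The caller frees the result. */
char *posix_to_win32_path(const char *posix_path)
{
#ifdef __CYGWIN__
    /* First call: ask how many bytes the converted path needs. */
    ssize_t size = cygwin_conv_path(CCP_POSIX_TO_WIN_A | CCP_ABSOLUTE,
                                    posix_path, NULL, 0);
    char *win;

    if (size < 0)
        return NULL;
    win = malloc((size_t) size);
    if (win == NULL)
        return NULL;
    /* Second call: do the conversion into the sized buffer. */
    if (cygwin_conv_path(CCP_POSIX_TO_WIN_A | CCP_ABSOLUTE,
                         posix_path, win, (size_t) size) != 0) {
        free(win);
        return NULL;
    }
    return win;
#else
    /* Not Cygwin: nothing to convert, return a plain copy. */
    char *copy = malloc(strlen(posix_path) + 1);

    if (copy != NULL)
        strcpy(copy, posix_path);
    return copy;
#endif
}
```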

However, by using the linux-ish facilities throughout and avoiding the
win32 stuff, you can ALSO avoid the necessity of calling any path
conversion functions at all -- and eliminate a lot of platform-specific
code.  (e.g. let the cygwin dll do ALL the work)

>> The usage of a fixed table instead of the charset.alias file in
>> libcharset/lib/localcharset.c, function get_charset_aliases() is
>> not good, not good at all.
> 
> The alternative is to have this table stored in a file charset.alias;
> but then every package that includes the module 'localcharset' from
> gnulib (that is, libiconv, gettext, coreutils, and many others) will
> want to modify this file during "make install". And this causes a lot of
> headaches to packaging systems. Therefore, on platforms which have
> widely used packaging systems (Linux,

huh?

> MacOS X, Cygwin), it's better to
> avoid the need for this file.

From inspecting the code, it sure looks like linux still uses the
charset.alias file to me, at least in the released version of
libiconv-1.13.1 (of course, most actual linux platforms don't install
libiconv anyway, since glibc handles that).

> Additionally, on Win32 systems relocatability
> is a must, and the code to compute the location of charset.alias from
> the location of libiconv.dll would be overkill.

Meh, for win32.  Cygwin -- not so much, since cygwin handles the
"relocation" itself, relative to the underlying win32 paths, via its
mount point emulation.  cygiconv-2.dll is always in the
cygwin-translated path "/usr/bin", and charset.alias is in "/usr/lib".

> The reason for these "&& !defined __CYGWIN__" clauses is that - at least
> in Cygwin 1.5.x - gcc has an option that will define _WIN32 or __WIN32__.

Yes, or if you #include <windows.h>.  I was talking about the fact that
in several places, libiconv explicitly #includes windows.h -- and that
triggers _WIN32 and __WIN32__ to be defined, regardless of any gcc
command line args.  Avoid including windows.h, and...

> So, when _WIN32 || __WIN32__ may evaluate to true on Cygwin, or it may
> evaluate to false on Cygwin. Since I don't want libiconv or gettext
> to be compiled in two possible ways on Cygwin, I add
> "&& !defined __CYGWIN__".

Except now we (might) need to distinguish between OLD cygwin (1.5) and
current, supported cygwin (1.7+).  Since __STDC_ISO_10646__ is defined
by the latter (at least, 1.7.8+, even if the underlying functionality
works back to 1.7.2) -- but is NOT defined on cygwin-1.5...it seems ok
to use that symbol to "distinguish" between the two flavors -- except
that this removes your "safety net" concerning __WIN32__ getting defined
on Cygwin, for old cygwin where __STDC_ISO_10646__ is not defined.

> Neither libiconv nor gettext defines or undefines _WIN32 or __WIN32__.
> But they are prepared to either setting.

Should libiconv support for cygwin be 1.7 only, or 1.5 only?  Supporting
both is going to seriously uglify the code.  And 1.5 is no longer supported
even by cygwin.com.

You *could* do this:
AC_CHECK_DECLS([cygwin_conv_path], [],[], [[#include <sys/cygwin.h>]])

and use __CYGWIN__ && HAVE_DECL_CYGWIN_CONV_PATH to mean "new" cygwin,
and __CYGWIN__ && !HAVE_DECL_CYGWIN_CONV_PATH to mean "old" cygwin,
but...ugly, horrid, icky...
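
For the record, the C side of that would look roughly like this.  It
is a sketch only: the CYGWIN_FLAVOR macro and the function are
hypothetical, and the fallback #define is there just so the fragment
compiles without a config.h (AC_CHECK_DECLS normally defines
HAVE_DECL_CYGWIN_CONV_PATH to 1 or 0):

```c
/* config.h would normally provide this; default it for the sketch. */
#ifndef HAVE_DECL_CYGWIN_CONV_PATH
# define HAVE_DECL_CYGWIN_CONV_PATH 0
#endif

#if defined __CYGWIN__ && HAVE_DECL_CYGWIN_CONV_PATH
# define CYGWIN_FLAVOR "new cygwin (1.7+)"   /* cygwin_conv_path exists */
#elif defined __CYGWIN__
# define CYGWIN_FLAVOR "old cygwin (1.5)"    /* only the old functions */
#else
# define CYGWIN_FLAVOR "not cygwin"
#endif

/* Report which branch the preprocessor selected. */
const char *cygwin_flavor(void)
{
    return CYGWIN_FLAVOR;
}
```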

>>  - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.
> 
> It's not a bug. From POSIX:2008
...
> "may", not "shall".

Well, sure.  But it /would/ be nice to just set errno (EINVAL?) instead
of segfaulting...

--
Chuck

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-29  2:15 Bruno Haible
@ 2011-01-29 12:34 ` Charles Wilson
  2011-01-29 13:20 ` Charles Wilson
  2011-01-29 16:02 ` Corinna Vinschen
  2 siblings, 0 replies; 33+ messages in thread
From: Charles Wilson @ 2011-01-29 12:34 UTC (permalink / raw)
  To: cygwin, bug-gnu-libiconv

[-- Attachment #1: Type: text/plain, Size: 2319 bytes --]

On 1/28/2011 5:12 PM, Bruno Haible wrote:
> Please CC the bug-gnu-libiconv mailing list when discussing possible
> bugs in GNU libiconv.

I hadn't intended on involving bug-gnu-libiconv until we had a working
fix, and a consensus here on @cygwin.  But, in any case, here is the
portion of Corinna's patch dealing with the iconv issues, stripped down
to the minimum necessary to correct the "problem".

As pointed out in the @cygwin thread, there are still some open
questions, which I had hoped to avoid by waiting until cygwin-1.7.8 was
released.

1) On cygwin-1.7.8, __STDC_ISO_10646__ is defined, so this change will
   allow "correct" behavior *if compiled on cygwin-1.7.8*.

-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) &&
!defined __CYGWIN__)
+#if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__

   But cygwin-1.7.8 isn't out yet. But with this change (and the "don't
include windows.h" change) then libiconv will still compile properly on
cygwin-1.5 -- which does not support wide chars, and does NOT define
__STDC_ISO_10646__.  However, it WON'T compile properly on cygwin-1.7.x
up to 1.7.7.

2) From cygwin-1.7.2 to cygwin-1.7.7, the following change could be
   used instead (there's an issue with 1.7.1 which doesn't bear
   exploration here):

-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) &&
!defined __CYGWIN__)
+#if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ ||
defined __CYGWIN__

   But arguably, then it would break on "old" cygwin like 1.5.  Perhaps
this is ok, since 1.7 has been "out" for over a year, and maybe
bug-gnu-libiconv doesn't care about old,
unsupported-by-the-cygwin-project versions of cygwin.

In any case, the attached patch goes with option 1 above.  It is
completely orthogonal to, and independent of, the other "relocation"
patch, that I posted to the gnulib list.

2010-01-28  Corinna Vinschen  <...>

	Correct wchar handling on cygwin-1.7.x
	* lib/iconv.c (iconv_canonicalize): Allow __STDC_ISO_10646__
	to control, rather than using __CYGWIN__ to veto.
	* lib/iconv_open1.h: Ditto.
	* libcharset/lib/localcharset.c: Don't include windows.h if
	__CYGWIN__.
	(get_charset_aliases): Remove cygwin workaround; rely on generic
	implementation. Be sure to copy result of nl_langinfo into local
	buffer.
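
The last ChangeLog item can be sketched as follows.  This is not the
code from the attached patch; the function name is hypothetical.  The
point is that POSIX allows the string returned by nl_langinfo() to be
overwritten by later nl_langinfo() or setlocale() calls, so it is
copied right away:

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Copy the current locale's codeset name into buf.
   Returns 0 on success, -1 if buf is too small. */
int copy_codeset(char *buf, size_t size)
{
    const char *codeset = nl_langinfo(CODESET);
    size_t len = strlen(codeset);

    if (len >= size)
        return -1;
    memcpy(buf, codeset, len + 1);   /* include the terminating NUL */
    return 0;
}
```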

--
Chuck

[-- Attachment #2: libiconv-1.13.1-2.wchar.patch --]
[-- Type: application/x-patch, Size: 4652 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
@ 2011-01-29  2:15 Bruno Haible
  2011-01-29 12:34 ` Charles Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Bruno Haible @ 2011-01-29  2:15 UTC (permalink / raw)
  To: cygwin, Corinna Vinschen, Charles Wilson, bug-gnu-libiconv

Hi Corinna and Chuck,

Please CC the bug-gnu-libiconv mailing list when discussing possible
bugs in GNU libiconv.

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00292.html>:

> the application tests to convert a UTF-8 to WCHAR_T string in four
>   combinations of the current locale, in this order:
> 
>   - iconv_open "C",       iconv "C"
>   - iconv_open "C",       iconv "C.UTF-8"
>   - iconv_open "C.UTF-8", iconv "C"
>   - iconv_open "C.UTF-8", iconv "C.UTF-8"
> 
> Here's what happens in Linux:
> 
>   $ gcc -g -o ic ic.c
>   $ ./ic
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
> 
> Here's what happens on Cygwin:
> 
>   $ gcc -g -o ic ic.c -liconv
>   $ ./ic
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

On glibc systems, the encoding "WCHAR_T" is equivalent to "UCS-4" with machine
dependent endianness and alignment. In particular it is independent of the
locale. That explains the first set of results.

In libiconv, on systems which don't define __STDC_ISO_10646__, the encoding
"WCHAR_T" is equivalent to wchar_t[], that is, dependent on the locale.
Changing the locale encoding after allocating an iconv_t from or to "WCHAR_T"
yields undefined behaviour. That explains the second set of results.
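
For reference, the kind of conversion being tested can be sketched as
below.  This is not the original ic.c, which is not reproduced in this
thread; utf8_to_wchar is a name invented for the sketch.  It opens the
descriptor and converts without touching the locale in between, which is
the only usage that is well defined on both implementations.  On Cygwin
the program would additionally need -liconv.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert a NUL-terminated UTF-8 string to wchar_t[] via "WCHAR_T".
   utf8_to_wchar is a hypothetical helper for illustration.
   Returns the number of wchar_t units written, or (size_t) -1 on error.  */
static size_t
utf8_to_wchar (const char *utf8, wchar_t *out, size_t out_len)
{
  iconv_t cd;
  char *inbuf, *outbuf;
  size_t inbytesleft, outbytesleft, rc;

  assert (utf8 != NULL && out != NULL);

  cd = iconv_open ("WCHAR_T", "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  inbuf = (char *) utf8;             /* iconv() does not modify the input */
  inbytesleft = strlen (utf8);
  outbuf = (char *) out;
  outbytesleft = out_len * sizeof (wchar_t);

  rc = iconv (cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return (size_t) -1;

  /* Units written = capacity minus what is left.  */
  return out_len - outbytesleft / sizeof (wchar_t);
}
```

The test string has 16 characters, which matches the byte counts quoted
above: 64 output bytes consumed with a 4-byte wchar_t, 32 with a 2-byte
wchar_t.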

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00299.html>:

> I defined __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.

What is the Cygwin wchar_t[] encoding? Is it UTF-16, like on Win32? The
documentation is silent about it. I had expected to find some word about it
in <http://cygwin.com/cygwin-api/compatibility.html#std-susv4>
or <http://cygwin.com/cygwin-api/std-notes.html>.

In any case, sizeof (wchar_t) == 2. I don't think defining __STDC_ISO_10646__
is compliant with ISO C 99 in this situation. ISO C 99 section 6.10.8.(2) says:

  __STDC_ISO_10646__
          An integer constant of the form yyyymmL (for example,
          199712L), intended to indicate that values of type wchar_t are the
          coded representations of the characters defined by ISO/IEC 10646,
          along with all amendments and technical corrigenda as of the
          specified year and month.

But when characters outside the basic plane, such as
U+12345 (CUNEIFORM SIGN URU TIMES KI), are encoded by 2 consecutive wchar_t
values, values of type wchar_t don't correspond to ISO/IEC 10646 characters.
(Or maybe I'm underestimating what "coded representations" means...?)

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00357.html>:

>   #if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
> This should be
> ...
>   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__

That makes sense if Cygwin guarantees that from now on and in the future,
the wchar_t encoding will always be UTF-16. Is this the case?

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00299.html>:

> Why on earth is libiconv on Cygwin using Windows functions in some
> places?

So that I could reuse the essentially same code on Cygwin as on native Win32.

Charles has submitted a patch on this topic to bug-gnulib; I will handle it.

> the old cygwin_conv_to_posix_path function as well.

Is cygwin_conv_to_posix_path deprecated? Does it introduce limitations of
some kind?

> The usage of a fixed table instead of the charset.alias file in
> libcharset/lib/localcharset.c, function get_charset_aliases() is
> not good, not good at all.

The alternative is to have this table stored in a file charset.alias;
but then every package that includes the module 'localcharset' from
gnulib (that is, libiconv, gettext, coreutils, and many others) will
want to modify this file during "make install". And this causes a lot of
headaches to packaging systems. Therefore, on platforms which have
widely used packaging systems (Linux, MacOS X, Cygwin), it's better to
avoid the need for this file. Additionally, on Win32 systems relocatability
is a must, and the code to compute the location of charset.alias from
the location of libiconv.dll would be overkill.
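
An inlined table of this kind can be sketched as follows.  The layout is
a sequence of NUL-separated "alias, canonical name" pairs ended by an
empty string; the two entries and the names sketch_aliases and
canonical_charset are illustrative only and are not a copy of the table in
localcharset.c.

```c
#include <assert.h>
#include <string.h>

/* Pairs of "alias" "\0" "canonical" "\0".  The implicit NUL that ends
   the string literal acts as the empty-string terminator of the table.
   The entries are examples, not the real libcharset table.  */
static const char sketch_aliases[] =
  "CP936" "\0" "GBK" "\0"
  "CP1361" "\0" "JOHAB" "\0";

/* Look NAME up in TABLE and return its canonical name, or NAME itself
   if the table has no entry for it.  Hypothetical helper.  */
static const char *
canonical_charset (const char *table, const char *name)
{
  const char *p;

  assert (table != NULL && name != NULL);

  for (p = table; *p != '\0'; )
    {
      const char *alias = p;
      const char *canon = alias + strlen (alias) + 1;

      if (strcmp (name, alias) == 0)
        return canon;
      /* Skip over the canonical name to the next pair.  */
      p = canon + strlen (canon) + 1;
    }
  return name;
}
```

Because the table lives in the binary, nothing has to be installed or
located at run time, which is the point made above.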

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00303.html>:

> It looks like there's been some bitrot with respect
> to some of the "&& !CYGWIN" guards on WIN32.  Both libiconv and gettext,
> IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
> "regular" win32 and for cygwin...which means defined(CYGWIN) guards are
> necessary.

The reason for these "&& !defined __CYGWIN__" clauses is that - at least
in Cygwin 1.5.x - gcc has an option that will define _WIN32 or __WIN32__.
So, _WIN32 || __WIN32__ may evaluate to true on Cygwin, or it may
evaluate to false on Cygwin. Since I don't want libiconv or gettext
to be compiled in two possible ways on Cygwin, I add
"&& !defined __CYGWIN__".

Neither libiconv nor gettext defines or undefines _WIN32 or __WIN32__.
But they are prepared for either setting.
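
The effect of the guard can be sketched like this.  The SKETCH_* macros
and build_flavor are names invented for the example; the point is only
that the result is the same on Cygwin whether or not the compiler option
happens to define _WIN32.

```c
#include <assert.h>

/* Native Win32 only: _WIN32 alone is not enough, because it may or may
   not be defined on Cygwin depending on compiler options.  */
#if (defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__
# define SKETCH_NATIVE_WIN32 1
#else
# define SKETCH_NATIVE_WIN32 0
#endif

#if defined __CYGWIN__
# define SKETCH_CYGWIN 1
#else
# define SKETCH_CYGWIN 0
#endif

/* Report which code path a build would take.  Hypothetical helper.  */
static const char *
build_flavor (void)
{
  /* By construction the two flavors are mutually exclusive.  */
  assert (!(SKETCH_NATIVE_WIN32 && SKETCH_CYGWIN));

  if (SKETCH_NATIVE_WIN32)
    return "native-win32";
  if (SKETCH_CYGWIN)
    return "cygwin";
  return "other";
}
```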

Replying to <http://www.cygwin.com/ml/cygwin/2011-01/msg00332.html>:

> there ARE still bugs in libiconv on Cygwin -- specifically:
>  - Even though iconv_open has been opened explicitly with "UTF-8" as
>    input string, the conversion still depends on the current application
>    codeset.  That doesn't make sense.

If the other argument to iconv_open is "CHAR" or "WCHAR_T", hence locale
dependent, and you change the locale in between, the result is undefined
behaviour.

>  - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.

It's not a bug. From POSIX:2008
<http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html>
you can infer that (iconv_t) -1 is not a "conversion descriptor". It's a
return value used by iconv_open(), nothing more. From
<http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_close.html>
you can see that the argument of iconv_close() has to be a conversion
descriptor. From the ERRORS section in the same page you can see that
iconv_close() is not required to catch a faulty argument. Note the word
"may", not "shall".

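A portable caller therefore checks the descriptor itself before closing
it.  A minimal sketch, with close_if_open being a name made up for this
example:

```c
#include <assert.h>
#include <iconv.h>

/* Close CD only if it is a real conversion descriptor.
   close_if_open is a hypothetical wrapper, not part of any iconv API.
   Returns 0 if there was nothing to close, otherwise the result of
   iconv_close().  */
static int
close_if_open (iconv_t cd)
{
  if (cd == (iconv_t) -1)
    /* iconv_open() failed earlier; passing this value on to
       iconv_close() is not covered by POSIX.  */
    return 0;
  return iconv_close (cd);
}
```
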
Bruno


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-25 20:11       ` Corinna Vinschen
@ 2011-01-28 22:13         ` Charles Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Charles Wilson @ 2011-01-28 22:13 UTC (permalink / raw)
  To: cygwin

On 1/25/2011 10:50 AM, Corinna Vinschen wrote:
>>> Please note that I defined
>>> __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is
>>> missing since 1.7.2.
>>
>> Hmmm...maybe I should (re)build libiconv against a snapshot?
> 
> I think that should be safe.  There's most likely no new API which would
> accidentally be pulled in by libiconv(*).
> 
> (*) Famous last words...

Yep.  The new fenv support causes applications to add a dependency (from
their startup code?) to the new symbol _feinitialize. So, cygiconv-2.dll
appears ok, but iconv.exe is not.

--
Chuck



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-27 17:39       ` Charles Wilson
@ 2011-01-27 18:05         ` Corinna Vinschen
  0 siblings, 0 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-27 18:05 UTC (permalink / raw)
  To: cygwin

On Jan 27 11:21, Charles Wilson wrote:
> On 1/27/2011 7:20 AM, Corinna Vinschen wrote:
> > I got it working.  The major reason was that the conversion to wchar_t
> > was broken due to the #if expressions in lib/iconv.c and
> > lib/iconv_open1.h:
> > 
> >   #if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
> > 
> > This should be
> > 
> >   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
> > ...
> >   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
> > 
> > Other than that, here's the full patchset which I applied to let
> > libiconv work more POSIXy on Cygwin.  I tested especially that the Linux
> > code works fine on Cygwin as well.  Use the patch at your own leisure.
> > If you have any questions, feel free to ask.
> 
> Thanks for the patch.  I'll use the reloc bits as the basis for a patch
> upstream to gnulib...but unless you:
> 	1) configure libiconv with --enable-relocate and
>            --prefix=/some/really/wierd/tmpdir, AND
> 	2) then installed into /not/the/tmpdir
> then you didn't actually test the relocation stuff.

What I did was to run my testapp under GDB and to check that the
find_shared_library_fullname() function as well as the calling function
get_charset_aliases() were behaving as expected.

> Don't worry about it tho; I'll do that. (I also need to research why we
> didn't use /proc/self/maps originally, since it was available in 1.5.  I

Well, there's a comment in the original code which refers to this.
See srclib/progreloc.c, line 149ff.  The comment basically says that
the functionality is known, but we don't use it because it doesn't
match what we do elsewhere.  The fact that the code elsewhere wasn't
required for Cygwin anymore was not noticed for some reason.
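
For readers unfamiliar with the approach: the /proc/self/maps lookup
amounts to finding the mapping that contains a known address and taking
the pathname from that line.  The following is a rough sketch under that
assumption, not the code from relocatable.c; mapped_file_for_address is an
invented name, and lines longer than the buffer are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of the pathname of the file mapped at ADDR,
   or NULL if /proc/self/maps is unavailable or the mapping has no
   pathname.  Hypothetical helper for illustration.  */
static char *
mapped_file_for_address (const void *addr)
{
  FILE *fp;
  uintptr_t target = (uintptr_t) addr;
  char line[4096];
  char *result = NULL;

  assert (addr != NULL);

  fp = fopen ("/proc/self/maps", "r");
  if (fp == NULL)
    return NULL;

  /* Each line: start-end perms offset dev inode [pathname]  */
  while (fgets (line, sizeof (line), fp) != NULL)
    {
      unsigned long start, end;
      int path_off = 0;

      if (sscanf (line, "%lx-%lx %*s %*s %*s %*s %n",
                  &start, &end, &path_off) < 2
          || path_off == 0)
        continue;
      if (target < start || target >= end)
        continue;

      line[strcspn (line, "\n")] = '\0';
      if (line[path_off] == '/')
        {
          size_t len = strlen (line + path_off) + 1;
          result = malloc (len);
          if (result != NULL)
            memcpy (result, line + path_off, len);
        }
      break;
    }
  fclose (fp);
  return result;
}
```

A library would pass the address of one of its own objects, so that the
matching line names the shared library file rather than the executable.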

> don't think this historical data matters NOW, tho, since surely it's
> been around long enough we can assume its presence...

Yep.

> The important thing, in my mind, is getting the charset conv stuff
> working correctly.  That stuff makes my teeth itch, so I'm very grateful
> you tracked it down!
> 
> One question: does it matter if the code is changed, in libiconv, as you
> suggest, and then libiconv is built on old-cygwin:
>   1) 1.3.x (e.g. fork-that-shall-not-be-named)
>   2) 1.5.x
>   3) 1.7.1--1.7.7
> or, for upstream submission, do I need to be careful about versions and
> munge all of the #ifdefs appropriately?

Cygwin versions prior to 1.7.7 are not supported anyway.  The changes
should work with versions at least back to 1.7.2 and I don't care in the
least about older versions.  There's no reason to clutter the code to
support old, unsupported Cygwin versions.  There are existing, older
builds of libiconv available for them.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-27 16:21     ` Corinna Vinschen
@ 2011-01-27 17:39       ` Charles Wilson
  2011-01-27 18:05         ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-27 17:39 UTC (permalink / raw)
  To: cygwin

On 1/27/2011 7:20 AM, Corinna Vinschen wrote:
> I got it working.  The major reason was that the conversion to wchar_t
> was broken due to the #if expressions in lib/iconv.c and
> lib/iconv_open1.h:
> 
>   #if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
> 
> This should be
> 
>   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
> ...
>   #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
> 
> Other than that, here's the full patchset which I applied to let
> libiconv work more POSIXy on Cygwin.  I tested especially that the Linux
> code works fine on Cygwin as well.  Use the patch at your own leisure.
> If you have any questions, feel free to ask.

Thanks for the patch.  I'll use the reloc bits as the basis for a patch
upstream to gnulib...but unless you:
	1) configure libiconv with --enable-relocate and
           --prefix=/some/really/wierd/tmpdir, AND
	2) then installed into /not/the/tmpdir
then you didn't actually test the relocation stuff.

Don't worry about it tho; I'll do that. (I also need to research why we
didn't use /proc/self/maps originally, since it was available in 1.5.  I
don't think this historical data matters NOW, tho, since surely it's
been around long enough we can assume its presence...  The downside for
me is that it was NOT around in 1.3.x -- which means my port of libiconv
for the-fork-that-shall-not-be-named will have to *undo* this patch
locally. Oh well.)

The important thing, in my mind, is getting the charset conv stuff
working correctly.  That stuff makes my teeth itch, so I'm very grateful
you tracked it down!

One question: does it matter if the code is changed, in libiconv, as you
suggest, and then libiconv is built on old-cygwin:
  1) 1.3.x (e.g. fork-that-shall-not-be-named)
  2) 1.5.x
  3) 1.7.1--1.7.7
or, for upstream submission, do I need to be careful about versions and
munge all of the #ifdefs appropriately?

--
Chuck


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-27 16:05       ` Corinna Vinschen
@ 2011-01-27 17:18         ` Charles Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Charles Wilson @ 2011-01-27 17:18 UTC (permalink / raw)
  To: cygwin

On 1/27/2011 4:25 AM, Corinna Vinschen wrote:
> On Jan 26 22:12, Charles Wilson wrote:
>> Can we get a newer snapshot than the current 20110117?
> 
> Done.

Thanks.  Will try again (with today's snap #2) tonight.

--
Chuck


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-27  3:53   ` Charles Wilson
@ 2011-01-27 16:21     ` Corinna Vinschen
  2011-01-27 17:39       ` Charles Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-27 16:21 UTC (permalink / raw)
  To: cygwin

On Jan 26 22:09, Charles Wilson wrote:
> On 1/24/2011 10:09 PM, Charles Wilson wrote:
> > Now, since there has not yet been an updated upstream release of
> > libiconv, my first step would be to simply rebuild our existing
> > libiconv-1.13.1 on a platform with current cygwin (1.7.7-1), and try the
> > test case again.
> 
> Rebuilt libiconv against 20110117 snapshot.  Built test case.  Still see
> erroneous behavior:
> 
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

I got it working.  The major reason was that the conversion to wchar_t
was broken due to the #if expressions in lib/iconv.c and
lib/iconv_open1.h:

  #if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)

This should be

  #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__

if you use the latest snapshot I *just* uploaded.  It's the second one
today since the definition in /usr/include/features.h wasn't picked up
at all.  So I had to change the newlib headers similar to what Linux
does to get it working.  With 1.7.7 you would have to define

  #if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__ || defined __CYGWIN__

Other than that, here's the full patchset which I applied to let
libiconv work more POSIXy on Cygwin.  I tested especially that the Linux
code works fine on Cygwin as well.  Use the patch at your own leisure.
If you have any questions, feel free to ask.

--- libiconv-1.13.1.orig/lib/relocatable.c	2009-06-21 13:17:33.000000000 +0200
+++ libiconv-1.13.1/lib/relocatable.c	2011-01-27 11:37:16.748956079 +0100
@@ -43,7 +43,7 @@
 # include "xalloc.h"
 #endif
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
 #endif
@@ -70,8 +70,8 @@
    ISSLASH(C)           tests whether C is a directory separator character.
    IS_PATH_WITH_DIR(P)  tests whether P contains a directory specification.
  */
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__ || defined __EMX__ || defined __DJGPP__
-  /* Win32, Cygwin, OS/2, DOS */
+#if defined _WIN32 || defined __WIN32__ || defined __EMX__ || defined __DJGPP__
+  /* Win32, OS/2, DOS */
 # define ISSLASH(C) ((C) == '/' || (C) == '\\')
 # define HAS_DEVICE(P) \
     ((((P)[0] >= 'A' && (P)[0] <= 'Z') || ((P)[0] >= 'a' && (P)[0] <= 'z')) \
@@ -281,7 +281,7 @@ compute_curr_prefix (const char *orig_in
 /* Full pathname of shared library, or NULL.  */
 static char *shared_library_fullname;
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 
 /* Determine the full pathname of the shared library when it is loaded.  */
 
@@ -303,37 +303,20 @@ DllMain (HINSTANCE module_handle, DWORD 
 	/* Shouldn't happen.  */
 	return FALSE;
 
-      {
-#if defined __CYGWIN__
-	/* On Cygwin, we need to convert paths coming from Win32 system calls
-	   to the Unix-like slashified notation.  */
-	static char location_as_posix_path[2 * MAX_PATH];
-	/* There's no error return defined for cygwin_conv_to_posix_path.
-	   See cygwin-api/func-cygwin-conv-to-posix-path.html.
-	   Does it overflow the buffer of expected size MAX_PATH or does it
-	   truncate the path?  I don't know.  Let's catch both.  */
-	cygwin_conv_to_posix_path (location, location_as_posix_path);
-	location_as_posix_path[MAX_PATH - 1] = '\0';
-	if (strlen (location_as_posix_path) >= MAX_PATH - 1)
-	  /* A sign of buffer overflow or path truncation.  */
-	  return FALSE;
-	shared_library_fullname = strdup (location_as_posix_path);
-#else
-	shared_library_fullname = strdup (location);
-#endif
-      }
+      shared_library_fullname = strdup (location);
     }
 
   return TRUE;
 }
 
-#else /* Unix except Cygwin */
+#else /* Unix */
 
 static void
 find_shared_library_fullname ()
 {
-#if defined __linux__ && __GLIBC__ >= 2
-  /* Linux has /proc/self/maps. glibc 2 has the getline() function.  */
+#if (defined __linux__ && __GLIBC__ >= 2) || defined __CYGWIN__
+  /* Linux has /proc/self/maps. glibc 2 has the getline() function.
+     Cygwin as well. */
   FILE *fp;
 
   /* Open the current process' maps file.  It describes one VMA per line.  */
@@ -378,7 +361,7 @@ find_shared_library_fullname ()
 #endif
 }
 
-#endif /* (WIN32 or Cygwin) / (Unix except Cygwin) */
+#endif /* WIN32 / Unix */
 
 /* Return the full pathname of the current shared library.
    Return NULL if unknown.
@@ -386,7 +369,7 @@ find_shared_library_fullname ()
 static char *
 get_shared_library_fullname ()
 {
-#if !(defined _WIN32 || defined __WIN32__ || defined __CYGWIN__)
+#if !(defined _WIN32 || defined __WIN32__)
   static bool tried_find_shared_library_fullname;
   if (!tried_find_shared_library_fullname)
     {
--- libiconv-1.13.1.orig/lib/iconv.c	2009-06-21 13:17:33.000000000 +0200
+++ libiconv-1.13.1/lib/iconv.c	2011-01-27 12:46:21.544296281 +0100
@@ -550,7 +550,7 @@ const char * iconv_canonicalize (const c
     if (ap->encoding_index == ei_local_wchar_t) {
       /* On systems which define __STDC_ISO_10646__, wchar_t is Unicode.
          This is also the case on native Woe32 systems.  */
-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
+#if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
       if (sizeof(wchar_t) == 4) {
         index = ei_ucs4internal;
         break;
--- libiconv-1.13.1.orig/lib/iconv_open1.h	2009-06-21 13:17:33.000000000 +0200
+++ libiconv-1.13.1/lib/iconv_open1.h	2011-01-27 12:47:03.119371056 +0100
@@ -98,7 +98,7 @@
     if (ap->encoding_index == ei_local_wchar_t) {
       /* On systems which define __STDC_ISO_10646__, wchar_t is Unicode.
          This is also the case on native Woe32 systems.  */
-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
+#if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
       if (sizeof(wchar_t) == 4) {
         to_index = ei_ucs4internal;
         break;
@@ -174,7 +174,7 @@
     if (ap->encoding_index == ei_local_wchar_t) {
       /* On systems which define __STDC_ISO_10646__, wchar_t is Unicode.
          This is also the case on native Woe32 systems.  */
-#if __STDC_ISO_10646__ || ((defined _WIN32 || defined __WIN32__) && !defined __CYGWIN__)
+#if __STDC_ISO_10646__ || defined _WIN32 || defined __WIN32__
       if (sizeof(wchar_t) == 4) {
         from_index = ei_ucs4internal;
         break;
--- libiconv-1.13.1.orig/libcharset/lib/localcharset.c	2009-06-21 13:17:33.000000000 +0200
+++ libiconv-1.13.1/libcharset/lib/localcharset.c	2011-01-27 11:53:33.201852883 +0100
@@ -52,10 +52,6 @@
 #   include <locale.h>
 #  endif
 # endif
-# ifdef __CYGWIN__
-#  define WIN32_LEAN_AND_MEAN
-#  include <windows.h>
-# endif
 #elif defined WIN32_NATIVE
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
@@ -76,7 +72,7 @@
 # include "configmake.h"
 #endif
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__ || defined __EMX__ || defined __DJGPP__
+#if defined _WIN32 || defined __WIN32__ || defined __EMX__ || defined __DJGPP__
   /* Win32, Cygwin, OS/2, DOS */
 # define ISSLASH(C) ((C) == '/' || (C) == '\\')
 #endif
@@ -117,7 +113,7 @@ get_charset_aliases (void)
   cp = charset_aliases;
   if (cp == NULL)
     {
-#if !(defined DARWIN7 || defined VMS || defined WIN32_NATIVE || defined __CYGWIN__)
+#if !(defined DARWIN7 || defined VMS || defined WIN32_NATIVE)
       FILE *fp;
       const char *dir;
       const char *base = "charset.alias";
@@ -276,7 +272,7 @@ get_charset_aliases (void)
 	   "DECKOREAN" "\0" "EUC-KR" "\0";
 # endif
 
-# if defined WIN32_NATIVE || defined __CYGWIN__
+# if defined WIN32_NATIVE
       /* To avoid the troubles of installing a separate file in the same
 	 directory as the DLL and of retrieving the DLL's directory at
 	 runtime, simply inline the aliases here.  */
@@ -332,55 +328,14 @@ locale_charset (void)
 
 # if HAVE_LANGINFO_CODESET
 
-  /* Most systems support nl_langinfo (CODESET) nowadays.  */
-  codeset = nl_langinfo (CODESET);
-
-#  ifdef __CYGWIN__
-  /* Cygwin 2006 does not have locales.  nl_langinfo (CODESET) always
-     returns "US-ASCII".  As long as this is not fixed, return the suffix
-     of the locale name from the environment variables (if present) or
-     the codepage as a number.  */
-  if (codeset != NULL && strcmp (codeset, "US-ASCII") == 0)
-    {
-      const char *locale;
-      static char buf[2 + 10 + 1];
+  /* Most systems support nl_langinfo (CODESET) nowadays.
+  
+     POSIX allows that the returned pointer may point to a static area that
+     may be overwritten by subsequent calls to setlocale or nl_langinfo. */
+  static char codeset_buf[64];
 
-      locale = getenv ("LC_ALL");
-      if (locale == NULL || locale[0] == '\0')
-	{
-	  locale = getenv ("LC_CTYPE");
-	  if (locale == NULL || locale[0] == '\0')
-	    locale = getenv ("LANG");
-	}
-      if (locale != NULL && locale[0] != '\0')
-	{
-	  /* If the locale name contains an encoding after the dot, return
-	     it.  */
-	  const char *dot = strchr (locale, '.');
-
-	  if (dot != NULL)
-	    {
-	      const char *modifier;
-
-	      dot++;
-	      /* Look for the possible @... trailer and remove it, if any.  */
-	      modifier = strchr (dot, '@');
-	      if (modifier == NULL)
-		return dot;
-	      if (modifier - dot < sizeof (buf))
-		{
-		  memcpy (buf, dot, modifier - dot);
-		  buf [modifier - dot] = '\0';
-		  return buf;
-		}
-	    }
-	}
-
-      /* Woe32 has a function returning the locale's codepage as a number.  */
-      sprintf (buf, "CP%u", GetACP ());
-      codeset = buf;
-    }
-#  endif
+  codeset_buf[0] = '\0';
+  codeset = strncat (codeset_buf, nl_langinfo (CODESET), sizeof (codeset_buf));
 
 # else
 
--- libiconv-1.13.1.orig/libcharset/lib/relocatable.c	2009-06-21 13:17:33.000000000 +0200
+++ libiconv-1.13.1/libcharset/lib/relocatable.c	2011-01-27 11:37:43.626054538 +0100
@@ -43,7 +43,7 @@
 # include "xalloc.h"
 #endif
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
 #endif
@@ -70,8 +70,8 @@
    ISSLASH(C)           tests whether C is a directory separator character.
    IS_PATH_WITH_DIR(P)  tests whether P contains a directory specification.
  */
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__ || defined __EMX__ || defined __DJGPP__
-  /* Win32, Cygwin, OS/2, DOS */
+#if defined _WIN32 || defined __WIN32__ || defined __EMX__ || defined __DJGPP__
+  /* Win32, OS/2, DOS */
 # define ISSLASH(C) ((C) == '/' || (C) == '\\')
 # define HAS_DEVICE(P) \
     ((((P)[0] >= 'A' && (P)[0] <= 'Z') || ((P)[0] >= 'a' && (P)[0] <= 'z')) \
@@ -290,7 +290,7 @@ compute_curr_prefix (const char *orig_in
 /* Full pathname of shared library, or NULL.  */
 static char *shared_library_fullname;
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 
 /* Determine the full pathname of the shared library when it is loaded.  */
 
@@ -313,35 +313,19 @@ DllMain (HINSTANCE module_handle, DWORD 
 	return FALSE;
 
       {
-#if defined __CYGWIN__
-	/* On Cygwin, we need to convert paths coming from Win32 system calls
-	   to the Unix-like slashified notation.  */
-	static char location_as_posix_path[2 * MAX_PATH];
-	/* There's no error return defined for cygwin_conv_to_posix_path.
-	   See cygwin-api/func-cygwin-conv-to-posix-path.html.
-	   Does it overflow the buffer of expected size MAX_PATH or does it
-	   truncate the path?  I don't know.  Let's catch both.  */
-	cygwin_conv_to_posix_path (location, location_as_posix_path);
-	location_as_posix_path[MAX_PATH - 1] = '\0';
-	if (strlen (location_as_posix_path) >= MAX_PATH - 1)
-	  /* A sign of buffer overflow or path truncation.  */
-	  return FALSE;
-	shared_library_fullname = strdup (location_as_posix_path);
-#else
 	shared_library_fullname = strdup (location);
-#endif
       }
     }
 
   return TRUE;
 }
 
-#else /* Unix except Cygwin */
+#else /* Unix */
 
 static void
 find_shared_library_fullname ()
 {
-#if defined __linux__ && __GLIBC__ >= 2
+#if (defined __linux__ && __GLIBC__ >= 2) || defined __CYGWIN__
   /* Linux has /proc/self/maps. glibc 2 has the getline() function.  */
   FILE *fp;
 
@@ -387,7 +371,7 @@ find_shared_library_fullname ()
 #endif
 }
 
-#endif /* (WIN32 or Cygwin) / (Unix except Cygwin) */
+#endif /* WIN32 / Unix */
 
 /* Return the full pathname of the current shared library.
    Return NULL if unknown.
@@ -395,7 +379,7 @@ find_shared_library_fullname ()
 static char *
 get_shared_library_fullname ()
 {
-#if !(defined _WIN32 || defined __WIN32__ || defined __CYGWIN__)
+#if !(defined _WIN32 || defined __WIN32__)
   static bool tried_find_shared_library_fullname;
   if (!tried_find_shared_library_fullname)
     {
--- libiconv-1.13.1.orig/srclib/errno.in.h	2009-06-21 13:31:08.000000000 +0200
+++ libiconv-1.13.1/srclib/errno.in.h	2011-01-27 11:39:52.666924514 +0100
@@ -30,7 +30,7 @@
 
 
 /* On native Windows platforms, many macros are not defined.  */
-# if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
+# if defined _WIN32 || defined __WIN32__
 
 /* POSIX says that EAGAIN and EWOULDBLOCK may have the same value.  */
 #  define EWOULDBLOCK     EAGAIN
--- libiconv-1.13.1.orig/srclib/progreloc.c	2009-06-21 13:31:08.000000000 +0200
+++ libiconv-1.13.1/srclib/progreloc.c	2011-01-27 11:39:01.062575765 +0100
@@ -38,7 +38,7 @@
 # define WIN32_NATIVE
 #endif
 
-#if defined WIN32_NATIVE || defined __CYGWIN__
+#if defined WIN32_NATIVE
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
 #endif
@@ -64,8 +64,8 @@
    ISSLASH(C)           tests whether C is a directory separator character.
    IS_PATH_WITH_DIR(P)  tests whether P contains a directory specification.
  */
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__ || defined __EMX__ || defined __DJGPP__
-  /* Win32, Cygwin, OS/2, DOS */
+#if defined _WIN32 || defined __WIN32__ || defined __EMX__ || defined __DJGPP__
+  /* Win32, OS/2, DOS */
 # define ISSLASH(C) ((C) == '/' || (C) == '\\')
 # define HAS_DEVICE(P) \
     ((((P)[0] >= 'A' && (P)[0] <= 'Z') || ((P)[0] >= 'a' && (P)[0] <= 'z')) \
@@ -90,7 +90,7 @@
 
 #if ENABLE_RELOCATABLE
 
-#ifdef __linux__
+#if defined __linux__ || defined __CYGWIN__
 /* File descriptor of the executable.
    (Only used to verify that we find the correct executable.)  */
 static int executable_fd = -1;
@@ -100,12 +100,12 @@ static int executable_fd = -1;
 static bool
 maybe_executable (const char *filename)
 {
-  /* Woe32 lacks the access() function, but Cygwin doesn't.  */
-#if !(defined WIN32_NATIVE && !defined __CYGWIN__)
+  /* Woe32 lacks the access() function.  */
+#if !(defined WIN32_NATIVE)
   if (access (filename, X_OK) < 0)
     return false;
 
-#ifdef __linux__
+#if defined __linux__ || defined __CYGWIN__
   if (executable_fd >= 0)
     {
       /* If we already have an executable_fd, check that filename points to
@@ -136,7 +136,7 @@ maybe_executable (const char *filename)
 static char *
 find_executable (const char *argv0)
 {
-#if defined WIN32_NATIVE || defined __CYGWIN__
+#if defined WIN32_NATIVE
   char location[MAX_PATH];
   int length = GetModuleFileName (NULL, location, sizeof (location));
   if (length < 0)
@@ -144,36 +144,13 @@ find_executable (const char *argv0)
   if (!IS_PATH_WITH_DIR (location))
     /* Shouldn't happen.  */
     return NULL;
-  {
-#if defined __CYGWIN__
-    /* cygwin-1.5.13 (2005-03-01) or newer would also allow a Linux-like
-       implementation: readlink of "/proc/self/exe".  But using the
-       result of the Win32 system call is simpler and is consistent with the
-       code in relocatable.c.  */
-    /* On Cygwin, we need to convert paths coming from Win32 system calls
-       to the Unix-like slashified notation.  */
-    static char location_as_posix_path[2 * MAX_PATH];
-    /* There's no error return defined for cygwin_conv_to_posix_path.
-       See cygwin-api/func-cygwin-conv-to-posix-path.html.
-       Does it overflow the buffer of expected size MAX_PATH or does it
-       truncate the path?  I don't know.  Let's catch both.  */
-    cygwin_conv_to_posix_path (location, location_as_posix_path);
-    location_as_posix_path[MAX_PATH - 1] = '\0';
-    if (strlen (location_as_posix_path) >= MAX_PATH - 1)
-      /* A sign of buffer overflow or path truncation.  */
-      return NULL;
-    /* Call canonicalize_file_name, because Cygwin supports symbolic links.  */
-    return canonicalize_file_name (location_as_posix_path);
-#else
-    return xstrdup (location);
-#endif
-  }
-#else /* Unix && !Cygwin */
-#ifdef __linux__
-  /* The executable is accessible as /proc/<pid>/exe.  In newer Linux
-     versions, also as /proc/self/exe.  Linux >= 2.1 provides a symlink
-     to the true pathname; older Linux versions give only device and ino,
-     enclosed in brackets, which we cannot use here.  */
+  return xstrdup (location);
+#else /* Unix */
+#if defined __linux__ || defined __CYGWIN__
+  /* The executable is accessible as /proc/<pid>/exe.  In Cygwin and in
+     newer Linux versions, also as /proc/self/exe.  Linux >= 2.1 provides
+     a symlink to the true pathname; older Linux versions give only device
+     and ino, enclosed in brackets, which we cannot use here.  */
   {
     char *link;
 
--- libiconv-1.13.1.orig/srclib/stdio-write.c	2009-06-21 13:31:08.000000000 +0200
+++ libiconv-1.13.1/srclib/stdio-write.c	2011-01-27 11:38:26.831997673 +0100
@@ -29,7 +29,7 @@
    error EINVAL.  This write() function is at the basis of the function
    which flushes the buffer of a FILE stream.  */
 
-# if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
+# if defined _WIN32 || defined __WIN32__
 
 #  include <errno.h>
 #  include <signal.h>
--- libiconv-1.13.1.orig/srclib/relocatable.c	2009-06-21 13:31:08.000000000 +0200
+++ libiconv-1.13.1/srclib/relocatable.c	2011-01-27 11:38:19.852491486 +0100
@@ -43,7 +43,7 @@
 # include "xalloc.h"
 #endif
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
 #endif
@@ -70,8 +70,8 @@
    ISSLASH(C)           tests whether C is a directory separator character.
    IS_PATH_WITH_DIR(P)  tests whether P contains a directory specification.
  */
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__ || defined __EMX__ || defined __DJGPP__
-  /* Win32, Cygwin, OS/2, DOS */
+#if defined _WIN32 || defined __WIN32__ || defined __EMX__ || defined __DJGPP__
+  /* Win32, OS/2, DOS */
 # define ISSLASH(C) ((C) == '/' || (C) == '\\')
 # define HAS_DEVICE(P) \
     ((((P)[0] >= 'A' && (P)[0] <= 'Z') || ((P)[0] >= 'a' && (P)[0] <= 'z')) \
@@ -290,7 +290,7 @@ compute_curr_prefix (const char *orig_in
 /* Full pathname of shared library, or NULL.  */
 static char *shared_library_fullname;
 
-#if defined _WIN32 || defined __WIN32__ || defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 
 /* Determine the full pathname of the shared library when it is loaded.  */
 
@@ -312,31 +312,13 @@ DllMain (HINSTANCE module_handle, DWORD 
 	/* Shouldn't happen.  */
 	return FALSE;
 
-      {
-#if defined __CYGWIN__
-	/* On Cygwin, we need to convert paths coming from Win32 system calls
-	   to the Unix-like slashified notation.  */
-	static char location_as_posix_path[2 * MAX_PATH];
-	/* There's no error return defined for cygwin_conv_to_posix_path.
-	   See cygwin-api/func-cygwin-conv-to-posix-path.html.
-	   Does it overflow the buffer of expected size MAX_PATH or does it
-	   truncate the path?  I don't know.  Let's catch both.  */
-	cygwin_conv_to_posix_path (location, location_as_posix_path);
-	location_as_posix_path[MAX_PATH - 1] = '\0';
-	if (strlen (location_as_posix_path) >= MAX_PATH - 1)
-	  /* A sign of buffer overflow or path truncation.  */
-	  return FALSE;
-	shared_library_fullname = strdup (location_as_posix_path);
-#else
-	shared_library_fullname = strdup (location);
-#endif
-      }
+      shared_library_fullname = strdup (location);
     }
 
   return TRUE;
 }
 
-#else /* Unix except Cygwin */
+#else /* Unix */
 
 static void
 find_shared_library_fullname ()
@@ -387,7 +369,7 @@ find_shared_library_fullname ()
 #endif
 }
 
-#endif /* (WIN32 or Cygwin) / (Unix except Cygwin) */
+#endif /* WIN32 / Unix */
 
 /* Return the full pathname of the current shared library.
    Return NULL if unknown.
@@ -395,7 +377,7 @@ find_shared_library_fullname ()
 static char *
 get_shared_library_fullname ()
 {
-#if !(defined _WIN32 || defined __WIN32__ || defined __CYGWIN__)
+#if !(defined _WIN32 || defined __WIN32__ )
   static bool tried_find_shared_library_fullname;
   if (!tried_find_shared_library_fullname)
     {
--- libiconv-1.13.1.orig/srclib/sigprocmask.c	2009-06-21 13:31:08.000000000 +0200
+++ libiconv-1.13.1/srclib/sigprocmask.c	2011-01-27 11:37:56.450147229 +0100
@@ -46,7 +46,7 @@
 /* On native Windows, as of 2008, the signal SIGABRT_COMPAT is an alias
    for the signal SIGABRT.  Only one signal handler is stored for both
    SIGABRT and SIGABRT_COMPAT.  SIGABRT_COMPAT is not a signal of its own.  */
-#if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
+#if defined _WIN32 || defined __WIN32__
 # undef SIGABRT_COMPAT
 # define SIGABRT_COMPAT 6
 #endif

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
@ 2011-01-27 16:06 simrw
  0 siblings, 0 replies; 33+ messages in thread
From: simrw @ 2011-01-27 16:06 UTC (permalink / raw)
  To: cygwin

> If that doesn't correct the issue...then I'd try to run your test case
> on linux, but *explicitly* using libiconv on that system, rather than
> (as is typically the case on linux) relying on the underlying glibc
> implementation of iconv functionality.
>
> Did this.  Here are the characteristics of the test case object and
> executable:
>
> $ ldd ./foo
>         linux-vdso.so.1 =>  (0x00007fff51928000)
>         libiconv.so.2 => /home/me/libiconv/_inst/lib/libiconv.so.2
> (0x00007f0b7d7dd000)
>         libc.so.6 => /lib64/libc.so.6 (0x0000003d5b400000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000003d5b000000)
> $ nm foo.o | grep ' U '
>                  U __errno_location
>                  U exit
>                  U fprintf
>          >>      U iconv
>          >>      U iconv_close
>          >>      U iconv_open
>                  U printf
>                  U setlocale
>                  U stderr
>                  U strerror
>                  U strlen

IMHO, that was not compiled correctly. You are still using
the version in glibc.
libiconv only exports "libiconv_open", "libiconv_close" and "libiconv".
It looks like the include path is missing.
E.g.
gcc -I/home/me/libiconv/_inst/include -o foo foo.c \
    -L/home/me/libiconv/_inst/lib -liconv

iconv.h in libiconv has, e.g.,
#define iconv_open libiconv_open
The "nm" should look like this -
                 U __errno_location
                 U exit
                 U fprintf
                 U libiconv
                 U libiconv_close
                 U libiconv_open
                 U printf
                 U setlocale
                 U stderr
                 U strerror
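
A quick compile-time check for which iconv.h got picked up could look
like the sketch below. It relies on the _LIBICONV_VERSION macro, which
GNU libiconv's iconv.h defines and glibc's does not. The function name
uses_gnu_libiconv_header is made up for this illustration.

```c
#include <iconv.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled
   against GNU libiconv's <iconv.h> (which defines _LIBICONV_VERSION and
   renames iconv_open to libiconv_open), 0 if another <iconv.h>, such as
   the one shipped with glibc, was found first on the include path.  */
static int
uses_gnu_libiconv_header (void)
{
#ifdef _LIBICONV_VERSION
  return 1;
#else
  return 0;
#endif
}
```

If this returns 0 although -liconv is on the link line, the -I option
for the libiconv headers is probably missing, which would match the
"U iconv_open" entries in the nm output above.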

Roger



* Re: Bug in libiconv?
  2011-01-27  5:46     ` Charles Wilson
@ 2011-01-27 16:05       ` Corinna Vinschen
  2011-01-27 17:18         ` Charles Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-27 16:05 UTC (permalink / raw)
  To: cygwin

On Jan 26 22:12, Charles Wilson wrote:
> On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> > - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
> >   ei_ucs2internal encoding table.  I'm not sure if that's right or
> >   wrong, but it looks worrying.  Please note that I defined
> >   __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is missing
> >   since 1.7.2.
> 
> Can we get a newer snapshot than the current 20110117?

Done.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-25 15:04   ` Corinna Vinschen
  2011-01-25 18:58     ` Charles Wilson
@ 2011-01-27  5:46     ` Charles Wilson
  2011-01-27 16:05       ` Corinna Vinschen
  1 sibling, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-27  5:46 UTC (permalink / raw)
  To: cygwin

On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
>   ei_ucs2internal encoding table.  I'm not sure if that's right or
>   wrong, but it looks worrying.  Please note that I defined
>   __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is missing
>   since 1.7.2.

Can we get a newer snapshot than the current 20110117?

--
Chuck




* Re: Bug in libiconv?
  2011-01-25 11:15 ` Charles Wilson
  2011-01-25 15:04   ` Corinna Vinschen
@ 2011-01-27  3:53   ` Charles Wilson
  2011-01-27 16:21     ` Corinna Vinschen
  1 sibling, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-27  3:53 UTC (permalink / raw)
  To: cygwin

On 1/24/2011 10:09 PM, Charles Wilson wrote:
> Now, since there has not yet been an updated upstream release of
> libiconv, my first step would be to simply rebuild our existing
> libiconv-1.13.1 on a platform with current cygwin (1.7.7-1), and try the
> test case again.

Rebuilt libiconv against 20110117 snapshot.  Built test case.  Still see
erroneous behavior:

iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
iconv: 138 <Invalid or incomplete multibyte or wide character>
in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

> If that doesn't correct the issue...then I'd try to run your test case
> on linux, but *explicitly* using libiconv on that system, rather than
> (as is typically the case on linux) relying on the underlying glibc
> implementation of iconv functionality. 

Did this.  Here are the characteristics of the test case object and
executable:

$ ldd ./foo
        linux-vdso.so.1 =>  (0x00007fff51928000)
        libiconv.so.2 => /home/me/libiconv/_inst/lib/libiconv.so.2 (0x00007f0b7d7dd000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003d5b400000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003d5b000000)

$ nm foo.o | grep ' U '
                 U __errno_location
                 U exit
                 U fprintf
         >>      U iconv
         >>      U iconv_close
         >>      U iconv_open
                 U printf
                 U setlocale
                 U stderr
                 U strerror
                 U strlen

It works fine:

in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960

> If the test case fails there,
> then we've got a presumption that the problem is in the (generic,
> cross-platform bits of) libiconv library itself. 

Well, apparently the problem is not the generic, cross-platform bits of
libiconv.  It's in the cygwin-specific bits, and/or how it interfaces
with cygwin's underlying charset manips.  So...

> Then, it's debugging
> time... :-(

...it's still debugging time.  Sigh.

--
Chuck


* Re: Bug in libiconv?
  2011-01-26 17:01   ` Charles Wilson
@ 2011-01-26 22:39     ` Corinna Vinschen
  0 siblings, 0 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-26 22:39 UTC (permalink / raw)
  To: cygwin

On Jan 26 08:43, Charles Wilson wrote:
> On 1/26/2011 8:26 AM, Corinna Vinschen wrote:
> > On Jan 26 13:15, simrw@sim-basis.de wrote:
> >>> Here's what happens on Cygwin:
> >>>   - Even though the last parameter to iconv is defined in bytes, the
> >>>     value of outbytesleft after the conversion is the number of remaining
> >>>     wchar_t's, not the number of remaining bytes.  That's contrary to
> >>>     what POSIX defines, see
> >>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> >>
> >> IMHO, the count is correct.
> >> On Windows/Cygwin, wchar_t is 2 bytes; on Linux, 4 bytes.
> >> So the buffer is 512 bytes on Cygwin.
> >> In the first 3 cases, 10 input bytes were consumed, so there remain
> >> in the buffer (512 - 20) = 492 bytes.
> >> In the last case all 16 characters (17 input bytes) are consumed, so
> >> there remain in the buffer (512 - 32) = 480 bytes.
> > 
> > Yes, you're right.  Quite obviously I misinterpreted the results without
> > realizing that the buffer is smaller under Cygwin.
> 
> Sure, but there ARE still bugs in libiconv on Cygwin -- specifically:
>  - Even though iconv_open has been opened explicitly with "UTF-8" as
>    input string, the conversion still depends on the current application
>    codeset.  That doesn't make sense.
> and
>  - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.

Indeed.  But it was an important hint, nevertheless.  It just didn't
occur to me that the buffer size is different between Cygwin and
Linux.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-26 13:50 ` Corinna Vinschen
@ 2011-01-26 17:01   ` Charles Wilson
  2011-01-26 22:39     ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-26 17:01 UTC (permalink / raw)
  To: cygwin

On 1/26/2011 8:26 AM, Corinna Vinschen wrote:
> On Jan 26 13:15, simrw@sim-basis.de wrote:
>>> Here's what happens on Cygwin:
>>>
>>> $ gcc -g -o ic ic.c -liconv
>>> $ ./ic
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> iconv: 138 <Invalid or incomplete multibyte or wide character>
>>> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>>> in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
>>>
>>> So, AFAICS, there are two problems:
>>>
>>>   - Even though iconv_open has been opened explicitly with "UTF-8" as
>>>     input string, the conversion still depends on the current application
>>>     codeset.  That doesn't make sense.
>>>
>>>   - Even though the last parameter to iconv is defined in bytes, the
>>>     value of outbytesleft after the conversion is the number of remaining
>>>     wchar_t's, not the number of remaining bytes.  That's contrary to
>>>     what POSIX defines, see
>>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>
>> IMHO, the count is correct.
>> On Windows/Cygwin, wchar_t is 2 bytes; on Linux, 4 bytes.
>> So the buffer is 512 bytes on Cygwin.
>> In the first 3 cases, 10 input bytes were consumed, so there remain
>> in the buffer (512 - 20) = 492 bytes.
>> In the last case all 16 characters (17 input bytes) are consumed, so
>> there remain in the buffer (512 - 32) = 480 bytes.
> 
> Yes, you're right.  Quite obviously I misinterpreted the results without
> realizing that the buffer is smaller under Cygwin.

Sure, but there ARE still bugs in libiconv on Cygwin -- specifically:
 - Even though iconv_open has been opened explicitly with "UTF-8" as
   input string, the conversion still depends on the current application
   codeset.  That doesn't make sense.
and
 - 'iconv_close ((iconv_t) -1);' crashes the application with a SEGV.

--
Chuck


* Re: Bug in libiconv?
  2011-01-26 13:39 simrw
@ 2011-01-26 13:50 ` Corinna Vinschen
  2011-01-26 17:01   ` Charles Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-26 13:50 UTC (permalink / raw)
  To: cygwin

On Jan 26 13:15, simrw@sim-basis.de wrote:
> > Here's what happens on Cygwin:
> >
> > $ gcc -g -o ic ic.c -liconv
> > $ ./ic
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > iconv: 138 <Invalid or incomplete multibyte or wide character>
> > in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> > in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
> >
> > So, AFAICS, there are two problems:
> >
> >   - Even though iconv_open has been opened explicitly with "UTF-8" as
> >     input string, the conversion still depends on the current application
> >     codeset.  That doesn't make sense.
> >
> >   - Even though the last parameter to iconv is defined in bytes, the
> >     value of outbytesleft after the conversion is the number of remaining
> >     wchar_t's, not the number of remaining bytes.  That's contrary to
> >     what POSIX defines, see
> >     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> 
> IMHO, the count is correct.
> On Windows/Cygwin, wchar_t is 2 bytes; on Linux, 4 bytes.
> So the buffer is 512 bytes on Cygwin.
> In the first 3 cases, 10 input bytes were consumed, so there remain
> in the buffer (512 - 20) = 492 bytes.
> In the last case all 16 characters (17 input bytes) are consumed, so
> there remain in the buffer (512 - 32) = 480 bytes.

Yes, you're right.  Quite obviously I misinterpreted the results without
realizing that the buffer is smaller under Cygwin.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
@ 2011-01-26 13:39 simrw
  2011-01-26 13:50 ` Corinna Vinschen
  0 siblings, 1 reply; 33+ messages in thread
From: simrw @ 2011-01-26 13:39 UTC (permalink / raw)
  To: cygwin

> Here's what happens on Cygwin:
>
> $ gcc -g -o ic ic.c -liconv
> $ ./ic
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> iconv: 138 <Invalid or incomplete multibyte or wide character>
> in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
>
> So, AFAICS, there are two problems:
>
>   - Even though iconv_open has been opened explicitly with "UTF-8" as
>     input string, the conversion still depends on the current application
>     codeset.  That doesn't make sense.
>
>   - Even though the last parameter to iconv is defined in bytes, the
>     value of outbytesleft after the conversion is the number of remaining
>     wchar_t's, not the number of remaining bytes.  That's contrary to
>     what POSIX defines, see
>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html

IMHO, the count is correct.
On Windows/Cygwin, wchar_t is 2 bytes; on Linux, 4 bytes.
So the buffer is 512 bytes on Cygwin.
In the first 3 cases, 10 input bytes were consumed, so there remain
in the buffer (512 - 20) = 492 bytes.
In the last case all 16 characters (17 input bytes) are consumed, so
there remain in the buffer (512 - 32) = 480 bytes.
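
The arithmetic can be written down as a small sketch. It assumes the
test program's output buffer holds 256 wchar_t elements (inferred from
the 512-byte figure on Cygwin and the 960 bytes left on Linux), and that
every converted character fits in one wchar_t. The function name
out_bytes_left is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left in an output buffer of CAPACITY wide characters after
   CONVERTED characters have been written, when each wide character
   occupies WCHAR_SIZE bytes (2 on Cygwin, 4 on Linux).  */
static size_t
out_bytes_left (size_t capacity, size_t converted, size_t wchar_size)
{
  assert (converted <= capacity);
  return (capacity - converted) * wchar_size;
}
```

With a capacity of 256, ten converted characters leave 492 bytes at
2 bytes per wchar_t, and sixteen leave 480; at 4 bytes per wchar_t
sixteen converted characters leave 960, which matches the Linux output.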

Roger



* Re: Bug in libiconv?
  2011-01-25 18:58     ` Charles Wilson
@ 2011-01-25 20:11       ` Corinna Vinschen
  2011-01-28 22:13         ` Charles Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-25 20:11 UTC (permalink / raw)
  To: cygwin

On Jan 25 10:04, Charles Wilson wrote:
> On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> > - Why on earth is libiconv on Cygwin using Windows functions in some
> >   places?
> > 
> >   - libcharset/lib/relocatable.c
> >   - srclib/progreloc.c
> >   - srclib/relocatable.c
> >   - lib/relocatable.c
> 
> whoo boy. That's...a long story. It's all part of Bruno's magic
> relocatability machinery.  However, on cygwin it should be using unixish
> mechanisms (at least for exe's -- looking at /proc/$pid/exe.  For
> DLLs...I think it needs to keep using the DllMain approach).

Really?  Why can't you use the same mechanism as on Linux?  The
find_shared_library_fullname() function examines /proc/self/maps, which
is also available on Cygwin.  This was already available on 1.5!
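
For illustration, a lookup via /proc/self/maps could be sketched as
follows. This is not the code from relocatable.c, only a minimal sketch
of the same idea; the function name find_mapping_path is hypothetical,
and the line layout assumed is the Linux one
(start-end perms offset dev inode pathname).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the file backing the memory mapping that
   contains ADDR by scanning /proc/self/maps.  Stores the path in OUT
   and returns 0 on success; returns -1 if no file-backed mapping
   containing ADDR was found or OUT is too small.  */
static int
find_mapping_path (const void *addr, char *out, size_t outsize)
{
  FILE *fp = fopen ("/proc/self/maps", "r");
  char line[4096];
  uintptr_t target = (uintptr_t) addr;
  int result = -1;

  if (fp == NULL)
    return -1;
  while (result < 0 && fgets (line, sizeof line, fp) != NULL)
    {
      unsigned long start, end;
      int pos = 0;

      /* %n records where the pathname column starts.  */
      if (sscanf (line, "%lx-%lx %*s %*s %*s %*s %n", &start, &end, &pos) < 2
          || pos == 0)
        continue;
      if (target < start || target >= end)
        continue;
      line[strcspn (line, "\n")] = '\0';
      if (line[pos] == '/' && strlen (line + pos) < outsize)
        {
          strcpy (out, line + pos);
          result = 0;
        }
    }
  fclose (fp);
  return result;
}
```

A shared library would pass the address of one of its own functions or
objects, so no Win32 call and no path conversion would be involved.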

> > - libcharset/lib/relocatable.c and srclib/relocatable.c define their own
> >   DllMain and use Windows functions.  And the old
> >   cygwin_conv_to_posix_path function as well.
> 
> Well, yes.  It's how DLLs determine their installation path, so they can
> then automatically deduce the relative path to <whatever>.  And since
> that requires using a win32 function (GetModuleFileName) it needs to
> convert to cygwin format.

See above.

> > - Same file, function locale_charset() contains old Cygwin-specific
> >   code which is outdated.  AFAICS it shouldn't hurt, though, since
> >   Cygwin no longer returns "US-ASCII".
> > 
> > - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
> >   ei_ucs2internal encoding table.  I'm not sure if that's right or
> >   wrong, but it looks worrying.
> 
> Well, remember (A) upstream libiconv itself hasn't been updated since
> 30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our
> most recent version (1.13.1-1) was released almost simultaneously (23
> Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1
> thru 1.7.5.

I wasn't trying to blame you.  I was just trying to point out potential
problems which deserve a more POSIXy handling on Cygwin.

> > Please note that I defined
> > __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is
> > missing since 1.7.2.
> 
> Hmmm...maybe I should (re)build libiconv against a snapshot?

I think that should be safe.  There's most likely no new API which would
accidentally be pulled in by libiconv(*).


Corinna


(*) Famous last words...

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


* Re: Bug in libiconv?
  2011-01-25 15:04   ` Corinna Vinschen
@ 2011-01-25 18:58     ` Charles Wilson
  2011-01-25 20:11       ` Corinna Vinschen
  2011-01-27  5:46     ` Charles Wilson
  1 sibling, 1 reply; 33+ messages in thread
From: Charles Wilson @ 2011-01-25 18:58 UTC (permalink / raw)
  To: cygwin

On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> On Jan 24 22:09, Charles Wilson wrote:
>> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
>>> So, AFAICS, there are two problems:
>>>
>>>   - Even though iconv_open has been opened explicitly with "UTF-8" as
>>>     input string, the conversion still depends on the current application
>>>     codeset.  That doesn't make sense.
>>>
>>>   - Even though the last parameter to iconv is defined in bytes, the
>>>     value of outbytesleft after the conversion is the number of remaining
>>>     wchar_t's, not the number of remaining bytes.  That's contrary to what
>>>     POSIX defines, see
>>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>>
>>> Is this analysis correct?  Is there by any chance a newer version of
>>> libiconv2 which does not have these problems?
>>
>> Well, iconv's behavior is very dependent on detailed characteristics of
>> the system on which it was compiled -- e.g. it's very finicky about the
>> platform's behavior vis-a-vis character sets.
> 
> Ok, but that doesn't mean it has to stumble over its own feet if the
> current locale's codeset is different from the codeset which has to
> be converted.

True, of course. I was just thinking that *maybe* just recompiling
libiconv now that cygwin's i18n stuff has become more stable...might help.

> I found that gencat uses the return value of the nl_langinfo call
> after it called setlocale, like this:
> 
>   setlocale (LC_ALL, "");
>   codeset = nl_langinfo (CODESET);
>   setlocale (LC_ALL, "C");
>   [...]
> 
> This is plain wrong.  See
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html
> 
>   "Calls to setlocale() with a category corresponding to the category of
>    item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
>    array pointed to by the return value."
> 
> That's what happens in newlib, but not in glibc.  Maybe that's
> libiconv's problem as well?

Hmm. I'll look for that.

> I also found that
> 
>   iconv_close ((iconv_t) -1);
> 
> crashes the application with a SEGV.  It's clearly the fault of the
> application, but it doesn't deserve a SEGV, imho.

Yeah, that's bad.

> FYI, I examined the libiconv sources cursorily, and I found a couple of
> code snippets with Cygwin-specific code which is rather questionable.
> 
> - Why on earth is libiconv on Cygwin using Windows functions in some
>   places?
> 
>   - libcharset/lib/relocatable.c
>   - srclib/progreloc.c
>   - srclib/relocatable.c
>   - lib/relocatable.c

whoo boy. That's...a long story. It's all part of Bruno's magic
relocatability machinery.  However, on cygwin it should be using unixish
mechanisms (at least for exe's -- looking at /proc/$pid/exe.  For
DLLs...I think it needs to keep using the DllMain approach).
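
As a rough sketch of the unixish mechanism for executables, assuming
/proc/self/exe is a symlink to the running program (true on Linux and,
per this thread, on Cygwin), something like the following would do. The
function name executable_path is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: store the path of the running executable in BUF
   by reading the /proc/self/exe symlink.  Returns 0 on success, -1 on
   error or if the path might have been truncated.  */
static int
executable_path (char *buf, size_t size)
{
  ssize_t n;

  if (size < 2)
    return -1;
  /* readlink does not NUL-terminate, so leave room for the terminator.  */
  n = readlink ("/proc/self/exe", buf, size - 1);
  /* A result that fills the buffer completely may be truncated.  */
  if (n < 0 || (size_t) n >= size - 1)
    return -1;
  buf[n] = '\0';
  return 0;
}
```

The result is already a POSIX path, so no conversion step is needed.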

> - libcharset/lib/relocatable.c and srclib/relocatable.c define their own
>   DllMain and use Windows functions.  And the old
>   cygwin_conv_to_posix_path function as well.

Well, yes.  It's how DLLs determine their installation path, so they can
then automatically deduce the relative path to <whatever>.  And since
that requires using a win32 function (GetModuleFileName) it needs to
convert to cygwin format.  These days it ought to use the new
functions...I'll prepare a gnulib patch, and from there it will work its
way down into libiconv/gettext.  I'm not sure if the gnulib guys want to
preserve compat with 1.5 (e.g. check for cygwin_conv_path() and only use
if present, otherwise use deprecated?) or not.
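
A conversion with the newer function might look like the sketch below.
cygwin_conv_path() takes the size of the output buffer and returns an
error instead of overflowing, which removes the guesswork around
MAX_PATH. The wrapper name to_posix_path and the non-Cygwin fallback
(a plain bounded copy) are assumptions made only so the example builds
elsewhere.

```c
#include <stddef.h>
#include <string.h>
#ifdef __CYGWIN__
# include <sys/cygwin.h>
#endif

/* Hypothetical wrapper: convert the Win32 path NATIVE to POSIX notation
   in OUT (OUTSIZE bytes).  Returns 0 on success, -1 on failure,
   including the case where OUT is too small.  */
static int
to_posix_path (const char *native, char *out, size_t outsize)
{
#ifdef __CYGWIN__
  /* cygwin_conv_path returns 0 on success and -1 on error.  */
  return cygwin_conv_path (CCP_WIN_A_TO_POSIX | CCP_ABSOLUTE,
                           native, out, outsize) == 0 ? 0 : -1;
#else
  /* Fallback for other systems: the path is already in POSIX form.  */
  size_t len = strlen (native);

  if (len >= outsize)
    return -1;
  memcpy (out, native, len + 1);
  return 0;
#endif
}
```

Whether gnulib would want a configure-time check to keep Cygwin 1.5
working is a separate question that this sketch does not address.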

> - The usage of a fixed table instead of the charset.alias file in
>   libcharset/lib/localcharset.c, function get_charset_aliases() is
>   not good, not good at all.

Yeah, you're right.  It looks like there's been some bitrot with respect
to some of the "&& !CYGWIN" guards on WIN32.  Both libiconv and gettext,
IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
"regular" win32 and for cygwin...which means defined(CYGWIN) guards are
necessary.

> - Same file, function locale_charset() contains old Cygwin-specific
>   code which is outdated.  AFAICS it shouldn't hurt, though, since
>   Cygwin no longer returns "US-ASCII".
> 
> - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
>   ei_ucs2internal encoding table.  I'm not sure if that's right or
>   wrong, but it looks worrying.

Well, remember (A) upstream libiconv itself hasn't been updated since
30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our
most recent version (1.13.1-1) was released almost simultaneously (23
Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1
thru 1.7.5.

> Please note that I defined
> __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is
> missing since 1.7.2.

Hmmm...maybe I should (re)build libiconv against a snapshot?

I don't routinely use extended character sets, and have to rely on the
test suites.  They passed, so...I thought good enough.  Perhaps not...

--
Chuck


* Re: Bug in libiconv?
  2011-01-25 11:15 ` Charles Wilson
@ 2011-01-25 15:04   ` Corinna Vinschen
  2011-01-25 18:58     ` Charles Wilson
  2011-01-27  5:46     ` Charles Wilson
  2011-01-27  3:53   ` Charles Wilson
  1 sibling, 2 replies; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-25 15:04 UTC (permalink / raw)
  To: cygwin

On Jan 24 22:09, Charles Wilson wrote:
> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> > Here's what happens on Cygwin:
> > 
> >   $ gcc -g -o ic ic.c -liconv
> >   $ ./ic
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
> 
> Confirmed.
> 
> > So, AFAICS, there are two problems:
> > 
> >   - Even though iconv_open has been called explicitly with "UTF-8" as
> >     the input codeset, the conversion still depends on the current
> >     application codeset.  That doesn't make sense.
> > 
> >   - Even though the last parameter to iconv is defined in bytes, the
> >     value of outbytesleft after the conversion is the number of remaining
> >     wchar_t's, not the number of remaining bytes.  That's contrary to what
> >     POSIX defines, see
> >     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> > 
> > Is this analysis correct?  Is there by any chance a newer version of
> > libiconv2 which does not have these problems?
> 
> Well, iconv's behavior is very dependent on detailed characteristics of
> the system on which it was compiled -- e.g. it's very finicky about the
> platform's behavior vis-a-vis character sets.

Ok, but that doesn't mean it has to stumble over its own feet if the
current locale's codeset is different from the codeset which has to
be converted.

I found that gencat uses the return value of the nl_langinfo call
after it called setlocale, like this:

  setlocale (LC_ALL, "");
  codeset = nl_langinfo (CODESET);
  setlocale (LC_ALL, "C");
  [...]

This is plain wrong.  See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html

  "Calls to setlocale() with a category corresponding to the category of
   item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
   array pointed to by the return value."

That's what happens in newlib, but not in glibc.  Maybe that's
libiconv's problem as well?
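
The usual way around this is to copy the string returned by nl_langinfo
before the next setlocale call.  A minimal sketch, assuming the caller
frees the copy; the helper name dup_locale_codeset is hypothetical and is
not taken from gencat or libiconv:

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of the codeset name of the
   locale selected by the environment, then switches back to the "C" locale.
   The copy stays valid because it no longer points into the array that
   setlocale() is allowed to overwrite.  Returns NULL if malloc fails. */
char *
dup_locale_codeset (void)
{
  const char *cs;
  char *copy;
  size_t len;

  setlocale (LC_ALL, "");
  cs = nl_langinfo (CODESET);
  len = strlen (cs) + 1;        /* include the terminating NUL */
  copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, cs, len);
  setlocale (LC_ALL, "C");      /* may clobber cs, but not copy */
  return copy;
}
```

The caller would use the returned string in place of the raw nl_langinfo
pointer and free() it when done.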

I also found that

  iconv_close ((iconv_t) -1);

crashes the application with a SEGV.  It's clearly the fault of the
application, but it doesn't deserve a SEGV, imho.
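
On the application side, a guard like the following keeps the failure
value away from iconv_close altogether.  This is only a sketch;
safe_iconv_close is a hypothetical name, not a libiconv function, and on
Cygwin the program would still be linked with -liconv:

```c
#include <iconv.h>

/* Hypothetical wrapper: refuses to hand the iconv_open() failure value
   (iconv_t) -1 to iconv_close().  Returns -1 for that value, otherwise
   whatever iconv_close() returns. */
int
safe_iconv_close (iconv_t cd)
{
  if (cd == (iconv_t) -1)
    return -1;
  return iconv_close (cd);
}
```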

FYI, I examined the libiconv sources cursorily, and I found a couple of
code snippets with Cygwin-specific code which is rather questionable.

- Why on earth is libiconv on Cygwin using Windows functions in some
  places?

  - libcharset/lib/relocatable.c
  - srclib/progreloc.c
  - srclib/relocatable.c
  - lib/relocatable.c

- libcharset/lib/relocatable.c and srclib/relocatable.c define their own
  DllMain and use Windows functions.  And the old
  cygwin_conv_to_posix_path function as well.


- The usage of a fixed table instead of the charset.alias file in
  libcharset/lib/localcharset.c, function get_charset_aliases() is
  not good, not good at all.

- Same file, function locale_charset() contains old Cygwin-specific
  code which is outdated.  AFAICS it shouldn't hurt, though, since
  Cygwin no longer returns "US-ASCII".

- lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
  ei_ucs2internal encoding table.  I'm not sure if that's right or
  wrong, but it looks worrying.  Please note that I defined
  __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define has been
  missing since 1.7.2.
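
Whether a given toolchain defines the macro can be checked at compile
time.  A minimal sketch; the function name is hypothetical, and it says
nothing about how libiconv itself tests for the macro:

```c
#include <wchar.h>

/* Returns the value of __STDC_ISO_10646__ (a date of the form yyyymmL) if
   the implementation defines it, or 0 if it does not.  The function name
   is hypothetical. */
long
iso_10646_version (void)
{
#ifdef __STDC_ISO_10646__
  return __STDC_ISO_10646__;
#else
  return 0L;
#endif
}
```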


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Bug in libiconv?
  2011-01-25  6:36 Corinna Vinschen
@ 2011-01-25 11:15 ` Charles Wilson
  2011-01-25 15:04   ` Corinna Vinschen
  2011-01-27  3:53   ` Charles Wilson
  0 siblings, 2 replies; 33+ messages in thread
From: Charles Wilson @ 2011-01-25 11:15 UTC (permalink / raw)
  To: cygwin

On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> Here's what happens on Cygwin:
> 
>   $ gcc -g -o ic ic.c -liconv
>   $ ./ic
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   iconv: 138 <Invalid or incomplete multibyte or wide character>
>   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
>   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

Confirmed.

> So, AFAICS, there are two problems:
> 
>   - Even though iconv_open has been called explicitly with "UTF-8" as
>     the input codeset, the conversion still depends on the current
>     application codeset.  That doesn't make sense.
> 
>   - Even though the last parameter to iconv is defined in bytes, the
>     value of outbytesleft after the conversion is the number of remaining
>     wchar_t's, not the number of remaining bytes.  That's contrary to what
>     POSIX defines, see
>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> 
> Is this analysis correct?  Is there by any chance a newer version of
> libiconv2 which does not have these problems?

Well, iconv's behavior is very dependent on detailed characteristics of
the system on which it was compiled -- e.g. it's very finicky about the
platform's behavior vis-a-vis character sets.

Now, cygwin's libiconv-1.13.1 was built a LONG time ago (2009 Dec 23),
and many things have changed in cygwin itself since then (e.g.
cygwin-1.7.1-1 was current at that time).

Now, since there has not yet been an updated upstream release of
libiconv, my first step would be to simply rebuild our existing
libiconv-1.13.1 on a platform with current cygwin (1.7.7-1), and try the
test case again.

If that doesn't correct the issue...then I'd try to run your test case
on linux, but *explicitly* using libiconv on that system, rather than
(as is typically the case on linux) relying on the underlying glibc
implementation of iconv functionality.  If the test case fails there,
then we've got a presumption that the problem is in the (generic,
cross-platform bits of) libiconv library itself.  Then, it's debugging
time... :-(

--
Chuck

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Bug in libiconv?
@ 2011-01-25  6:36 Corinna Vinschen
  2011-01-25 11:15 ` Charles Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Corinna Vinschen @ 2011-01-25  6:36 UTC (permalink / raw)
  To: cygwin

Hi Chuck,
hi everyone else,


In a twisted turn of events, I'm trying to get the orphaned catgets
package to work correctly on Cygwin 1.7.  As you might know, the package
is derived from the glibc package.  Apart from other portability issues
of this *very* glibc-centric piece of code, I found some problem which
appears to point to two bugs in Cygwin's libiconv2.

For some reason, the iconv conversion seems to be overly dependent on
the usage of setlocale, and the value returned through the last
parameter (outbytesleft) appears to be incorrect if the output codeset
is "WCHAR_T".

Here's a simple testcase:

==== SNIP ====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <wchar.h>

iconv_t
open_iconv ()
{
  iconv_t cd_towcp = iconv_open ("WCHAR_T", "UTF-8");
  if (cd_towcp == (iconv_t) -1)
    {
      fprintf (stderr, "iconv_open: %d <%s>\n", errno, strerror (errno));
      exit (1);
    }
  return cd_towcp;
}

void
run_iconv (iconv_t cd_towcp, char *input)
{
  wchar_t out[256];

  char *inbuf = input;
  size_t inbytesleft = strlen (inbuf);
  char *outbuf = (char *) out;
  size_t outbytesleft = sizeof (out);
  size_t ret = iconv (cd_towcp, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
  if (ret == (size_t) -1)
    fprintf (stderr, "iconv: %d <%s>\n", errno, strerror (errno));
  printf ("in = <%s>, inbuf = <%s>, inbytesleft = %zu, outbytesleft = %zu\n",
          input, inbuf, inbytesleft, outbytesleft);
}

int
main ()
{
  iconv_t cd_towcp;
  char *finnish = "Liian pitk\303\244 sana";  // Umlaut-a
  
  setlocale (LC_ALL, "C");
  cd_towcp = open_iconv ();
  setlocale (LC_ALL, "C");
  run_iconv (cd_towcp, finnish);
  setlocale (LC_ALL, "C.UTF-8");
  run_iconv (cd_towcp, finnish);
  iconv_close (cd_towcp);
  
  setlocale (LC_ALL, "C.UTF-8");
  cd_towcp = open_iconv ();
  setlocale (LC_ALL, "C");
  run_iconv (cd_towcp, finnish);
  setlocale (LC_ALL, "C.UTF-8");
  run_iconv (cd_towcp, finnish);
  iconv_close (cd_towcp);

  return 0;
}
==== SNAP ====

Here are the important details:

- The input string is a fixed Finnish UTF-8 sentence containing a
  single non-ASCII char.

- The testcase always calls setlocale before calling iconv_open(),
  then calls setlocale again before calling iconv().

- So the application tries to convert a UTF-8 string to WCHAR_T in four
  combinations of the current locale, in this order:

  - iconv_open "C",       iconv "C"
  - iconv_open "C",       iconv "C.UTF-8"
  - iconv_open "C.UTF-8", iconv "C"
  - iconv_open "C.UTF-8", iconv "C.UTF-8"

Here's what happens in Linux:

  $ gcc -g -o ic ic.c
  $ ./ic
  in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
  in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
  in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960
  in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 960

Here's what happens on Cygwin:

  $ gcc -g -o ic ic.c -liconv
  $ ./ic
  iconv: 138 <Invalid or incomplete multibyte or wide character>
  in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
  iconv: 138 <Invalid or incomplete multibyte or wide character>
  in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
  iconv: 138 <Invalid or incomplete multibyte or wide character>
  in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
  in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480

So, AFAICS, there are two problems:

  - Even though iconv_open has been called explicitly with "UTF-8" as
    the input codeset, the conversion still depends on the current
    application codeset.  That doesn't make sense.

  - Even though the last parameter to iconv is defined in bytes, the
    value of outbytesleft after the conversion is the number of remaining
    wchar_t's, not the number of remaining bytes.  That's contrary to what
    POSIX defines, see
    http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html

Is this analysis correct?  Is there by any chance a newer version of
libiconv2 which does not have these problems?
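
As a cross-check on the second point, the expected remainder can be
worked out by hand.  The sketch below assumes every character of the
16-character test string occupies exactly one wchar_t (no surrogate
pairs); the helper name is hypothetical.  With a 4-byte wchar_t it gives
256*4 - 16*4 = 960, which matches the Linux output.  With a 2-byte
wchar_t it would give 256*2 - 16*2 = 480, and 256*2 - 10*2 = 492 for the
ten characters in front of the umlaut:

```c
#include <stddef.h>

/* Hypothetical helper: bytes that should remain in an output buffer of
   n_elems wide characters after n_chars characters have been written,
   assuming each character takes exactly one wide character of
   wchar_size bytes.  The caller must ensure n_chars <= n_elems. */
size_t
expected_outbytesleft (size_t n_elems, size_t wchar_size, size_t n_chars)
{
  return (n_elems - n_chars) * wchar_size;
}
```

Passing sizeof (wchar_t) as wchar_size gives the figure for the platform
the test is compiled on.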


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2011-02-02 22:57 UTC | newest]

Thread overview: 33+ messages
2011-02-02 18:58 Bug in libiconv? Bruno Haible
2011-02-02 21:20 ` Corinna Vinschen
2011-02-02 21:35   ` bug#7971: Acknowledgement (Bug in libiconv?) GNU bug Tracking System
2011-02-02 22:57   ` Bug in libiconv? Charles Wilson
  -- strict thread matches above, loose matches on Subject: below --
2011-01-29  2:15 Bruno Haible
2011-01-29 12:34 ` Charles Wilson
2011-01-29 13:20 ` Charles Wilson
2011-01-29 17:15   ` Corinna Vinschen
2011-01-29 16:02 ` Corinna Vinschen
2011-01-29 17:51   ` Eric Blake
2011-01-29 18:12     ` Corinna Vinschen
2011-01-29 18:28       ` Eric Blake
2011-01-30 11:34         ` Corinna Vinschen
2011-01-30 11:43           ` Corinna Vinschen
2011-01-30  2:40     ` Corinna Vinschen
2011-01-27 16:06 simrw
2011-01-26 13:39 simrw
2011-01-26 13:50 ` Corinna Vinschen
2011-01-26 17:01   ` Charles Wilson
2011-01-26 22:39     ` Corinna Vinschen
2011-01-25  6:36 Corinna Vinschen
2011-01-25 11:15 ` Charles Wilson
2011-01-25 15:04   ` Corinna Vinschen
2011-01-25 18:58     ` Charles Wilson
2011-01-25 20:11       ` Corinna Vinschen
2011-01-28 22:13         ` Charles Wilson
2011-01-27  5:46     ` Charles Wilson
2011-01-27 16:05       ` Corinna Vinschen
2011-01-27 17:18         ` Charles Wilson
2011-01-27  3:53   ` Charles Wilson
2011-01-27 16:21     ` Corinna Vinschen
2011-01-27 17:39       ` Charles Wilson
2011-01-27 18:05         ` Corinna Vinschen