public inbox for cygwin-patches@cygwin.com
From: Thomas Wolff <towo@towo.net>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().
Date: Fri, 11 Sep 2020 17:10:13 +0200
Message-ID: <bfc29d4a-f65d-435b-ca6b-52472a9d5a02@towo.net>
In-Reply-To: <20200911140601.GK4127@calimero.vinschen.de>

On 11.09.2020 at 16:06, Corinna Vinschen wrote:
> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>>
>> On Fri, 11 Sep 2020 14:08:40 +0200
>> Corinna Vinschen wrote:
>>> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
>>>> - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
>>>>    for the case that the multibyte char is split in the middle.
>>>>    The reason is as follows.
>>>>    * ISO-2022 is too complicated to handle correctly.
>>>>    * Not sure what to do with ISCII.
>>>> ---
>>>>   winsup/cygwin/fhandler_tty.cc | 9 +++++++--
>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
>>>> index 37d033bbe..ee5c6a90a 100644
>>>> --- a/winsup/cygwin/fhandler_tty.cc
>>>> +++ b/winsup/cygwin/fhandler_tty.cc
>>>> @@ -117,6 +117,9 @@ CreateProcessW_Hooked
>>>>     return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>>>>   }
>>>>   
>>>> +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
>>>> +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
>>>> +
>>>>   static void
>>>>   convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>   		UINT cp_from, const char *ptr_from, size_t len_from,
>>>> @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>     tmp_pathbuf tp;
>>>>     wchar_t *wbuf = tp.w_get ();
>>>>     int wlen = 0;
>>>> -  if (cp_from == CP_UTF7)
>>>> -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>          Therefore, just convert string without checking */
>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>   				wbuf, NT_MAX_PATH);
>>>> -- 
>>>> 2.28.0
>>> I'd prefer to not handle them at all.  We just don't support these
>>> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
>>> compatible.  Let's please just drop any handling for these weird
>>> or outdated codepages.
>> What do you mean by "just drop any handling"?
>>
>> Do you mean removing the following if block?
>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>          Therefore, just convert string without checking */
>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>   				wbuf, NT_MAX_PATH);
>> In this case, the conversion for ISO-2022, ISCII and UTF-7 will
>> not be done correctly.
>>
>> Or skip charset conversion if the codepage is EBCDIC, ISO-2022
>> or ISCII? What should we do for UTF-7?
> Nothing, just like for any other of these weird charsets.  Cygwin never
> supported any charset which wasn't at least ASCII compatible in the
> 0 <= x <= 127 range.
Actually, in Shift-JIS (CP932, supported via locale ja_JP.sjis), 0x5C is 
¥ :/
>    Just ignore them and the possibility that a
> user chooses them for fun.
>
>> What should happen if a user or an app changes the codepage to one of them?
> Garbage output, I guess.  We shouldn't really care.
>
>
> Corinna
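
For reference, here is a minimal sketch of the codepage test the quoted patch proposes, written as plain C so it builds without the Windows headers. The two range macros are copied from the patch. SKETCH_CP_UTF7 and convert_without_check() are names made up for this illustration (the patch tests cp_from inline against CP_UTF7, which has the value 65000 on Windows), so take it as a sketch under those assumptions rather than the actual Cygwin code.

```c
#include <stdbool.h>

/* Assumption for this sketch: local stand-in for the Windows CP_UTF7
   constant (65000), so that no Windows header is required. */
#define SKETCH_CP_UTF7 65000u

/* Range macros as proposed in the quoted patch. */
#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )

/* Hypothetical helper, not part of the patch: returns true for the
   codepages that the patched convert_mb_str() would convert without
   the split-multibyte check (UTF-7, ISO-2022, ISCII). */
static bool
convert_without_check (unsigned int cp_from)
{
  return cp_from == SKETCH_CP_UTF7
	 || IS_ISO_2022 (cp_from)
	 || IS_ISCII (cp_from);
}
```

With this, convert_without_check (50220) and convert_without_check (57011) are true, while 932 (Shift-JIS) and 65001 (UTF-8) fall through to the normal path with the check.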

