From: Takashi Yano <takashi.yano@nifty.ne.jp>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().
Date: Sat, 12 Sep 2020 02:38:43 +0900
Message-ID: <20200912023843.58ef0f3134d6aea5359c27c0@nifty.ne.jp>
In-Reply-To: <20200912010504.586a156f1712f61c3c696d40@nifty.ne.jp>
[-- Attachment #1: Type: text/plain, Size: 4436 bytes --]
On Sat, 12 Sep 2020 01:05:04 +0900
Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> On Fri, 11 Sep 2020 16:06:01 +0200
> Corinna Vinschen wrote:
> > On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > > Hi Corinna,
> > >
> > > On Fri, 11 Sep 2020 14:08:40 +0200
> > > Corinna Vinschen wrote:
> > > > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > > > > > for the case that the multibyte char is split in the middle.
> > > > > The reason is as follows.
> > > > > * ISO-2022 is too complicated to handle correctly.
> > > > > * Not sure what to do with ISCII.
> > > > > ---
> > > > > winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > > > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > > > index 37d033bbe..ee5c6a90a 100644
> > > > > --- a/winsup/cygwin/fhandler_tty.cc
> > > > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > > > > return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > > > > }
> > > > >
> > > > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > > > +
> > > > > static void
> > > > > convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > > UINT cp_from, const char *ptr_from, size_t len_from,
> > > > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > > tmp_pathbuf tp;
> > > > > wchar_t *wbuf = tp.w_get ();
> > > > > int wlen = 0;
> > > > > - if (cp_from == CP_UTF7)
> > > > > - /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > + if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > + /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > + - ISO-2022 is too complicated to handle correctly.
> > > > > + - FIXME: Not sure what to do for ISCII.
> > > > > Therefore, just convert string without checking */
> > > > > wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > > wbuf, NT_MAX_PATH);
> > > > > --
> > > > > 2.28.0
> > > >
> > > > I'd prefer to not handle them at all. We just don't support these
> > > > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > > > compatible. Let's please just drop any handling for these weird
> > > > or outdated codepages.
> > >
> > > What do you mean by "just drop any handling"?
> > >
> > > Do you mean removing the following if block?
> > > > > + if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > + /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > + - ISO-2022 is too complicated to handle correctly.
> > > > > + - FIXME: Not sure what to do for ISCII.
> > > > > Therefore, just convert string without checking */
> > > > > wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > > wbuf, NT_MAX_PATH);
> > > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > > not be done correctly.
> > >
> > > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > > or ISCII? What should we do for UTF-7?
> >
> > Nothing, just like for any other of these weird charsets. Cygwin never
> > supported any charset which wasn't at least ASCII compatible in the
> > 0 <= x <= 127 range. Just ignore them and the possibility that a
> > user chooses them for fun.
> >
> > > What should happen if a user or an app changes the codepage to one of them?
> >
> > Garbage output, I guess. We shouldn't really care.
>
> Do you mean the attached patch?
>
> Please try:
> (1) Open mintty with "env CYGWIN=disable_pcon mintty".
> (2) Start cmd.exe in that mintty.
> (3) Try chcp such as
> 37 (EBCDIC),
> 65000 (UTF-7),
> 50220 (ISO-2022),
> and 57002 (ISCII).
> (4) Execute dir or some other commands in cmd.exe.
>
> For 65000, 50220 and 57002, even the prompt will be broken.
> Are the results as you expected?
>
> If pseudo console is enabled, all of the above work without
> problems. With the previous patch, the results were sane even
> if pseudo console is disabled.
How about the attached patch?
I think this is safer than the previous patch.
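For comparison, the previous patch decided by code page number whether to
convert without checking. A standalone sketch of that decision in plain C
follows. The function names are hypothetical; the ranges are copied from the
IS_ISO_2022 and IS_ISCII macros quoted above, and 65000 is the code page
number of UTF-7 as used with chcp above. It only restates the old rule and
calls no Windows API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers. The ranges are copied from the IS_ISO_2022 and
   IS_ISCII macros of the previous patch. */
bool
cp_is_iso_2022 (unsigned int cp)
{
  return cp >= 50220 && cp <= 50229;
}

bool
cp_is_iscii (unsigned int cp)
{
  return cp >= 57002 && cp <= 57011;
}

/* True if the previous patch would convert without MB_ERR_INVALID_CHARS.
   65000 is the UTF-7 code page. */
bool
cp_skips_boundary_check (unsigned int cp)
{
  return cp == 65000 || cp_is_iso_2022 (cp) || cp_is_iscii (cp);
}

/* Quick self-check over the code pages tried with chcp in this thread. */
void
cp_self_check (void)
{
  assert (!cp_skips_boundary_check (37));     /* EBCDIC: not in the list */
  assert (cp_skips_boundary_check (65000));   /* UTF-7 */
  assert (cp_skips_boundary_check (50220));   /* ISO-2022 */
  assert (cp_skips_boundary_check (57002));   /* ISCII */
}
```

A fixed list like this has to be extended by hand for every further code
page that misbehaves, which is what the attached patch tries to avoid.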
--
Takashi Yano <takashi.yano@nifty.ne.jp>
[-- Attachment #2: 0001-Cygwin-pty-Skip-multibyte-char-boundary-check-condit.patch --]
[-- Type: application/octet-stream, Size: 1770 bytes --]
From be6d20abf3027ccf24c549c58a7e1d05c1ea4dbd Mon Sep 17 00:00:00 2001
From: Takashi Yano <takashi.yano@nifty.ne.jp>
Date: Sat, 12 Sep 2020 02:29:21 +0900
Subject: [PATCH] Cygwin: pty: Skip multibyte char boundary check
conditionally.
- For charsets for which MB_ERR_INVALID_CHARS does not work properly,
skip the multibyte char boundary check in convert_mb_str().
---
winsup/cygwin/fhandler_tty.cc | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 95b28c3da..50c36f645 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -122,13 +122,15 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
UINT cp_from, const char *ptr_from, size_t len_from,
mbstate_t *mbp)
{
- size_t nlen;
+ bool check_valid = false;
+ if (MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS, "A", 1, NULL, 0))
+ check_valid = true;
tmp_pathbuf tp;
wchar_t *wbuf = tp.w_get ();
int wlen = 0;
- if (cp_from == CP_UTF7)
- /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
- Therefore, just convert string without checking */
+ if (!check_valid)
+ /* If MB_ERR_INVALID_CHARS does not work properly,
+ just convert string without checking */
wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
wbuf, NT_MAX_PATH);
else
@@ -171,9 +173,8 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
/* Retry conversion with extended length */
}
}
- nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
- ptr_to, *len_to, NULL, NULL);
- *len_to = nlen;
+ *len_to = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
+ ptr_to, *len_to, NULL, NULL);
}
static bool
--
2.28.0
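
The boundary check that this patch makes conditional exists because a read
from the pty can end partway through a multibyte character. As a rough,
portable illustration of that problem only: the sketch below handles UTF-8
in standard C and is not the Cygwin code, which relies on
MultiByteToWideChar with MB_ERR_INVALID_CHARS. The helper name
utf8_complete_prefix is hypothetical. It reports how many leading bytes of a
buffer can be converted now, so that a truncated trailing character can be
held back until the rest arrives.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many leading bytes of buf[0..len) form whole UTF-8 sequences.
   Trailing bytes that only begin a character are excluded so the caller can
   keep them for the next read. Assumes the input is well-formed UTF-8 apart
   from a possibly truncated tail. */
size_t
utf8_complete_prefix (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  /* Step back over at most three continuation bytes (10xxxxxx). */
  size_t i = len;
  size_t back = 0;
  while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80)
    {
      i--;
      back++;
    }
  if (i == 0)
    return len;   /* no lead byte in view; nothing to hold back */

  /* buf[i - 1] is the last non-continuation byte; find the length of the
     sequence it starts. */
  unsigned char lead = buf[i - 1];
  size_t need;
  if (lead < 0x80)
    need = 1;
  else if ((lead & 0xE0) == 0xC0)
    need = 2;
  else if ((lead & 0xF0) == 0xE0)
    need = 3;
  else if ((lead & 0xF8) == 0xF0)
    need = 4;
  else
    return len;   /* not a valid lead byte; pass through unchanged */

  size_t have = len - (i - 1);
  return have < need ? i - 1 : len;
}
```

For example, for the three bytes 'a', 0xE3, 0x81 (an ASCII letter followed by
the first two bytes of a three-byte character) the helper returns 1, so only
the 'a' would be converted and the remaining two bytes kept for later. A
stateful encoding such as ISO-2022 cannot be handled by looking at the tail
bytes alone, which is consistent with the remark above that it is too
complicated to handle correctly.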