public inbox for cygwin-patches@cygwin.com
From: Takashi Yano <takashi.yano@nifty.ne.jp>
To: cygwin-patches@cygwin.com
Subject: Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().
Date: Sat, 12 Sep 2020 01:05:04 +0900
Message-ID: <20200912010504.586a156f1712f61c3c696d40@nifty.ne.jp>
In-Reply-To: <20200911140601.GK4127@calimero.vinschen.de>

[-- Attachment #1: Type: text/plain, Size: 4070 bytes --]

On Fri, 11 Sep 2020 16:06:01 +0200
Corinna Vinschen wrote:
> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> > 
> > On Fri, 11 Sep 2020 14:08:40 +0200
> > Corinna Vinschen wrote:
> > > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > > >   for the case that the multibyte char is split in the middle.
> > > >   The reason is as follows.
> > > >   * ISO-2022 is too complicated to handle correctly.
> > > >   * Not sure what to do with ISCII.
> > > > ---
> > > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > > index 37d033bbe..ee5c6a90a 100644
> > > > --- a/winsup/cygwin/fhandler_tty.cc
> > > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > > >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > > >  }
> > > >  
> > > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > > +
> > > >  static void
> > > >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > >  		UINT cp_from, const char *ptr_from, size_t len_from,
> > > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > >    tmp_pathbuf tp;
> > > >    wchar_t *wbuf = tp.w_get ();
> > > >    int wlen = 0;
> > > > -  if (cp_from == CP_UTF7)
> > > > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > +       - FIXME: Not sure what to do for ISCII.
> > > >         Therefore, just convert string without checking */
> > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > >  				wbuf, NT_MAX_PATH);
> > > > -- 
> > > > 2.28.0
> > > 
> > > I'd prefer to not handle them at all.  We just don't support these
> > > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > > compatible.  Let's please just drop any handling for these weird
> > > or outdated codepages.
> > 
> > What do you mean by "just drop any handling"? 
> > 
> > Do you mean removing the following if block?
> > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > +       - FIXME: Not sure what to do for ISCII.
> > > >         Therefore, just convert string without checking */
> > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > >  				wbuf, NT_MAX_PATH);
> > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > not be done correctly.
> > 
> > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > or ISCII? What should we do for UTF-7?
> 
> Nothing, just like for any other of these weird charsets.  Cygwin never
> supported any charset which wasn't at least ASCII compatible in the
> 0 <= x <= 127 range.  Just ignore them and the possibility that a
> user chooses them for fun.
> 
> > What should happen if a user or app changes the codepage to one of them?
> 
> Garbage output, I guess.  We shouldn't really care.

Do you mean something like the attached patch?

Please try:
(1) Open mintty with "env CYGWIN=disable_pcon mintty".
(2) Start cmd.exe in that mintty.
(3) Run chcp with a codepage such as
    37 (EBCDIC),
    65000 (UTF-7),
    50220 (ISO-2022),
    and 57002 (ISCII).
(4) Execute dir or some other commands in cmd.exe.

For 65000, 50220 and 57002, even the prompt will be broken.
Are the results as you expected?
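
To make the grouping of the codepages in step (3) concrete, here is a minimal sketch of the checks involved. The two ranges are the ones from the IS_ISO_2022 and IS_ISCII macros in the earlier patch, and 65000 is the value of CP_UTF7. The function names are made up for illustration and are not part of fhandler_tty.cc.

```cpp
// Hypothetical helpers; only the numeric ranges come from the earlier patch.
constexpr unsigned kCpUtf7 = 65000;  // value of CP_UTF7

constexpr bool is_iso_2022 (unsigned cp)
{
  return cp >= 50220 && cp <= 50229;
}

constexpr bool is_iscii (unsigned cp)
{
  return cp >= 57002 && cp <= 57011;
}

// True for the codepages the earlier patch converted without the
// split-multibyte-char check.
constexpr bool skips_split_handling (unsigned cp)
{
  return cp == kCpUtf7 || is_iso_2022 (cp) || is_iscii (cp);
}

static_assert (skips_split_handling (65000), "UTF-7");
static_assert (skips_split_handling (50220), "ISO-2022");
static_assert (skips_split_handling (57002), "ISCII");
static_assert (!skips_split_handling (37), "EBCDIC was not special-cased");
```

Note that codepage 37 (EBCDIC) falls outside all three checks, which matches the earlier patch: it was never special-cased there.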

If pseudo console is enabled, all of the above work without
problems. With the previous patch, the results were sane even
if pseudo console is disabled.
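
For readers who want to follow the split-character handling without a Windows build, here is a minimal portable sketch of the same idea: convert what is complete, and carry an incomplete trailing multibyte character over to the next call. It is not the code in convert_mb_str(). It assumes UTF-8 input and uses a simplified lead-byte check in place of MultiByteToWideChar with MB_ERR_INVALID_CHARS, and the names Carry and feed are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the bytes the patch saves in mbstate_t.
struct Carry
{
  char bytes[4];
  std::size_t count = 0;
};

// Expected length of a UTF-8 sequence from its lead byte, 0 if invalid.
// Simplified: overlong and surrogate encodings are not rejected.
static std::size_t expected_len (unsigned char lead)
{
  if (lead < 0x80) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

static bool is_continuation (unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

// Returns the complete characters found in carry + input.  An incomplete
// character at the very end is stored in carry; invalid bytes become '?'.
static std::string feed (Carry &carry, const std::string &input)
{
  std::string buf (carry.bytes, carry.count);
  buf += input;
  carry.count = 0;

  std::string out;
  std::size_t i = 0;
  while (i < buf.size ())
    {
      std::size_t need = expected_len ((unsigned char) buf[i]);
      if (need == 0)
        { /* Byte cannot start a character */
          out += '?';
          ++i;
          continue;
        }
      std::size_t have = 1;
      while (have < need && i + have < buf.size ()
             && is_continuation ((unsigned char) buf[i + have]))
        ++have;
      if (have == need)
        { /* Conversion success */
          out.append (buf, i, need);
          i += need;
        }
      else if (i + have == buf.size ())
        { /* Multibyte char incomplete: keep it for the next call */
          assert (have <= sizeof carry.bytes);
          std::memcpy (carry.bytes, buf.data () + i, have);
          carry.count = have;
          break;
        }
      else
        { /* Conversion fail: skip one byte */
          out += '?';
          ++i;
        }
    }
  return out;
}
```

With a fresh Carry, feeding "a" followed by the byte 0xC3 returns "a" and leaves one byte in the carry; feeding 0xA9 and "b" next returns the complete two-byte character followed by "b". A stateful encoding such as ISO-2022 does not fit this per-character retry scheme, which is the reason the earlier patch bypassed it.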

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

[-- Attachment #2: 0001-Cygwin-pty-Drop-handling-for-UTF-7-in-convert_mb_str.patch --]
[-- Type: application/octet-stream, Size: 3764 bytes --]

From 7d26f4da2969425f7c3cc8e968a256d388b7e58a Mon Sep 17 00:00:00 2001
From: Takashi Yano <takashi.yano@nifty.ne.jp>
Date: Sat, 12 Sep 2020 00:37:26 +0900
Subject: [PATCH] Cygwin: pty: Drop handling for UTF-7 in convert_mb_str().

- Drop the special handling for UTF-7 in convert_mb_str(). As a
  result, charset conversion for UTF-7, ISO-2022 and ISCII, which
  are not supported in Cygwin, no longer works properly. In
  exchange, the code has been simplified a bit.
---
 winsup/cygwin/fhandler_tty.cc | 86 ++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 48 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 95b28c3da..8910af1e7 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -122,58 +122,48 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
 		UINT cp_from, const char *ptr_from, size_t len_from,
 		mbstate_t *mbp)
 {
-  size_t nlen;
   tmp_pathbuf tp;
   wchar_t *wbuf = tp.w_get ();
   int wlen = 0;
-  if (cp_from == CP_UTF7)
-    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
-       Therefore, just convert string without checking */
-    wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
-				wbuf, NT_MAX_PATH);
-  else
-    {
-      char *tmpbuf = tp.c_get ();
-      memcpy (tmpbuf, mbp->__value.__wchb, mbp->__count);
-      if (mbp->__count + len_from > NT_MAX_PATH)
-	len_from = NT_MAX_PATH - mbp->__count;
-      memcpy (tmpbuf + mbp->__count, ptr_from, len_from);
-      int total_len = mbp->__count + len_from;
-      mbp->__count = 0;
-      int mblen = 0;
-      for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
-	/* Max bytes in multibyte char is 4. */
-	for (mblen = 1; mblen <= 4; mblen ++)
-	  {
-	    /* Try conversion */
-	    int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
-					 p, mblen,
-					 wbuf + wlen, NT_MAX_PATH - wlen);
-	    if (l)
-	      { /* Conversion Success */
-		wlen += l;
-		break;
-	      }
-	    else if (mblen == 4)
-	      { /* Conversion Fail */
-		l = MultiByteToWideChar (cp_from, 0, p, 1,
-					 wbuf + wlen, NT_MAX_PATH - wlen);
-		wlen += l;
-		mblen = 1;
-		break;
-	      }
-	    else if (p + mblen == tmpbuf + total_len)
-	      { /* Multibyte char incomplete */
-		memcpy (mbp->__value.__wchb, p, mblen);
-		mbp->__count = mblen;
-		break;
-	      }
-	    /* Retry conversion with extended length */
+  char *tmpbuf = tp.c_get ();
+  memcpy (tmpbuf, mbp->__value.__wchb, mbp->__count);
+  if (mbp->__count + len_from > NT_MAX_PATH)
+    len_from = NT_MAX_PATH - mbp->__count;
+  memcpy (tmpbuf + mbp->__count, ptr_from, len_from);
+  int total_len = mbp->__count + len_from;
+  mbp->__count = 0;
+  int mblen = 0;
+  for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
+    /* Max bytes in multibyte char supported is 4. */
+    for (mblen = 1; mblen <= 4; mblen ++)
+      {
+	/* Try conversion */
+	int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
+				     p, mblen,
+				     wbuf + wlen, NT_MAX_PATH - wlen);
+	if (l)
+	  { /* Conversion Success */
+	    wlen += l;
+	    break;
 	  }
-    }
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
-			      ptr_to, *len_to, NULL, NULL);
-  *len_to = nlen;
+	else if (mblen == 4)
+	  { /* Conversion Fail */
+	    l = MultiByteToWideChar (cp_from, 0, p, 1,
+				     wbuf + wlen, NT_MAX_PATH - wlen);
+	    wlen += l;
+	    mblen = 1;
+	    break;
+	  }
+	else if (p + mblen == tmpbuf + total_len)
+	  { /* Multibyte char incomplete */
+	    memcpy (mbp->__value.__wchb, p, mblen);
+	    mbp->__count = mblen;
+	    break;
+	  }
+	/* Retry conversion with extended length */
+      }
+  *len_to = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
+				 ptr_to, *len_to, NULL, NULL);
 }
 
 static bool
-- 
2.28.0


