public inbox for cygwin-patches@cygwin.com
 help / color / mirror / Atom feed
From: Johannes Schindelin <johannes.schindelin@gmx.de>
To: cygwin-patches@cygwin.com
Subject: [PATCH] console: handle Unicode surrogate pairs
Date: Tue, 16 Nov 2021 11:26:10 +0100 (CET)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2111161125300.21127@tvgsbejvaqbjf.bet> (raw)

When running Cygwin's Bash in the Windows Terminal (see
https://docs.microsoft.com/en-us/windows/terminal/ for details), Cygwin
is receiving keyboard input in the form of UTF-16 characters.

UTF-16 has that awkward challenge that it cannot map the full Unicode
range, and to make up for it, there are the ranges U+D800-U+DBFF and
U+DC00-U+DFFF which are illegal except when they come in a pair encoding
for Unicode characters beyond U+FFFF.

Cygwin does not handle such surrogate pairs correctly at the moment, as
can be seen e.g. when running Cygwin's Bash in the Windows Terminal and
then inserting an emoji (e.g. via Windows + <dot>, which opens an emoji
picker on recent Windows versions): Instead of showing an emoji, this
shows the infamous question mark in a black triangle, i.e. the invalid
Unicode character.

Let's special-case surrogate pairs in this scenario.

This fixes https://github.com/git-for-windows/git/issues/3281

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	This applies without merge conflict all the way back to
	cygwin_2_7_0-release.

 winsup/cygwin/fhandler_console.cc | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
index 3e17fd9a41..d11f4a4770 100644
--- a/winsup/cygwin/fhandler_console.cc
+++ b/winsup/cygwin/fhandler_console.cc
@@ -453,7 +453,22 @@ fhandler_console::read (void *pv, size_t& buflen)
 	    }
 	  else
 	    {
-	      nread = con.con_to_str (tmp + 1, 59, unicode_char);
+	      WCHAR second = unicode_char >= 0xd800 && unicode_char <= 0xdbff
+		  && i + 1 < total_read ?
+		  input_rec[i + 1].Event.KeyEvent.uChar.UnicodeChar : 0;
+
+	      if (second < 0xdc00 || second > 0xdfff)
+		{
+		  nread = con.con_to_str (tmp + 1, 59, unicode_char);
+		}
+	      else
+		{
+		  /* handle surrogate pairs */
+		  WCHAR pair[2] = { unicode_char, second };
+		  nread = sys_wcstombs (tmp + 1, 59, pair, 2);
+		  i++;
+		}
+
 	      /* Determine if the keystroke is modified by META.  The tricky
 		 part is to distinguish whether the right Alt key should be
 		 recognized as Alt, or as AltGr. */
--
2.34.0.rc2.windows.1


             reply	other threads:[~2021-11-16 10:26 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16 10:26 Johannes Schindelin [this message]
2021-11-16 13:19 ` Takashi Yano
2021-11-16 14:00   ` Corinna Vinschen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2111161125300.21127@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=cygwin-patches@cygwin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).