public inbox for cygwin-cvs@sourceware.org
* [newlib-cygwin/cygwin-3_3-branch] Cygwin: console: Handle Unicode surrogate pairs.
@ 2021-11-16 14:22 Takashi Yano
From: Takashi Yano @ 2021-11-16 14:22 UTC
To: cygwin-cvs
https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=e6ed90c8f79b2eff732ecace02b0bc5440f18b29
commit e6ed90c8f79b2eff732ecace02b0bc5440f18b29
Author: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Tue Nov 16 11:26:10 2021 +0100
Cygwin: console: Handle Unicode surrogate pairs.
When running Cygwin's Bash in the Windows Terminal (see
https://docs.microsoft.com/en-us/windows/terminal/ for details), Cygwin
receives keyboard input as UTF-16 code units.
UTF-16 has the awkward limitation that a single 16-bit code unit cannot
represent the full Unicode range. To make up for it, the ranges
U+D800-U+DBFF (high surrogates) and U+DC00-U+DFFF (low surrogates) are
reserved: they are illegal on their own and valid only as a high/low
pair that together encodes a Unicode character beyond U+FFFF.
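For illustration, the arithmetic behind such a pair can be sketched in C++. Each surrogate carries 10 bits of payload, so a high surrogate H and a low surrogate L encode the code point 0x10000 + ((H - 0xD800) << 10) + (L - 0xDC00). The helper names below (is_high_surrogate, is_low_surrogate, decode_surrogate_pair) are hypothetical and are not Cygwin functions; this is a minimal sketch of the standard UTF-16 decoding rule, not code from the patch.

```cpp
#include <cassert>

// True for a UTF-16 high (leading) surrogate, U+D800..U+DBFF.
constexpr bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// True for a UTF-16 low (trailing) surrogate, U+DC00..U+DFFF.
constexpr bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Combine a high/low surrogate pair into the code point it encodes
// (U+10000..U+10FFFF): the high surrogate supplies the upper 10 bits,
// the low surrogate the lower 10 bits, offset by 0x10000.
char32_t decode_surrogate_pair(char16_t hi, char16_t lo)
{
  assert(is_high_surrogate(hi) && is_low_surrogate(lo));
  return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                 + (static_cast<char32_t>(lo) - 0xDC00);
}
```

For example, the emoji U+1F600 (grinning face) arrives as the two code units 0xD83D and 0xDE00; converting either unit on its own cannot yield a valid character.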
Cygwin does not currently handle such surrogate pairs correctly. This can
be seen e.g. when running Cygwin's Bash in the Windows Terminal and
inserting an emoji (e.g. via Windows + <dot>, which opens an emoji
picker on recent Windows versions): instead of the emoji, the terminal
shows the infamous question mark in a black diamond, i.e. the Unicode
replacement character (U+FFFD).
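Let's special-case surrogate pairs in this scenario: when a high
surrogate is immediately followed by a low surrogate in the input
records, convert the two together instead of one at a time.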
This fixes https://github.com/git-for-windows/git/issues/3281
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Diff:
---
winsup/cygwin/fhandler_console.cc | 17 ++++++++++++++++-
winsup/cygwin/release/3.3.3 | 6 ++++++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
index 2e754a132..a7e9723bd 100644
--- a/winsup/cygwin/fhandler_console.cc
+++ b/winsup/cygwin/fhandler_console.cc
@@ -919,7 +919,22 @@ fhandler_console::process_input_message (void)
}
else
{
- nread = con.con_to_str (tmp + 1, 59, unicode_char);
+ WCHAR second = unicode_char >= 0xd800 && unicode_char <= 0xdbff
+ && i + 1 < total_read ?
+ input_rec[i + 1].Event.KeyEvent.uChar.UnicodeChar : 0;
+
+ if (second < 0xdc00 || second > 0xdfff)
+ {
+ nread = con.con_to_str (tmp + 1, 59, unicode_char);
+ }
+ else
+ {
+ /* handle surrogate pairs */
+ WCHAR pair[2] = { unicode_char, second };
+ nread = sys_wcstombs (tmp + 1, 59, pair, 2);
+ i++;
+ }
+
/* Determine if the keystroke is modified by META. The tricky
part is to distinguish whether the right Alt key should be
recognized as Alt, or as AltGr. */
diff --git a/winsup/cygwin/release/3.3.3 b/winsup/cygwin/release/3.3.3
index 1eb25e2fc..c1e8cefbd 100644
--- a/winsup/cygwin/release/3.3.3
+++ b/winsup/cygwin/release/3.3.3
@@ -16,3 +16,9 @@ Bug Fixes
- Fix long-standing problem that new files don't get created with the
FILE_ATTRIBUTE_ARCHIVE DOS attribute set.
Addresses: https://cygwin.com/pipermail/cygwin/2021-November/249909.html
+
+- Handle Unicode surrogate pairs in the console. Previously, the Cygwin
+ console did not handle surrogate pairs correctly, so inserting an
+ emoji while running bash in Windows Terminal did not work as
+ expected.
+ Addresses: https://github.com/git-for-windows/git/issues/3281
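The pairing logic in the fhandler_console.cc hunk can be sketched outside Cygwin as a standalone C++ function. This is a simplified model under stated assumptions: it treats the keyboard input as a flat array of UTF-16 code units rather than Win32 INPUT_RECORD entries, and it encodes to UTF-8 directly, whereas the patch delegates the conversion to sys_wcstombs. The names keys_to_utf8 and append_utf8 are hypothetical. As in the patch, a high surrogate is paired only with the immediately following unit, and only if that unit is a low surrogate; the index is then advanced past it (the patch's i++). The U+FFFD substitution for unpaired surrogates is this sketch's own choice; the patch instead hands such a unit to con_to_str unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append the UTF-8 encoding of code point cp to out.
static void append_utf8(std::string &out, char32_t cp)
{
  assert(cp <= 0x10FFFF);
  if (cp < 0x80)
    out += static_cast<char>(cp);
  else if (cp < 0x800)
    {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units (one per hypothetical key event) to UTF-8,
// combining a high surrogate with a directly following low surrogate.
std::string keys_to_utf8(const std::vector<char16_t> &units)
{
  std::string out;
  for (std::size_t i = 0; i < units.size(); i++)
    {
      char32_t c = units[i];
      // Look ahead only when the current unit is a high surrogate and a
      // next unit exists, mirroring the condition in the patch.
      char16_t second = (c >= 0xD800 && c <= 0xDBFF && i + 1 < units.size())
                        ? units[i + 1] : u'\0';
      if (second >= 0xDC00 && second <= 0xDFFF)
        {
          c = 0x10000 + ((c - 0xD800) << 10) + (second - 0xDC00);
          i++;  // the low surrogate has been consumed as well
        }
      else if (c >= 0xD800 && c <= 0xDFFF)
        c = 0xFFFD;  // unpaired surrogate: sketch substitutes U+FFFD
      append_utf8(out, c);
    }
  return out;
}
```

With this model, keys_to_utf8({0xD83D, 0xDE00}) yields the four UTF-8 bytes F0 9F 98 80 for U+1F600, while converting each unit separately (the pre-patch behavior) could only produce two invalid characters.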