* [PATCH] console: handle Unicode surrogate pairs
@ 2021-11-16 10:26 Johannes Schindelin
2021-11-16 13:19 ` Takashi Yano
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin @ 2021-11-16 10:26 UTC (permalink / raw)
To: cygwin-patches

When running Cygwin's Bash in the Windows Terminal (see
https://docs.microsoft.com/en-us/windows/terminal/ for details), Cygwin
receives keyboard input in the form of UTF-16 characters.

A single 16-bit UTF-16 code unit cannot represent the full Unicode
range. To make up for that, the ranges U+D800-U+DBFF (high surrogates)
and U+DC00-U+DFFF (low surrogates) are reserved: they are illegal on
their own and valid only as a pair that encodes a Unicode character
beyond U+FFFF.
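
For readers unfamiliar with the encoding, here is a minimal sketch of how a
high/low surrogate pair maps to a code point beyond U+FFFF. The helper names
are hypothetical and exist only for illustration; they are not part of Cygwin
or of this patch.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not Cygwin code.
constexpr bool is_high_surrogate (char16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

constexpr bool is_low_surrogate (char16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

// Combine a valid high/low surrogate pair into one code point.
// Each surrogate carries 10 payload bits; the result is offset by 0x10000.
constexpr char32_t combine_surrogates (char16_t high, char16_t low)
{
  return 0x10000
	 + ((static_cast<char32_t> (high) - 0xD800) << 10)
	 + (static_cast<char32_t> (low) - 0xDC00);
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, an emoji outside the
Basic Multilingual Plane.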

Cygwin does not handle such surrogate pairs correctly at the moment, as
can be seen e.g. when running Cygwin's Bash in the Windows Terminal and
then inserting an emoji (e.g. via Windows + <dot>, which opens an emoji
picker on recent Windows versions): Instead of showing an emoji, this
shows the infamous question mark in a black diamond, i.e. the Unicode
replacement character (U+FFFD).

Let's special-case surrogate pairs in this scenario.

This fixes https://github.com/git-for-windows/git/issues/3281

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
This applies without merge conflict all the way back to
cygwin_2_7_0-release.
winsup/cygwin/fhandler_console.cc | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
index 3e17fd9a41..d11f4a4770 100644
--- a/winsup/cygwin/fhandler_console.cc
+++ b/winsup/cygwin/fhandler_console.cc
@@ -453,7 +453,22 @@ fhandler_console::read (void *pv, size_t& buflen)
             }
           else
             {
-              nread = con.con_to_str (tmp + 1, 59, unicode_char);
+              WCHAR second = unicode_char >= 0xd800 && unicode_char <= 0xdbff
+                  && i + 1 < total_read ?
+                  input_rec[i + 1].Event.KeyEvent.uChar.UnicodeChar : 0;
+
+              if (second < 0xdc00 || second > 0xdfff)
+                {
+                  nread = con.con_to_str (tmp + 1, 59, unicode_char);
+                }
+              else
+                {
+                  /* handle surrogate pairs */
+                  WCHAR pair[2] = { unicode_char, second };
+                  nread = sys_wcstombs (tmp + 1, 59, pair, 2);
+                  i++;
+                }
+
               /* Determine if the keystroke is modified by META.  The tricky
                  part is to distinguish whether the right Alt key should be
                  recognized as Alt, or as AltGr.  */
--
2.34.0.rc2.windows.1
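
The look-ahead used in the patch can be tried outside of Cygwin with a
standalone sketch like the one below. It is an illustration under stated
assumptions, not the code path in fhandler_console.cc: `utf16_to_utf8` and
`append_utf8` are hypothetical names, the input is a plain `std::u16string`
rather than an array of console input records, and an unpaired surrogate is
turned into U+FFFD here, whereas the patch hands it to `con.con_to_str`
unchanged.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of code point cp to out (hypothetical helper).
static void append_utf8 (std::string &out, char32_t cp)
{
  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to UTF-8, pairing surrogates by look-ahead.
std::string utf16_to_utf8 (const std::u16string &in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size (); i++)
    {
      char32_t cp = in[i];
      bool high = cp >= 0xD800 && cp <= 0xDBFF;
      // Peek at the next unit only if this one is a high surrogate
      // and another unit is available, as the patch does.
      char16_t second = high && i + 1 < in.size () ? in[i + 1] : 0;

      if (second >= 0xDC00 && second <= 0xDFFF)
	{
	  // Valid pair: combine both units and consume the second one.
	  cp = 0x10000 + ((cp - 0xD800) << 10) + (second - 0xDC00);
	  i++;
	}
      else if (cp >= 0xD800 && cp <= 0xDFFF)
	cp = 0xFFFD;	// Assumption of this sketch: replace lone surrogates.

      append_utf8 (out, cp);
    }
  return out;
}
```

Feeding it the two units 0xD83D 0xDE00 yields the four UTF-8 bytes
F0 9F 98 80, which is what a terminal needs in order to render the emoji.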
* Re: [PATCH] console: handle Unicode surrogate pairs
2021-11-16 10:26 [PATCH] console: handle Unicode surrogate pairs Johannes Schindelin
@ 2021-11-16 13:19 ` Takashi Yano
2021-11-16 14:00 ` Corinna Vinschen
0 siblings, 1 reply; 3+ messages in thread
From: Takashi Yano @ 2021-11-16 13:19 UTC (permalink / raw)
To: cygwin-patches
On Tue, 16 Nov 2021 11:26:10 +0100 (CET)
Johannes Schindelin wrote:
> When running Cygwin's Bash in the Windows Terminal (see
> https://docs.microsoft.com/en-us/windows/terminal/ for details), Cygwin
> receives keyboard input in the form of UTF-16 characters.
>
> A single 16-bit UTF-16 code unit cannot represent the full Unicode
> range. To make up for that, the ranges U+D800-U+DBFF (high surrogates)
> and U+DC00-U+DFFF (low surrogates) are reserved: they are illegal on
> their own and valid only as a pair that encodes a Unicode character
> beyond U+FFFF.
>
> Cygwin does not handle such surrogate pairs correctly at the moment, as
> can be seen e.g. when running Cygwin's Bash in the Windows Terminal and
> then inserting an emoji (e.g. via Windows + <dot>, which opens an emoji
> picker on recent Windows versions): Instead of showing an emoji, this
> shows the infamous question mark in a black diamond, i.e. the Unicode
> replacement character (U+FFFD).
>
> Let's special-case surrogate pairs in this scenario.
>
> This fixes https://github.com/git-for-windows/git/issues/3281
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> This applies without merge conflict all the way back to
> cygwin_2_7_0-release.
>
> winsup/cygwin/fhandler_console.cc | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/winsup/cygwin/fhandler_console.cc b/winsup/cygwin/fhandler_console.cc
> index 3e17fd9a41..d11f4a4770 100644
> --- a/winsup/cygwin/fhandler_console.cc
> +++ b/winsup/cygwin/fhandler_console.cc
> @@ -453,7 +453,22 @@ fhandler_console::read (void *pv, size_t& buflen)
>              }
>            else
>              {
> -              nread = con.con_to_str (tmp + 1, 59, unicode_char);
> +              WCHAR second = unicode_char >= 0xd800 && unicode_char <= 0xdbff
> +                  && i + 1 < total_read ?
> +                  input_rec[i + 1].Event.KeyEvent.uChar.UnicodeChar : 0;
> +
> +              if (second < 0xdc00 || second > 0xdfff)
> +                {
> +                  nread = con.con_to_str (tmp + 1, 59, unicode_char);
> +                }
> +              else
> +                {
> +                  /* handle surrogate pairs */
> +                  WCHAR pair[2] = { unicode_char, second };
> +                  nread = sys_wcstombs (tmp + 1, 59, pair, 2);
> +                  i++;
> +                }
> +
>                /* Determine if the keystroke is modified by META.  The tricky
>                   part is to distinguish whether the right Alt key should be
>                   recognized as Alt, or as AltGr.  */
> --
> 2.34.0.rc2.windows.1
>

Thanks for the patch. LGTM.

I will push the patch to the master branch.

Corinna,
Should we also apply this patch to cygwin-3_3-branch?
Or should only bug fixes go to cygwin-3_3-branch?

--
Takashi Yano <takashi.yano@nifty.ne.jp>