public inbox for cygwin@cygwin.com
* switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
@ 2021-01-25 12:46 Ariel Burbaickij
  2021-01-25 13:29 ` Takashi Yano
  2021-01-25 20:50 ` Brian Inglis
  0 siblings, 2 replies; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 12:46 UTC (permalink / raw)
  To: cygwin

Hello Cygwin,
I tried to find some files from the command line prompt whose names are
typed using various non-Latin (Russian, Hebrew, Arabic) and non-default Latin
(German) layouts under Windows 10 Enterprise with a recent Cygwin version.
The outcome is that instead of the letters I see control
characters of the type: \263\320\321  (Unicode numeric values of the
letters?). Any ideas what is happening here and how correct functionality can
be restored?

Kind Regards
Ariel Burbaickij

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 12:46 switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum Ariel Burbaickij
@ 2021-01-25 13:29 ` Takashi Yano
  2021-01-25 14:03   ` Ariel Burbaickij
  2021-01-25 20:50 ` Brian Inglis
  1 sibling, 1 reply; 14+ messages in thread
From: Takashi Yano @ 2021-01-25 13:29 UTC (permalink / raw)
  To: cygwin

On Mon, 25 Jan 2021 13:46:48 +0100
Ariel Burbaickij wrote:
> Hello Cygwin,
> I tried to find some files from the command line prompt which are named
> using various non-Latin (Russian, Hebrew, Arabic) and non-default Latin
> (German) layouts under Windows 10 Enterprise using recent cygwin version
> and the outcome is that instead of representing letters I see control
> characters of the type: \263\320\321  (Unicode numeric value of the
> letters?). Any ideas what happens here and how correct functionality can be
> restored?

What does locale command say?

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 13:29 ` Takashi Yano
@ 2021-01-25 14:03   ` Ariel Burbaickij
  2021-01-25 14:40     ` Thomas Wolff
  2021-01-25 20:20     ` L A Walsh
  0 siblings, 2 replies; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 14:03 UTC (permalink / raw)
  To: Takashi Yano; +Cc: cygwin

It says the following:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

but why would it matter in the scenario where the user switches the layout
explicitly him-/herself?
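
For reference, the reason the locale gets asked about is that it decides how
bytes are turned into characters, independently of the keyboard layout. A
minimal sketch of the usual POSIX-style precedence between these variables
(the function names are made up for illustration, and this is not a claim
about how Cygwin implements it internally):

```python
def effective_ctype(env):
    """Return the locale name that decides character handling (LC_CTYPE)."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:            # unset or empty means "fall through"
            return value
    return "C"               # default when nothing is set


def charset_of(locale_name):
    """Extract the charset part of e.g. 'en_US.UTF-8' (None if absent)."""
    if "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


# Values taken from the locale output above (LC_ALL is empty there).
env = {"LANG": "en_US.UTF-8", "LC_ALL": ""}
print(effective_ctype(env), charset_of(effective_ctype(env)))
```

With the output above, everything falls through to LANG, so the shell is
expected to treat input and file names as UTF-8.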


Kind Regards
Ariel Burbaickij



* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 14:03   ` Ariel Burbaickij
@ 2021-01-25 14:40     ` Thomas Wolff
  2021-01-25 21:01       ` Ariel Burbaickij
  2021-01-25 20:20     ` L A Walsh
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Wolff @ 2021-01-25 14:40 UTC (permalink / raw)
  To: cygwin

Am 25.01.2021 um 15:03 schrieb Ariel Burbaickij via Cygwin:
> It says following:
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
>
> but why would it matter in the scenario where the user switches the layout
> explicitly him-/herself?
>
>
> Kind Regards
> Ariel Burbaickij
Please answer below the quoted mail in this list.

> On Mon, Jan 25, 2021 at 2:29 PM Takashi Yano <takashi.yano@nifty.ne.jp> wrote:
>
>> On Mon, 25 Jan 2021 13:46:48 +0100
>> Ariel Burbaickij wrote:
>>> Hello Cygwin,
>>> I tried to find some files from the command line prompt which are named
>>> using various non-Latin (Russian, Hebrew, Arabic) and non-default Latin
>>> (German) layouts under Windows 10 Enterprise using recent cygwin version
>>> and the outcome is that instead of representing letters I see control
>>> characters of the type: \263\320\321  (Unicode numeric value of the
>>> letters?). Any ideas what happens here and how correct functionality can be restored?
Your information is quite sparse. How do you try to find files? What's 
your command? Which shell do you use? Did it ever work before for you?


* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 14:03   ` Ariel Burbaickij
  2021-01-25 14:40     ` Thomas Wolff
@ 2021-01-25 20:20     ` L A Walsh
  2021-01-25 20:50       ` Ariel Burbaickij
  1 sibling, 1 reply; 14+ messages in thread
From: L A Walsh @ 2021-01-25 20:20 UTC (permalink / raw)
  To: Ariel Burbaickij; +Cc: Takashi Yano, cygwin

On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> It says following:
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
>
> but why would it matter in the scenario where the user switches the layout
> explicitly him-/herself?
>   
----
    Because the OS (the keyboard driver) needs to know what mapping
is used on the keyboard, so that when you press a key,
the keyboard driver sends the keycode with the correct meaning to
programs.

    The keys on your keyboard, _inherently_ have no meaning.  They have
an "assigned" meaning as assigned by the locale settings so they can
send those characters to a program.

    If you create your own layout, you need to create a *custom*
mapping in POSIX.  Cygwin didn't create its own system: it just uses
the POSIX standard, it doesn't create the mapping or the meanings.
> On Mon, 25 Jan 2021 13:46:48 +0100
> Ariel Burbaickij wrote:
>   
>> Hello Cygwin,
>> I tried to find some files from the command line prompt which are
>> named using various non-Latin (Russian, Hebrew, Arabic) and
>> non-default Latin (German) layouts under Windows 10 Enterprise using
>> recent cygwin version and the outcome is that instead of representing
>> letters I see control characters of the type: \263\320\321  (Unicode
>> numeric value of the letters?). Any ideas what happens here and how
>> correct functionality can be restored?
>>     
---
    Note that the characters you type are one thing.  How a program
interprets those characters is determined by the "locale" settings.

    The locale is using UTF-8.  So you need to set your terminal
to interpret Unicode.  I don't know much about Win10, but in the Microsoft
cmd.exe prog, "chcp" changes the code page.  The code page for UTF-8 is
65001, so in such a terminal you could type:

chcp<Enter>                # this should say something like:
Active code page: 801      # your number may be different

# Remember it to switch back to your initial code page (or just
#  close the cmd window).

To switch to UTF-8, type:

chcp 65001

That will interpret output as UTF-8 in that program.

Note, I'm not sure that will be all of your problems.
"\263" (0xb3) is not valid for the 1st byte of a UTF-8 string. Valid
first bytes of a single UTF-8 char (in hex):
00-7f, c2-cf, d0-df, e0-ef, f0-f4.
Bytes 80-bf can only be continuation bytes.  So if you see something
like 0xb3 in the 1st byte of a Unicode character, you know the sequence
is broken or was cut in the middle (part of UTF-8's
self-synchronizing feature).
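
The byte ranges above can be turned into a small checker. This is only a
sketch (the function name is made up, and it looks at single bytes, not at
whole sequences), applied to the three octal values from the original report:

```python
def utf8_byte_role(b):
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if b <= 0x7F:
        return "ascii"
    if 0x80 <= b <= 0xBF:
        return "continuation"
    if 0xC2 <= b <= 0xDF:
        return "lead-2"      # first byte of a 2-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"      # first byte of a 3-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "lead-4"      # first byte of a 4-byte sequence
    return "invalid"         # 0xC0, 0xC1, 0xF5-0xFF never occur in UTF-8


# \263 \320 \321 as reported, written as octal literals.
reported = bytes([0o263, 0o320, 0o321])
for b in reported:
    print(f"\\{b:o} = 0x{b:02x}: {utf8_byte_role(b)}")
```

So \263 is a continuation byte, while \320 and \321 are lead bytes of 2-byte
sequences, which is the range where Cyrillic letters are encoded.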

A very useful utility for displaying all unicode characters
and what character sets you have that can display them can be
found at:

https://www.babelstone.co.uk/Software/BabelMap.html

Unzip it into a folder and put a link to it where it is
easy to access.


Hope this helps.



* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 12:46 switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum Ariel Burbaickij
  2021-01-25 13:29 ` Takashi Yano
@ 2021-01-25 20:50 ` Brian Inglis
  2021-01-25 21:12   ` Ariel Burbaickij
  1 sibling, 1 reply; 14+ messages in thread
From: Brian Inglis @ 2021-01-25 20:50 UTC (permalink / raw)
  To: cygwin

On 2021-01-25 05:46, Ariel Burbaickij via Cygwin wrote:
> I tried to find some files from the command line prompt which are named
> using various non-Latin (Russian, Hebrew, Arabic) and non-default Latin
> (German) layouts under Windows 10 Enterprise using recent cygwin version
> and the outcome is that instead of representing letters I see control
> characters of the type: \263\320\321  (Unicode numeric value of the
> letters?). Any ideas what happens here and how correct functionality can be
> restored?

Which command line prompt(s): cmd, mintty, rxvt, xterm, ...?

Where and how did you switch layouts: Windows keyboard mapping, Windows system 
locale, Windows user regional settings, chcp, LANG, LC_CTYPE, LC_ALL, ...?

If you are using a terminal, what are the terminal locale and code page settings?

Maybe you could explicitly show and tell us what characters you used (sending in 
hex please and also in 8bit UTF-8 for maximum readability: that looks like octal 
which went out with ASCII, ISO-646, SBCS code pages), show us how the filenames 
appear including the locales and the shell command lines, and show and tell us 
what you expect, and what is the difference in what you see.

For details on Cygwin file name special character mappings, see:

	https://cygwin.com/cygwin-ug-net/using-specialnames.html

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 20:20     ` L A Walsh
@ 2021-01-25 20:50       ` Ariel Burbaickij
  2021-01-25 22:58         ` L A Walsh
  0 siblings, 1 reply; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 20:50 UTC (permalink / raw)
  To: L A Walsh; +Cc: Takashi Yano, cygwin

Wait a sec, what do you specifically mean by "... Cygwin just uses the
POSIX standard..."? The POSIX standard for what, and how does it interfere
with getting the current layout and mapping from the OS?
What do you also mean by "... So you need to set your terminal to
interpret unicode..."? My terminal here is the Cygwin Terminal. cmd.exe does
at least handle Russian and German just fine. It does not handle Arabic and
Hebrew, but that is, I am pretty sure, because of some additional fiddling
needed around right-to-left writing. Notepad++(!) already handles all input
types just fine, as do all the other programs tested so far. So, what
specifically are these supposed big OS-side secrets that cygwin cannot get
to here?

Best Regards
Ariel Burbaickij



* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 14:40     ` Thomas Wolff
@ 2021-01-25 21:01       ` Ariel Burbaickij
  0 siblings, 0 replies; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 21:01 UTC (permalink / raw)
  To: Thomas Wolff; +Cc: cygwin

I try to find files with "find . -name
<unexpected_numbers_instead_of_non-Latin_characters_appear_here>". Good
that you asked about the shell, as the story gets even more interesting in
a peculiar way:
If I start cygwin, then echo $SHELL returns /bin/bash but input is not
handled correctly, as described. If I spawn a child bash, then it handles
Russian and German correctly but not Hebrew or Arabic, and the same applies
to vim.
Yes, it worked before.
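
As a stopgap while typing at the prompt is broken, the same kind of search
can be done without typing any non-Latin character, by writing the letters
as ASCII-only escapes. A minimal sketch in Python (the helper name and the
file names in the demo are made up):

```python
import os
import tempfile


def find_by_name_part(root, part):
    """Yield paths under root whose file name contains the text `part`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if part in name:
                yield os.path.join(dirpath, name)


# Demo in a throwaway directory; the file names are made-up examples.
# "\u00f6" is the letter o-umlaut written as an ASCII-only escape.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "sch\u00f6n.txt"), "w").close()
    open(os.path.join(root, "plain.txt"), "w").close()
    hits = [os.path.basename(p) for p in find_by_name_part(root, "\u00f6")]

print(len(hits))
```

This does not fix the input problem, it only avoids it.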

Best Regards
Ariel Burbaickij


* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 20:50 ` Brian Inglis
@ 2021-01-25 21:12   ` Ariel Burbaickij
  2021-01-25 21:53     ` Brian Inglis
  0 siblings, 1 reply; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 21:12 UTC (permalink / raw)
  To: cygwin

I used mintty, the default in a Cygwin installation as I understand.
I switch layouts by switching keyboard mappings. The mappings are customized
from the standard ones to what is called "phonetic" for non-Latin alphabets,
but this is handled just fine everywhere outside Cygwin.
I am guessing here at what you are requesting from me, but I attempted to type
август (in UTF-8), which is Russian for August in all lowercase letters, as a
more or less random but valid example. The filename I was looking for contains
this string, and ls displays it correctly, as it does all the others,
but I cannot type this string at Cygwin's prompt.
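
To give the bytes in hex as requested, here is a small sketch that dumps the
UTF-8 encoding of that word in hex and in octal, and compares it with the
three values from my first mail. The comparison is only a plausibility
check, not a diagnosis:

```python
word = "август"
encoded = word.encode("utf-8")

# Same bytes, shown once in hex and once as octal escapes.
hex_dump = " ".join(f"{b:02x}" for b in encoded)
octal_dump = "".join(f"\\{b:03o}" for b in encoded)

print(hex_dump)
print(octal_dump)

# The three values from the original report, as raw bytes.
reported = bytes([0o263, 0o320, 0o321])
print(set(reported) <= set(encoded))
```

Each of the reported values occurs among these bytes, which would be
consistent with the prompt echoing raw UTF-8 bytes as octal escapes instead
of assembling them into letters.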

I hope we are coming closer to the cause here.

Best Regards
Ariel Burbaickij





* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 21:12   ` Ariel Burbaickij
@ 2021-01-25 21:53     ` Brian Inglis
  2021-01-25 22:20       ` Ariel Burbaickij
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Inglis @ 2021-01-25 21:53 UTC (permalink / raw)
  To: cygwin

On 2021-01-25 14:12, Ariel Burbaickij via Cygwin wrote:
> On Mon, Jan 25, 2021 at 9:59 PM Brian Inglis wrote:
>> On 2021-01-25 05:46, Ariel Burbaickij via Cygwin wrote:
>>> I tried to find some files from the command line prompt which are named 
>>> using various non-Latin (Russian, Hebrew, Arabic) and non-default Latin 
>>> (German) layouts under Windows 10 Enterprise using recent cygwin version 
>>> and the outcome is that instead of representing letters I see control 
>>> characters of the type: \263\320\321  (Unicode numeric value of the 
>>> letters?). Any ideas what happens here and how correct functionality can 
>>> be restored?

>> Which command line prompt(s): cmd, mintty, rxvt, xterm, ...?
>> Where and how did you switch layouts: Windows keyboard mapping, Windows 
>> system locale, Windows user regional settings, chcp, LANG, LC_CTYPE,
>> LC_ALL, ...?
>> If you are using a terminal, what are the terminal locale and code page
>> settings?
>> Maybe you could explicitly show and tell us what characters you used 
>> (sending in hex please and also in 8bit UTF-8 for maximum readability: that
>> looks like octal which went out with ASCII, ISO-646, SBCS code pages), show
>> us how the filenames appear including the locales and the shell command
>> lines, and show and tell us what you expect, and what is the difference in
>> what you see.
>> For details on Cygwin file name special character mappings, see:
>>          https://cygwin.com/cygwin-ug-net/using-specialnames.html
 > I used mintty -- default in cygwin installation as I understand.
 > I switch layouts by switching keyboard mappings, mappings are customized
 > from the standard ones to what is called "phonetic" for non-Latin alphabets
 > but this is handled just fine everywhere outside Cygwin.
 > I will be guessing here what you request from me but I attempted to type in
 > UTF-8  август,  basically Russian in  all small letters for August as a
 > more or less random but valid example. Filename I was looking for contains
 > this string and filename is presented correctly as all others are with ls
 > but I cannot type this string in cygwin's prompt.

Using what utility/-ies, how and where did you customize and switch keyboard 
mappings: Windows keyboard mapping, Windows system locale, Windows user regional 
settings, readline {/etc/,~/.}inputrc?

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 21:53     ` Brian Inglis
@ 2021-01-25 22:20       ` Ariel Burbaickij
  2021-01-25 23:16         ` L A Walsh
  0 siblings, 1 reply; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-25 22:20 UTC (permalink / raw)
  To: cygwin

For Russian and Hebrew keyboard layout I used University of Kansas EGARC
Center foreign languages keyboard layouts:
https://egarc.ku.edu/keyboards
For Arabic, the same idea but different site:
https://arabic.omaralzabir.com/
presumably all done with the Microsoft Keyboard Layout Creator.
BUT for German language I did not use anything -- it is plain vanilla
German in Germany layout, and this is what I get upon attempt to submit
little sweet ö:
 $(__fzf_cd__)Ignoring redcarpet-3.4.0 because its extensions are not
built. Try: gem pristine redcarpet --version 3.4.0
Traceback (most recent call last):
        4: from /usr/bin/fzf:1347:in `<main>'
        3: from /usr/bin/fzf:309:in `start'
        2: from /usr/bin/fzf:1157:in `start_loop'
        1: from /usr/bin/fzf:929:in `get_input'
/usr/bin/fzf:929:in `ord': invalid byte sequence in UTF-8 (ArgumentError)
$
and I mean what I say, pressing ö immediately leads to it, no tricks, no
custom builds, no debugs enabled,  no nothing.
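
The error itself only says that the bytes fzf read were not well-formed
UTF-8. Two general ways this can happen for ö are sketched below, as an
assumption about the mechanism rather than a finding: the key arrives as a
single byte from a legacy code page (cp1252 is used as an example), or a
two-byte UTF-8 sequence is read only partially.

```python
letter = "\u00f6"                      # o-umlaut

as_utf8 = letter.encode("utf-8")       # two bytes
as_cp1252 = letter.encode("cp1252")    # one byte

print(as_utf8.hex(), as_cp1252.hex())


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(as_utf8), is_valid_utf8(as_cp1252))
print(is_valid_utf8(as_utf8[:1]))      # a sequence cut after its lead byte
```

Either the lone cp1252 byte or the truncated sequence would be rejected by a
strict UTF-8 decoder, which matches the kind of error shown above.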

Best Regards
Ariel Burbaickij






* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 20:50       ` Ariel Burbaickij
@ 2021-01-25 22:58         ` L A Walsh
  0 siblings, 0 replies; 14+ messages in thread
From: L A Walsh @ 2021-01-25 22:58 UTC (permalink / raw)
  To: Ariel Burbaickij; +Cc: Takashi Yano, cygwin



On 2021/01/25 12:50, Ariel Burbaickij wrote:
> Wait a sec, what do you specifically mean with "... Cygwin just uses the 
> POSIX standard..." -- POSIX standard for what and how does it interfere 
> with getting the current layout and mapping from OS?
---
	Cygwin doesn't get things directly from the OS.  It relies
on things like the locale being set correctly in order for it to
function.  Windows does not use the POSIX or unix/linux type API,
so changing settings in Windows isn't something that will necessarily
configure cygwin to work.

If you are using the cygwin terminal, I don't know -- I don't use 
the same features.  But, for example, bash, the shell likely running
in the cygwin terminal, has its own understanding of how it should
process characters.  It, in turn, relies on the "readline" facility.
It may be configured by default, but a few settings, I had to set
*ages ago* (been using cygwin for >20 years) in the ".inputrc" file
in my home directory:

set convert-meta off
set input-meta on
set output-meta on

Those all may be default settings now, but at one point in time
I think I had to set them.
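
A rough way to check whether an .inputrc already carries those three
settings is sketched below. The parser is deliberately simple (it only
looks at "set NAME VALUE" lines and ignores $if blocks), the function name
is made up, and the sample text stands in for a real file:

```python
def read_inputrc_settings(text):
    """Collect 'set NAME VALUE' lines from readline init-file text."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blanks, comments, key bindings and $if/$endif directives.
        if len(parts) >= 3 and parts[0].lower() == "set":
            settings[parts[1].lower()] = parts[2].lower()
    return settings


wanted = {"convert-meta": "off", "input-meta": "on", "output-meta": "on"}

sample = """\
# sample ~/.inputrc content
set convert-meta off
set input-meta on
"""

found = read_inputrc_settings(sample)
missing = {k: v for k, v in wanted.items() if found.get(k) != v}
print(missing)
```

For a real check, the sample text would be replaced by the content of
~/.inputrc (and /etc/inputrc, if present).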



> What do you also mean with "... So you need to set your terminal to 
> interpret unicode..." ? My terminal is Cygwin Terminal here. cmd.exe 
> does at least handle Russian and German just fine, not so Arabic and 
> Hebrew but this, I am pretty sure, because of some additional fiddling 
> around right-to-left writing needed. Notepad++(!) already handles all 
> input types just fine as do all the other programs tested so far. So, 
> what are these supposed big OS-side secrets specifically that cygwin 
> cannot get to here?
---
	It isn't a matter of "can't", it is a matter of doing so would
cause cygwin not to behave like programs on linux or on unix.  Cygwin
relies on settings in configuration files -- not OS-side hidden secrets.

	At one point in time, most to all of windows internals were 
undocumented. On unix/linux/posix they worked together to document how
all the programs would work together, but windows wanted no part of that.
Just like the path-separator on windows is usually '\'.  As far as I know,
DOS ended up with that because '/' was already in use for command
switches by the time directories were added, while unix used '/' for
paths.  Either way, DOS and Windows did things
differently from how standards were shaping up (early 80's) in other
OS's.  You and others are stuck with the legacy of those choices.  Just
like if you set the keyboard in a MS app running on an apple OS, it wouldn't
necessarily change anything in the apple OS.  They went yet another, 
different way.  

	At least MS didn't sue everyone who came close to their standards
or software like Apple did (and still does).  This prevented any sort of
compatible software from outside of Apple's approval.  MS could have
gone that way -- but because it was the largest, it got some controls 
slapped in place to allow compatibility.  Cygwin arose out of linux + 
posix compatibility -- not out of Windows, as such, you need to figure out
how to configure cygwin separately and apart from Windows.  Cygwin tries
to document things and follow open standards.  In the extreme, cygwin
has open source, so you can see how it works, and change it yourself to
suit your needs (or hire a software engineer to do so).  Windows and apple
don't supply source nor extreme detail in how their OS works.  

	I was trying to be helpful and tell you that cygwin interfaces may
need extra configuration if you want to personalize things or go with
non-default settings.  On windows, if you want to do anything outside of
what the OS clearly presents to you, changing anything will be a much more
difficult path.  I don't know the answers to the questions you are asking.
I don't work on cygwin; I just use it on windows as a slightly more
comfortable text-based interface than what windows has provided over the
years.  I have some background in linux, which makes cygwin's interface
more familiar to me.

	I've never gotten into Windows development, since it was always too
costly and too dead-end.  Several technologies from MS have come and gone
over the past 40+ years.  I could have invested in learning any of them, and
by now many or most would be gone.  Given that, learning a special API that
would only be usable for windows, that likely would be obsolete in 5 years,
and paying for the privilege of being allowed to use such an API and tools
seems like a waste of time.  By contrast, things I learned about unix and
shell 30 years ago are still usable + useful today.

	I don't know how to create a keyboard layout on linux, I use what
is there.  I certainly don't know how to modify a cygwin/posix keyboard
layout to be compatible with windows.  Sorry.  Maybe someone else knows
more.  If you want cygwin to automatically read your changes in windows,
I'm told that the cygwin project would be happy to accept patches and
source updates to enhance its functionality.

Cheers!
Linda

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 22:20       ` Ariel Burbaickij
@ 2021-01-25 23:16         ` L A Walsh
  2021-01-26  8:50           ` Ariel Burbaickij
  0 siblings, 1 reply; 14+ messages in thread
From: L A Walsh @ 2021-01-25 23:16 UTC (permalink / raw)
  To: Ariel Burbaickij; +Cc: cygwin

On 2021/01/25 14:20, Ariel Burbaickij via Cygwin wrote:
>  and this is what I get upon attempt to submit
> little sweet ö:
>  $(__fzf_cd__)Ignoring redcarpet-3.4.0 because its extensions are not
> built. ...
>         1: from /usr/bin/fzf:929:in `get_input'
> /usr/bin/fzf:929:in `ord': invalid byte sequence in UTF-8 (ArgumentError)
> $
> and I mean what I say, pressing ö immediately leads to it, no tricks, no
> custom builds, no debugs enabled,  no nothing.
>   
---
    Remember in my first post, I said that the codes you included were
not valid UTF-8.  It sounds like your program wants UTF-8, but your
keyboard is putting out latin1.

Did you download that program I mentioned?  In there you can select
the 'o' with diaeresis then copy/paste it into your program.

The character you are inserting into your program isn't encoded in
UTF-8, so I'm pretty sure that your keyboard isn't producing
UTF-8 encoding.
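
To make the encoding mismatch concrete, here is a minimal sketch (Python is
my choice for illustration; the helper name is_valid_utf8 is hypothetical
and not part of fzf or cygwin). It checks the bytes \263\320\321 quoted in
the first post, plus the two possible encodings of 'ö', against a strict
UTF-8 decoder. Whether the keyboard path really emits latin1 is an
assumption here, not something this check proves.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00F6 (o with diaeresis) in the two encodings under discussion.
latin1_o = "\u00f6".encode("latin-1")   # single byte 0xF6
utf8_o = "\u00f6".encode("utf-8")       # two bytes 0xC3 0xB6

# The octal codes quoted in the first post of this thread.
reported = b"\263\320\321"

print(is_valid_utf8(latin1_o))   # False: 0xF6 cannot start a UTF-8 sequence
print(is_valid_utf8(utf8_o))     # True
print(is_valid_utf8(reported))   # False: 0xB3 is a continuation byte
```

A program that insists on UTF-8 input, as the `ord' error above suggests,
would reject the first and third byte strings in the same way.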

You mention that it does work after you restart your terminal.

Settings in locale don't take effect in the current terminal, but in
future ones that you start.  So it is a good idea to restart your terminal
after you change locale.
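
A quick way to see which locale a freshly started terminal will hand to
programs is to look at the environment variables it exports. The sketch
below (Python; the function names are hypothetical) applies the usual POSIX
precedence LC_ALL, then LC_CTYPE, then LANG, and pulls the charset out of
the locale name. It only inspects the environment it is given; it does not
ask the terminal itself what it was started with.

```python
import os


def effective_ctype_locale(env=None):
    """Pick the locale name governing character handling.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Unset or empty variables are skipped; "C" is the fallback.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def charset_of(locale_name):
    """Extract the charset part, e.g. 'de_DE.UTF-8@euro' -> 'UTF-8'.

    Returns None when the locale name carries no explicit charset.
    """
    if "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


if __name__ == "__main__":
    name = effective_ctype_locale()
    print("effective locale:", name)
    print("explicit charset:", charset_of(name))
```

Running it in the old window and again in a newly started one should show
whether the two really see different settings.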

I assume the error you are showing is from a window that wasn't restarted
after your locale was changed?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
  2021-01-25 23:16         ` L A Walsh
@ 2021-01-26  8:50           ` Ariel Burbaickij
  0 siblings, 0 replies; 14+ messages in thread
From: Ariel Burbaickij @ 2021-01-26  8:50 UTC (permalink / raw)
  To: L A Walsh; +Cc: cygwin

>It sounds like your program wants UTF-8, but your
>keyboard is putting out latin1.

OK, I did the following: I went to cmd.exe and changed the active code page
from 437 (latin1 ?) to 65001 (UTF-8 ?) but there were no changes in the
behaviour in the Cygwin terminal. cmd.exe itself handles Russian and
German just fine, even with the 437 code page. As for BabelMap -- I do not
see how it can be useful for regular input of non-English strings rather
than just a single character here and there. And again -- this is not
something that was present from day zero or somewhere close in Cygwin; it
worked just fine on my old laptop. I migrated to a new one and this is
where I see it for the first time.
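
For reference on the code pages mentioned above, a small sketch (Python,
using its bundled codecs; the function name encoded_forms is hypothetical)
shows that code page 437, latin1 and UTF-8 each give 'ö' a different byte
representation, so 437 and latin1 are not the same thing. It says nothing
about which of these the Cygwin terminal actually receives.

```python
def encoded_forms(ch, codecs=("cp437", "latin-1", "utf-8")):
    """Map each codec name to the bytes it produces for one character."""
    return {name: ch.encode(name) for name in codecs}


# U+00F6, o with diaeresis, the character from the fzf error report.
forms = encoded_forms("\u00f6")
for name, raw in forms.items():
    print(f"{name:8} {raw.hex()}")
# cp437    94
# latin-1  f6
# utf-8    c3b6
```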

Best Regards
Ariel Burbaickij

On Tue, Jan 26, 2021 at 12:17 AM L A Walsh <cygwin@tlinx.org> wrote:

> On 2021/01/25 14:20, Ariel Burbaickij via Cygwin wrote:
> >  and this is what I get upon attempt to submit
> > little sweet ö:
> >  $(__fzf_cd__)Ignoring redcarpet-3.4.0 because its extensions are not
> > built. ...
> >         1: from /usr/bin/fzf:929:in `get_input'
> > /usr/bin/fzf:929:in `ord': invalid byte sequence in UTF-8 (ArgumentError)
> > $
> > and I mean what I say, pressing ö immediately leads to it, no tricks, no
> > custom builds, no debugs enabled,  no nothing.
> >
> ---
>     Remember in my first post, I said that the codes you included were
> not valid UTF-8.  It sounds like your program wants UTF-8, but your
> keyboard is putting out latin1.
>
> Did you download that program I mentioned?  In there you can select
> the 'o' with diaeresis then copy/paste it into your program.
>
> The character you are inserting into your program isn't encoded in
> UTF-8, so I'm pretty sure that your keyboard isn't producing
> UTF-8 encoding.
>
> You mention that it does work after you restart your terminal.
>
> Settings in locale don't take effect in the current terminal, but in
> future ones that you start.  So it is a good idea to restart your terminal
> after you change locale.
>
> I assume the error you are showing is from a window that wasn't restarted
> after your locale was changed?
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-26  8:50 UTC | newest]

Thread overview: 14+ messages
2021-01-25 12:46 switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum Ariel Burbaickij
2021-01-25 13:29 ` Takashi Yano
2021-01-25 14:03   ` Ariel Burbaickij
2021-01-25 14:40     ` Thomas Wolff
2021-01-25 21:01       ` Ariel Burbaickij
2021-01-25 20:20     ` L A Walsh
2021-01-25 20:50       ` Ariel Burbaickij
2021-01-25 22:58         ` L A Walsh
2021-01-25 20:50 ` Brian Inglis
2021-01-25 21:12   ` Ariel Burbaickij
2021-01-25 21:53     ` Brian Inglis
2021-01-25 22:20       ` Ariel Burbaickij
2021-01-25 23:16         ` L A Walsh
2021-01-26  8:50           ` Ariel Burbaickij
