public inbox for cygwin@cygwin.com
* Removing ^X in paths
@ 2022-02-02 20:40 Dennis Heimbigner
  2022-02-03  2:23 ` L A Walsh
  0 siblings, 1 reply; 8+ messages in thread
From: Dennis Heimbigner @ 2022-02-02 20:40 UTC (permalink / raw)
  To: cygwin

It appears that Windows now supports the UTF-8 codepage
and generally allows UTF-8 everywhere that ASCII was supported.
In light of this, it seems time to change Cygwin so it no longer adds those
control-x (^X) characters in, e.g., path names.



* Re: Removing ^X in paths
  2022-02-02 20:40 Removing ^X in paths Dennis Heimbigner
@ 2022-02-03  2:23 ` L A Walsh
  2022-02-03  4:12   ` Dennis Heimbigner
  0 siblings, 1 reply; 8+ messages in thread
From: L A Walsh @ 2022-02-03  2:23 UTC (permalink / raw)
  To: Dennis Heimbigner; +Cc: cygwin

On 2022/02/02 12:40, Dennis Heimbigner wrote:
> It appears that Windows now supports the UTF-8 codepage.

It has since the early 2000s.

> In light of this, it seems time to change Cygwin so it no longer adds those
> control-x (^X) characters in, e.g., path names.

^X is ASCII.  Cygwin doesn't insert ^X characters in paths.

Perhaps you are thinking of '\', which can look like ¥ (a capital 'Y' with
2 horizontal lines, the Yen Sign, U+00A5).  If that's the case, some 8-bit
fonts displayed that sign instead of a backslash in non-Unicode locales.

Are you using a 32-bit or 64-bit version of Cygwin?  On what version of
Windows?

If you still use a 32-bit version, you might need to move to a 64-bit
version.  I know the 32-bit version sometimes had the problem because it
supported fewer fonts and fewer characters at the same time.

You might check your locale (if in English, try setting
LC_CTYPE="en_US.UTF-8"
in your shell) and also check that the font you use has a backslash in the
0x5c position.

But in shell, ^x is usually a character to erase the whole line -- so it
really wouldn't do to have it in a PATH.

Hope this helps, and sorry if this is completely off base.

Linda


* Re: Removing ^X in paths
  2022-02-03  2:23 ` L A Walsh
@ 2022-02-03  4:12   ` Dennis Heimbigner
  2022-02-03  5:02     ` Thomas Wolff
                       ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Dennis Heimbigner @ 2022-02-03  4:12 UTC (permalink / raw)
  To: L A Walsh; +Cc: cygwin

I am using 64-bit Cygwin.
And it has nothing to do with misreading characters.

The ^X is described in this document:
https://www.cygwin.com/cygwin-ug-net/using-specialnames.html

There you will see this text:

"If you don't want or can't use UTF-8 as character set for
whatever reason, you will nevertheless be able to access the
file. How does that work? When Cygwin converts the filename from
UTF-16 to your character set, it recognizes characters which
can't be converted. If that occurs, Cygwin replaces the
non-convertible character with a special character sequence. The
sequence starts with an ASCII CAN character (hex code 0x18,
equivalent Control-X), followed by the UTF-8 representation of
the character. The result is a filename containing some ugly
looking characters. While it doesn't look nice, it is nice,
because Cygwin knows how to convert this filename back to
UTF-16. The filename will be converted using your usual
character set. However, when Cygwin recognizes an ASCII CAN
character, it skips over the ASCII CAN and handles the following
bytes as a UTF-8 character. Thus, the filename is symmetrically
converted back to UTF-16 and you can access the file."
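
The scheme in the quoted passage can be sketched roughly as follows. This
is only an illustration in Python, not Cygwin's actual code: the function
names are made up, Python's codecs stand in for Cygwin's own conversion
routines, and the decoder assumes a single-byte charset such as ISO-8859-1.

```python
CAN = 0x18  # ASCII CAN (Control-X), the marker byte named in the User's Guide


def utf8_len(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")


def to_charset(name: str, charset: str) -> bytes:
    """Encode name; unconvertible characters become CAN + their UTF-8 bytes."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(CAN)
            out += ch.encode("utf-8")
    return bytes(out)


def from_charset(raw: bytes, charset: str) -> str:
    """Reverse to_charset (hypothetical helper; single-byte charsets only)."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == CAN and i + 1 < len(raw):
            # Skip the CAN and read the following bytes as one UTF-8 character.
            n = utf8_len(raw[i + 1])
            out.append(raw[i + 1 : i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            out.append(raw[i : i + 1].decode(charset))
            i += 1
    return "".join(out)


encoded = to_charset("xy€", "iso-8859-1")
print(encoded)                              # b'xy\x18\xe2\x82\xac'
print(from_charset(encoded, "iso-8859-1"))  # xy€
```

The sketch ignores the corner case of a filename that already contains a
literal 0x18 byte; how Cygwin itself treats that case is not covered by the
quoted text.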

There is no obvious good reason to continue this convention.



* Re: Removing ^X in paths
  2022-02-03  4:12   ` Dennis Heimbigner
@ 2022-02-03  5:02     ` Thomas Wolff
  2022-02-03  5:09     ` Brian Inglis
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Thomas Wolff @ 2022-02-03  5:02 UTC (permalink / raw)
  To: cygwin



On 03.02.2022 at 05:12, Dennis Heimbigner wrote:
> I am using 64-bit Cygwin.
> And it has nothing to do with misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
>
> There you will see this text:
>
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
This supports a non-UTF-8 Cygwin client side, e.g. when you run
LC_ALL=de_DE mintty and a file name contains a Chinese character.

> There is no obvious good reason to continue this convention.
See above: there is good reason to keep it and no reason to drop it.

Thomas


* Re: Removing ^X in paths
  2022-02-03  4:12   ` Dennis Heimbigner
  2022-02-03  5:02     ` Thomas Wolff
@ 2022-02-03  5:09     ` Brian Inglis
  2022-02-03  6:11     ` L A Walsh
  2022-02-03  8:53     ` Corinna Vinschen
  3 siblings, 0 replies; 8+ messages in thread
From: Brian Inglis @ 2022-02-03  5:09 UTC (permalink / raw)
  To: cygwin

On 2022-02-02 21:12, Dennis Heimbigner wrote:

 > I am using 64-bit Cygwin.
 > And it has nothing to do with misreading characters.
 > The ^X is described in this document:
 > https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
 > There you will see this text:
 > "If you don't want or can't use UTF-8 as character set for
 > whatever reason, you will nevertheless be able to access the
 > file. How does that work? When Cygwin converts the filename from
 > UTF-16 to your character set, it recognizes characters which
 > can't be converted. If that occurs, Cygwin replaces the
 > non-convertible character with a special character sequence. The
 > sequence starts with an ASCII CAN character (hex code 0x18,
 > equivalent Control-X), followed by the UTF-8 representation of
 > the character. The result is a filename containing some ugly
 > looking characters. While it doesn't look nice, it is nice,
 > because Cygwin knows how to convert this filename back to
 > UTF-16. The filename will be converted using your usual
 > character set. However, when Cygwin recognizes an ASCII CAN
 > character, it skips over the ASCII CAN and handles the following
 > bytes as a UTF-8 character. Thus, the filename is symmetrically
 > converted back to UTF-16 and you can access the file."
 > There is no obvious good reason to continue this convention.

This is not a convention; it is an interoperability feature that allows
characters unsupported by the locale to be used in filenames.  Otherwise,
Cygwin would have to fail the file open in locales where those characters
are unsupported.

I have always used ASCII, ISO-8859-1/15, or UTF-8 and have never seen a
^X in any filename, although I have produced many other control and
special characters in filenames by mistake. ;^>

If you never use a limited character set locale with filenames using
extended character sets, you will never see this either.

This feature is for those who may be importing files with names in 
extended character sets but their selected locale only supports a 
limited character set.

Some users and nationalities still prefer to use locales with limited 
character sets, perhaps because their important apps still use them, and 
they are familiar with the related keyboard mappings and font glyphs.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* Re: Removing ^X in paths
  2022-02-03  4:12   ` Dennis Heimbigner
  2022-02-03  5:02     ` Thomas Wolff
  2022-02-03  5:09     ` Brian Inglis
@ 2022-02-03  6:11     ` L A Walsh
  2022-02-03  9:18       ` Thomas Wolff
  2022-02-03  8:53     ` Corinna Vinschen
  3 siblings, 1 reply; 8+ messages in thread
From: L A Walsh @ 2022-02-03  6:11 UTC (permalink / raw)
  To: Dennis Heimbigner; +Cc: cygwin



On 2022/02/02 20:12, Dennis Heimbigner wrote:
> I am using 64-bit Cygwin.
> And it has nothing to do with misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
----
Wow, I've never seen such a pathname.

What's an example of a filename that cygwin displays this way?



* Re: Removing ^X in paths
  2022-02-03  4:12   ` Dennis Heimbigner
                       ` (2 preceding siblings ...)
  2022-02-03  6:11     ` L A Walsh
@ 2022-02-03  8:53     ` Corinna Vinschen
  3 siblings, 0 replies; 8+ messages in thread
From: Corinna Vinschen @ 2022-02-03  8:53 UTC (permalink / raw)
  To: cygwin

On Feb  2 21:12, Dennis Heimbigner wrote:
> I am using 64-bit Cygwin.
> And it has nothing to do with misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
> 
> There you will see this text:
> 
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
> 
> There is no obvious good reason to continue this convention.

You're probably using a non-UTF-8 locale, e.g., LANG=en_US, which uses
ISO-8859-1 as its charset.  See the output of `locale -av' to learn which
charset your locale uses.  Either way, converting UTF-16 filenames
to a non-UTF charset is not lossless.  That's what the ASCII CAN sequence
is for.  If you want to avoid it, use a UTF-8 locale, e.g.
en_US.UTF-8.
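
To illustrate the "not lossless" point, here is a small check of whether a
name survives conversion to a given charset.  It is only a sketch: the
helper name is hypothetical and Python's codecs are used as a stand-in for
whatever conversion Cygwin performs internally.

```python
def lossless_in(name: str, charset: str) -> bool:
    """True if every character of name can be represented in charset."""
    try:
        return name.encode(charset).decode(charset) == name
    except UnicodeEncodeError:
        return False


# The euro sign is missing from ISO-8859-1 but present in ISO-8859-15.
for charset in ("iso-8859-1", "iso-8859-15", "utf-8"):
    print(charset, lossless_in("xy€", charset))
```

Only names for which such a check fails in the active charset would need
the CAN sequence at all.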


Corinna


* Re: Removing ^X in paths
  2022-02-03  6:11     ` L A Walsh
@ 2022-02-03  9:18       ` Thomas Wolff
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Wolff @ 2022-02-03  9:18 UTC (permalink / raw)
  To: cygwin



On 03.02.2022 at 07:11, L A Walsh wrote:
>
>
> On 2022/02/02 20:12, Dennis Heimbigner wrote:
>> I am using 64-bit Cygwin.
>> And it has nothing to do with misreading characters.
>>
>> The ^X is described in this document:
>> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
> ----
> Wow, I've never seen such a pathname.
>
> What's an example of a filename that cygwin displays this way?
touch xy€
LC_ALL=de_DE mintty
ls
