From: Gionatan Danti <g.danti@assyoma.it>
To: L A Walsh <cygwin@tlinx.org>
Cc: cygwin@cygwin.com
Subject: Re: Can not stat file with utf char U+F020
Date: Wed, 19 Apr 2023 13:56:54 +0200
Message-ID: <9f1593d259faf7f845b96947eaff8619@assyoma.it>
In-Reply-To: <643F3F87.2050403@tlinx.org>
On 2023-04-19 03:10, L A Walsh wrote:
> I'm a bit confused as to what char you are trying to access/use, as
> U+F020 is in the Private Use area (PUA)
>
> Since it's in the PUA, it seems its meaning could differ by
> application/OS/User, no?
> I.e. have no set definition
>
> I mean you can use it in Cygwin to represent some character not
> usually permitted in a DOS/Win filename (like :/\, etc.), but it
> wouldn't have the same meaning then in Windows though.? Isn't Private
> Use area application specific so an application can create and use its
> own symbol set -- even though it wouldn't be portable to another
> application.
The issue is with any client or application (even Cygwin itself) that
creates a filename ending with a dot (or another special char), which is
then replaced with U+F020. If this file is later renamed by adding some
other character *after* the replaced dot, it becomes unreadable by
Cygwin.
Something like this:
- a user creates a file named "project." on a Windows share, forgetting
the extension;
- the client replaces the dot with U+F020;
- at this point all is good: the file can be read by the client, by
Windows and by Cygwin;
- the user notices the missing extension and renames the file to
"project.txt";
- Cygwin now does *not* translate U+F020 back to a dot, and it is unable
to read the file.
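One plausible way such a rename can break access is that the name no longer survives a round trip between the Windows and POSIX views. The sketch below is a hypothetical model, not Cygwin's code: it assumes each translated character c is stored as U+F000 + c (so a space becomes U+F020 and a dot U+F02E), assumes a small set of always-translated reserved characters, and translates dots and spaces only when they are trailing.

```python
# Hypothetical model of PUA filename translation. The c -> U+F000 + c rule
# and the RESERVED set are assumptions, loosely based on Cygwin's documented
# scheme; special names such as "." and ".." are ignored here.
PUA_BASE = 0xF000
RESERVED = set('"*:<>?|\\')  # assumption: always-translated characters
MAPPED_BACK = RESERVED | {".", " "}


def to_windows(posix_name: str) -> str:
    """POSIX -> Windows: reserved chars always, dots/spaces only if trailing."""
    body = posix_name.rstrip(". ")
    tail = posix_name[len(body):]
    body = "".join(chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in body)
    return body + "".join(chr(PUA_BASE + ord(c)) for c in tail)


def to_posix(win_name: str) -> str:
    """Windows -> POSIX: turn every known PUA stand-in back into ASCII."""
    return "".join(
        chr(ord(c) - PUA_BASE)
        if PUA_BASE <= ord(c) <= PUA_BASE + 0xFF
        and chr(ord(c) - PUA_BASE) in MAPPED_BACK
        else c
        for c in win_name
    )


if __name__ == "__main__":
    # Trailing dot: the name round-trips, so the file stays reachable.
    assert to_windows(to_posix("project\uf02e")) == "project\uf02e"
    # Stand-in in the middle (after a rename): the POSIX name "zzz txt"
    # translates to a *different* Windows name, so a lookup misses the file.
    assert to_posix("zzz\uf020txt") == "zzz txt"
    assert to_windows("zzz txt") != "zzz\uf020txt"
    print("round-trip checks passed")
```

In this model the failure comes from the asymmetry: the backward direction converts a stand-in wherever it appears, while the forward direction only produces one in trailing position.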
> I think characters in the PUA range are used to allow Cygwin filenames
> to contain colon, slashes and quotes -- so one wouldn't want Windows to
> understand the cygwin intent or it would defeat the purpose of using
> custom characters to represent filenames that are legal under POSIX but
> not under Windows.
True, but dots and spaces are somewhat different from the other reserved
chars. While backslashes, colons, etc. are rejected by NTFS itself (or
by a lower-layer API), trailing dots and spaces are only ignored/stripped
by Win32. This means that Linux clients accessing an SMB share *can*
successfully create such filenames without any issue and without
replacing them with PUA chars.
For example, I created a file called "zzz." from a Linux+MATE client.
Cygwin correctly sees the filename as:
$ ls "zzz." | od -x --endian=big
0000000 7a7a 7a2e 0a00
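In this dump `-x` prints 16-bit words in hex and `--endian=big` puts the first byte of each pair on the left, so `7a2e` is a `z` followed by a plain ASCII dot (0x2E), and `0a00` is the newline printed by `ls` plus zero padding. The small Python helper below (its name is hypothetical) reproduces that layout without the offset column, which makes it easy to compare against what a PUA stand-in would look like: U+F02E, for instance, is the three UTF-8 bytes `ef 80 ae`.

```python
def od_x_big_endian(data: bytes) -> str:
    """Format bytes like `od -x --endian=big`, without the offset column:
    16-bit words in hex, first byte on the left, odd length zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    return " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))


# Plain trailing dot, as created by the Linux client (`ls` adds the newline).
print(od_x_big_endian("zzz.\n".encode("utf-8")))  # 7a7a 7a2e 0a00
# The same name with a U+F02E stand-in instead of the dot, for comparison.
print(od_x_big_endian("zzz\uf02e\n".encode("utf-8")))  # 7a7a 7aef 80ae 0a00
```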
True, Windows cannot access this file, but that is fine, because such a
filename should never be understood by Windows. Since they cannot open
the file from Windows, its users will notice and fix the issue
themselves by renaming the file.
As things stand now, we have the opposite issue: should a file (for
whatever reason) exist with a name such as "zzz[U+F020]txt", Cygwin will
not be able to access it. This means that anyone using Cygwin+rsync to
back up a Windows server will end up with a file that is inaccessible
and impossible to back up.
Thinking about it: how would you feel about an option to exclude
trailing dots and spaces from PUA translation (effectively reverting
them to the status of "normal" characters)?
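To make the proposal concrete, here is a minimal sketch of what such an option could mean for the translation in both directions. Everything in it is hypothetical: the `translate_trailing` parameter is an invented name, not an existing Cygwin setting, and the c → U+F000 + c rule and the reserved set are assumptions loosely based on Cygwin's documented scheme. With the option off, dots and spaces are never translated, and existing stand-ins such as U+F020 are left untouched, so names created by other clients round-trip unchanged.

```python
# Hypothetical sketch of an "exclude trailing dots/spaces" option.
# translate_trailing is an invented parameter name, not a Cygwin setting.
PUA_BASE = 0xF000
RESERVED = set('"*:<>?|\\')  # assumption: always-translated characters


def to_windows(name: str, translate_trailing: bool = True) -> str:
    """POSIX -> Windows; trailing dots/spaces become stand-ins only if enabled."""
    body = name.rstrip(". ") if translate_trailing else name
    tail = name[len(body):]
    out = "".join(chr(PUA_BASE + ord(c)) if c in RESERVED else c for c in body)
    return out + "".join(chr(PUA_BASE + ord(c)) for c in tail)


def to_posix(name: str, translate_trailing: bool = True) -> str:
    """Windows -> POSIX; dot/space stand-ins are reversed only if enabled."""
    back = RESERVED | ({".", " "} if translate_trailing else set())
    return "".join(
        chr(ord(c) - PUA_BASE)
        if PUA_BASE <= ord(c) <= PUA_BASE + 0xFF and chr(ord(c) - PUA_BASE) in back
        else c
        for c in name
    )


if __name__ == "__main__":
    # Option off: a stand-in in the middle of a name is kept verbatim,
    # so the file stays reachable (and can be backed up).
    odd = "zzz\uf020txt"
    assert to_windows(to_posix(odd, False), False) == odd
    # Option off: a trailing dot is passed through, as a Linux SMB client does.
    assert to_windows("zzz.", False) == "zzz."
    # Option on (default): the trailing dot is still translated.
    assert to_windows("zzz.") == "zzz\uf02e"
    print("option checks passed")
```

The trade-off is the one described above: with the option off, a name like "zzz." becomes unreachable from Win32 tools, which is arguably easier for users to notice and fix than a file Cygwin itself cannot reach.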
Regards.
--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8