public inbox for cygwin@cygwin.com
* [1.7] Invalid UTF8 while creating a file -> cannot delete?
@ 2009-09-10 19:31 Lapo Luchini
From: Lapo Luchini @ 2009-09-10 19:31 UTC
  To: cygwin

After a few problems with monotone's unit tests on Cygwin-1.7, I began
searching and experimenting a bit with new 1.7 support for wide chars.

I also read the full thread about its last change:
http://www.cygwin.com/ml/cygwin/2009-05/msg00344.html
which really makes some sense to me (when I create a file from the
console I want "ls" to show back that file to me with same encoding).

Problem is, that unit test assumes filenames are "raw data" and tries to
create three types of filenames: ISO-8859-1, EUC-JP and UTF-8.
Except on OSX where it only tries UTF-8 as that's the disk format.

Now we have a UTF-16 disk format, but as far as I understood the library
uses the LANG value from process start to initialize its LANG-to-UTF-16
conversion, so there's not really one "correct" format: it depends on
the LANG env value when the unit test is launched.

OK, that's a side issue, since I can probably modify the tests to always
be launched with LANG=C instead of using the current value, so that at
least it is consistent. And then maybe remove the ISO-8859-1 and EUC-JP
tests, just like on OSX. Which could be correct... but a bit less so
than on OSX itself, where UTF-8 really is "the format" and not "the
DEFAULT format, which could be overridden with a correct setlocale".

But the real problem with that test is not what is shown and how: the
biggest problem is that files created with a "wrong" filename seem to
be quite limited in usage and seemingly can't be deleted.

% export LANG=en_EN.UTF-8
% cat t.c
#include <stdio.h>
int main() {
    fopen("a-\xF6\xE4\xFC\xDF", "w"); //ISO-8859-1
    fopen("b-\xC3\xB6\xC3\xA4\xC3\xBc\xC3\x9F", "w"); //UTF-8
    return 0;
}
% gcc -o t t.c
% mkdir test ; cd test ; ../t ; cd ..
% ls -l test
ls: cannot access test/a-▒▒▒: No such file or directory
total 0
-????????? ? ?    ?    ?                ? a-▒▒▒
-rw-r--r-- 1 lapo None 0 2009-09-10 21:19 b-öäüß
% find test
test
test/a-???
test/b-öäüß
% find test -delete
find: cannot delete `test/a-\366\344\374': No such file or directory
find: cannot delete `test': Directory not empty
% find test
test
test/a-???

Now... I don't know how exactly `find` works but it seems strange to me
it isn't capable of deleting something it is capable of listing.
Also seems strange `ls` is not capable of stat-ing something it's
capable of listing.
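
To make the "listed but not stat-able" case easier to reproduce outside
of `ls`, here is a minimal sketch of the two steps involved: enumerate a
directory with readdir(), then lstat() each name it returned. I am
assuming `ls -l` does something roughly similar; `count_unstatable` is a
hypothetical helper name, not something taken from coreutils or Cygwin.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns how many entries of `dir` are returned by
   readdir() but cannot be lstat()-ed, or -1 if `dir` cannot be opened. */
static int count_unstatable(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int failures = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue; /* path too long for this sketch; skip it */
        if (lstat(path, &st) != 0) {
            /* The name came from readdir() but does not work as a path. */
            fprintf(stderr, "listed but cannot stat: %s: %s\n",
                    path, strerror(errno));
            failures++;
        }
    }
    closedir(d);
    return failures;
}
```

Calling `count_unstatable("test")` after running `../t` should, if my
guess is right, count the a-... entry; on a filesystem where names are
raw bytes I would expect it to return 0.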

Yes, I do know that the filename is "broken" in the first place, but since
in the Unix world such stuff can happen, as filenames are really raw
data, I think an error on file creation would probably be better than
creating a file that can't subsequently be stat-ed or even unlinked.
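
As a rough sketch of what "an error on file creation" could look like
from the application side, assuming the process charset is UTF-8: check
the name first and fail with EILSEQ instead of creating the file. Both
`is_valid_utf8` and `create_checked` are hypothetical names for
illustration; this is not how Cygwin itself handles it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if s is well-formed UTF-8, 0 otherwise.
   Rejects overlong forms, surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned long cp;
        int n; /* number of continuation bytes expected */
        if (*p < 0x80)                { cp = *p;        n = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; n = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; n = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; n = 3; }
        else return 0; /* stray continuation byte or invalid lead byte */
        p++;
        for (int i = 0; i < n; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return 0; /* not a continuation byte (also catches NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (n == 1 && cp < 0x80)    return 0; /* overlong 2-byte form */
        if (n == 2 && cp < 0x800)   return 0; /* overlong 3-byte form */
        if (n == 3 && cp < 0x10000) return 0; /* overlong 4-byte form */
        if (cp > 0x10FFFF)          return 0; /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* UTF-16 surrogate */
    }
    return 1;
}

/* Hypothetical wrapper: refuse to create a file whose name is not valid
   UTF-8, rather than creating something that cannot be removed later. */
static FILE *create_checked(const char *name)
{
    if (!is_valid_utf8(name)) {
        errno = EILSEQ;
        return NULL;
    }
    return fopen(name, "w");
}
```

With this, the ISO-8859-1 name from t.c would be rejected before any
file is created, while the UTF-8 one would go through to fopen().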

% cat u.c
#include <stdio.h>
int main() {
    remove("a-\xF6\xE4\xFC\xDF");
    remove("b-\xC3\xB6\xC3\xA4\xC3\xBc\xC3\x9F");
    return 0;
}
% gcc -o u u.c

OK, a program using a similarly-broken filename can delete it, but the
fact it can't be deleted with "normal" tools is a bit of an inconvenience...

-- 
Lapo Luchini - http://lapo.it/

“Premature optimisation is the root of all evil in programming.” (C. A.
R. Hoare)


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


Thread overview: 15+ messages
2009-09-10 19:31 [1.7] Invalid UTF8 while creating a file -> cannot delete? Lapo Luchini
2009-09-10 22:12 ` Andy Koppe
2009-09-15 22:38   ` Lapo Luchini
2009-09-21 16:10     ` Corinna Vinschen
2009-09-21 18:54       ` Andy Koppe
2009-09-22  9:45         ` Corinna Vinschen
2009-09-22 16:12           ` Andy Koppe
2009-09-22 17:07             ` Corinna Vinschen
2009-09-23 11:52               ` Andy Koppe
2009-09-23 12:02               ` Corinna Vinschen
2009-09-23 12:35                 ` Andy Koppe
2009-09-23 12:43                   ` Corinna Vinschen
2009-09-23 13:39                     ` Corinna Vinschen
2009-09-23 21:31                       ` Ross Smith
2009-09-25 22:36                         ` Robert Pendell
