* stat() lstat() not able to read long filename with cyrillic chars?
@ 2015-12-23 19:44 Denis Corbin
2015-12-24 19:24 ` Corinna Vinschen
0 siblings, 1 reply; 5+ messages in thread
From: Denis Corbin @ 2015-12-23 19:44 UTC (permalink / raw)
To: cygwin
Hi,
First, I have read the FAQ and this mailing archive :)
Here is the problem I have run into:
Using Windows 8's Explorer, I placed three files in a directory:
- a short Cyrillic filename "абваб.txt"
- a long Cyrillic filename
"абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
- a long Latin filename
"ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
From a C program compiled under Cygwin, I can obtain the corresponding
filename strings using readdir_r()...
"\320\260\320\261\320\262\320\260\320\261.txt"
"\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
"abababababaababababa [snipped]"
... but passing these strings in turn to lstat() or stat() returns 0 as
expected for all except the long Cyrillic filename. For this string I
get a negative value from lstat() and stat(), and errno is set to
ENOENT (although the entry is still present).
Using "ls" instead of my own program gives something similar: the long
Cyrillic filename is listed, but no permissions, username, groupname or
date are displayed; they are replaced by question marks.
Is there something special I have missed that is needed to read long
Cyrillic filenames from a C program under Cygwin?
Thanks for any help,
Regards,
Denis.
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Corinna Vinschen @ 2015-12-24 19:24 UTC (permalink / raw)
To: cygwin
On Dec 23 20:44, Denis Corbin wrote:
> Hi,
>
> First, I have read the FAQ and this mailing archive :)
>
> Here is the problem I meet:
>
> In a directory are placed three files using windows 8's explorer:
> - a short Cyrillic filename "абваб.txt"
> - a long Cyrillic filename
> "абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
> - a long Latin filename
> "ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
>
>
> From a C program compiled under Cygwin, I can obtain the corresponding
> filename strings using readdir_r()...
>
> "\320\260\320\261\320\262\320\260\320\261.txt"
> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
> "abababababaababababa [snipped]"
>
> ... but passing these strings in turn to lstat() or stat() returns 0 as
> expected for all except for the long Cyrillic filename.
NAME_MAX is 255. On Windows, unfortunately, this limit counts UTF-16
code units. On POSIX systems (and so on Cygwin) it counts bytes. Each
Cyrillic character takes one UTF-16 code unit but two bytes in UTF-8,
so with a UTF-8 locale NAME_MAX translates into a maximum of 127
Cyrillic characters per filename.
If you need access to UTF-16 filenames with more characters, you can
switch to a one-byte charset temporarily, e.g.
$ LC_ALL=ru_RU your_app
to switch to iso-8859-5 or
$ LC_ALL=ru_RU.CP1251 your_app
to switch to Windows codepage 1251. See
https://cygwin.com/cygwin-ug-net/setup-locale.html
HTH,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Andrey Repin @ 2015-12-25 0:05 UTC (permalink / raw)
To: Corinna Vinschen, cygwin
Greetings, Corinna Vinschen!
>> First, I have read the FAQ and this mailing archive :)
>>
>> Here is the problem I meet:
>>
>> In a directory are placed three files using windows 8's explorer:
>> - a short Cyrillic filename "абваб.txt"
>> - a long Cyrillic filename
>> "абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
>> - a long Latin filename
>> "ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
>>
>>
>> From a C program compiled under Cygwin, I can obtain the corresponding
>> filename strings using readdir_r()...
>>
>> "\320\260\320\261\320\262\320\260\320\261.txt"
>> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
>> "abababababaababababa [snipped]"
>>
>> ... but passing these strings in turn to lstat() or stat() returns 0 as
>> expected for all except for the long Cyrillic filename.
> NAME_MAX is 255. On Windows this is the number of UTF-16 chars
> unfortunately. On POSIX systems (as on Cygwin) this is the number of
> bytes. Long UTF-16 strings in cyrillic take twice as much UTF-8 chars
> as it has UTF-16 chars, so NAME_MAX in utf-8 cyrillics translates into
> a maximum of 127 UTF-16 chars.
Aren't the POSIX restrictions a bit different?
Namely 128 bytes per path element and 4096 bytes for a file name?
> If you need access to UTF-16 filenames with more characters, you can
> switch to a one-byte charset temporarily, e.g.
> $ LC_ALL=ru_RU your_app
> to switch to iso-8859-5 or
> $ LC_ALL=ru_RU.CP1251
> to switch to Windows codepage 1251. See
> https://cygwin.com/cygwin-ug-net/setup-locale.html
> HTH,
> Corinna
--
With best regards,
Andrey Repin
Friday, December 25, 2015 03:03:51
Sorry for my terrible english...
* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Denis Corbin @ 2015-12-26 21:57 UTC (permalink / raw)
To: cygwin
On 25/12/2015 01:04, Andrey Repin wrote:
> Greetings, Corinna Vinschen!
>
>>> First, I have read the FAQ and this mailing archive :)
>>>
[..]
>
>> NAME_MAX is 255. On Windows this is the number of UTF-16 chars
>> unfortunately. On POSIX systems (as on Cygwin) this is the
>> number of bytes. Long UTF-16 strings in cyrillic take twice as
>> much UTF-8 chars as it has UTF-16 chars, so NAME_MAX in utf-8
>> cyrillics translates into a maximum of 127 UTF-16 chars.
Ok, I understand. Thanks for your explanation.
>
> Aren't POSIX restrictions are a bit different? Namely 128 bytes
> per path element and 4096 bytes for file name?
Judging by the sample file name, the limit seems to be near 256 bytes
(~128 UTF-16 chars) rather than 4096 bytes...
>
>> If you need access to UTF-16 filenames with more characters, you
>> can switch to a one-byte charset temporarily, e.g.
>
>> $ LC_ALL=ru_RU your_app
>
>> to switch to iso-8859-5 or
>
>> $ LC_ALL=ru_RU.CP1251
>
>> to switch to Windows codepage 1251. See
>> https://cygwin.com/cygwin-ug-net/setup-locale.html
>
>
>> HTH, Corinna
>
>
>
* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Corinna Vinschen @ 2016-01-07 12:48 UTC (permalink / raw)
To: cygwin
On Dec 26 22:57, Denis Corbin wrote:
> On 25/12/2015 01:04, Andrey Repin wrote:
> > Greetings, Corinna Vinschen!
> >
> >>> First, I have read the FAQ and this mailing archive :)
> >>>
> [..]
> >
> >> NAME_MAX is 255. On Windows this is the number of UTF-16 chars
> >> unfortunately. On POSIX systems (as on Cygwin) this is the
> >> number of bytes. Long UTF-16 strings in cyrillic take twice as
> >> much UTF-8 chars as it has UTF-16 chars, so NAME_MAX in utf-8
> >> cyrillics translates into a maximum of 127 UTF-16 chars.
>
> Ok, I understand. Thanks for your explanation.
>
> >
> > Aren't POSIX restrictions are a bit different? Namely 128 bytes
> > per path element and 4096 bytes for file name?
>
> Seen the sample file name it seems truncated rather near 256 bytes (~
> 128 UTF-16 chars) than 4096 bytes...
NAME_MAX/_POSIX_NAME_MAX are 255, PATH_MAX/_POSIX_PATH_MAX are 4096.
NAME_MAX defines the maximum length of a path component, PATH_MAX the
maximum length of an entire path. In bytes.
Corinna