public inbox for cygwin@cygwin.com
* stat() lstat() not able to read long filename with cyrillic chars?
From: Denis Corbin @ 2015-12-23 19:44 UTC
  To: cygwin

Hi,

First, I have read the FAQ and the mailing list archive :)

Here is the problem I ran into:

Three files were placed in a directory using Windows 8's Explorer:
- a short Cyrillic filename "абваб.txt"
- a long Cyrillic filename
"абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
- a long Latin filename
"ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"


From a C program compiled under Cygwin, I can obtain the corresponding
filename strings using readdir_r()...

"\320\260\320\261\320\262\320\260\320\261.txt"
"\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
"abababababaababababa [snipped]"

... but passing these strings in turn to lstat() or stat() returns 0, as
expected, for all except the long Cyrillic filename. For this string I
get a negative value from lstat() and stat(), and errno is set to ENOENT
(although the entry is still present).

Using "ls" instead of my own program gives something similar: the long
Cyrillic filename is listed, but no permissions, user name, group name or
data are displayed; they are replaced by question marks.
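
A minimal sketch of the kind of check described above, assuming a plain
readdir() loop rather than readdir_r(). The function name
count_lstat_failures is hypothetical and is not taken from the original
program:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Walk `dir`, lstat() every entry, and report the ones that fail.
 * Returns the number of failing entries, or -1 if `dir` cannot be opened.
 * (Hypothetical helper, for illustration only.) */
int count_lstat_failures(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int failures = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        char path[PATH_MAX];
        struct stat st;
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);

        if (n < 0 || (size_t)n >= sizeof path) {
            /* The joined path did not fit in the buffer. */
            fprintf(stderr, "path too long: %s/%s\n", dir, ent->d_name);
            failures++;
        } else if (lstat(path, &st) != 0) {
            /* strlen() counts bytes, the unit NAME_MAX is measured in. */
            fprintf(stderr, "lstat failed: %s (%zu-byte name): %s\n",
                    path, strlen(ent->d_name), strerror(errno));
            failures++;
        }
    }
    closedir(d);
    return failures;
}
```

Calling count_lstat_failures(".") in the directory holding the three
files should print one line per entry that cannot be stat'ed, together
with the byte length of its name.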

Is there something special that I missed and that is needed in order to
read long Cyrillic filenames from a C program under Cygwin?

Thanks for any help,

Regards,
Denis.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Corinna Vinschen @ 2015-12-24 19:24 UTC
  To: cygwin


On Dec 23 20:44, Denis Corbin wrote:
> Hi,
> 
> First, I have read the FAQ and this mailing archive :)
> 
> Here is the problem I meet:
> 
> In a directory are placed three files using windows 8's explorer:
> - a short Cyrillic filename "абваб.txt"
> - a long Cyrillic filename
> "абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
> - a long Latin filename
> "ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
> 
> 
> From a C program compiled under Cygwin, I can obtain the corresponding
> filename strings using readdir_r()...
> 
> "\320\260\320\261\320\262\320\260\320\261.txt"
> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
> "abababababaababababa [snipped]"
> 
> ... but passing these strings in turn to lstat() or stat() returns 0 as
> expected for all except for the long Cyrillic filename.

NAME_MAX is 255.  On Windows this is unfortunately the number of UTF-16
code units.  On POSIX systems (as on Cygwin) it is the number of bytes.
Each Cyrillic character takes one UTF-16 code unit but two bytes in
UTF-8, so a Cyrillic filename needs twice as many UTF-8 bytes as it has
UTF-16 chars, and NAME_MAX for UTF-8 Cyrillic names translates into a
maximum of 127 UTF-16 chars.
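
As a rough illustration of this arithmetic, the sketch below compares
the UTF-8 byte length of a name with the number of UTF-16 code units it
would occupy. The helper names utf16_units and fits_name_max are
hypothetical, the input is assumed to be valid UTF-8, and the fallback
value 255 is simply the NAME_MAX quoted in this thread:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255  /* value quoted in this thread */
#endif

/* Count the UTF-16 code units needed for a valid UTF-8 string. */
size_t utf16_units(const char *s)
{
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                   /* continuation byte of a sequence */
        units += (*p >= 0xF0) ? 2 : 1;  /* 4-byte sequences need a surrogate pair */
    }
    return units;
}

/* Return 1 if the UTF-8 encoded name fits in NAME_MAX bytes, else 0. */
int fits_name_max(const char *utf8_name)
{
    return strlen(utf8_name) <= NAME_MAX;
}
```

For a name such as "\320\260\320\261\320\262.txt", strlen() gives 10
bytes while utf16_units() gives 7, which is the mismatch between the
two ways of counting.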

If you need access to UTF-16 filenames with more characters, you can
switch to a one-byte charset temporarily, e.g.

  $ LC_ALL=ru_RU your_app

to switch to iso-8859-5 or

  $ LC_ALL=ru_RU.CP1251 your_app

to switch to Windows codepage 1251.  See
https://cygwin.com/cygwin-ug-net/setup-locale.html
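
To check which character set a program actually ends up with after such
an override, a small helper along these lines may be useful. This is a
sketch only: active_codeset is a hypothetical name, and the function
merely reports what the C library selected from the environment; it
does not by itself change how file names are converted.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>

/* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG) and
 * return the name of the resulting character set.
 * (Hypothetical helper, for illustration only.) */
const char *active_codeset(void)
{
    setlocale(LC_ALL, "");        /* on failure the "C" locale stays active */
    return nl_langinfo(CODESET);
}
```

Printing the result of active_codeset() at startup, once with and once
without the LC_ALL override, shows whether the override took effect.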


HTH,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Andrey Repin @ 2015-12-25  0:05 UTC
  To: Corinna Vinschen, cygwin

Greetings, Corinna Vinschen!

>> First, I have read the FAQ and this mailing archive :)
>> 
>> Here is the problem I meet:
>> 
>> In a directory are placed three files using windows 8's explorer:
>> - a short Cyrillic filename "абваб.txt"
>> - a long Cyrillic filename
>> "абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
>> - a long Latin filename
>> "ababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
>> 
>> 
>> From a C program compiled under Cygwin, I can obtain the corresponding
>> filename strings using readdir_r()...
>> 
>> "\320\260\320\261\320\262\320\260\320\261.txt"
>> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
>> "abababababaababababa [snipped]"
>> 
>> ... but passing these strings in turn to lstat() or stat() returns 0 as
>> expected for all except for the long Cyrillic filename.

> NAME_MAX is 255.  On Windows this is the number of UTF-16 chars
> unfortunately.  On POSIX systems (as on Cygwin) this is the number of
> bytes.  Long UTF-16 strings in cyrillic take twice as much UTF-8 chars
> as it has UTF-16 chars, so NAME_MAX in utf-8 cyrillics translates into
> a maximum of 127 UTF-16 chars.

Aren't the POSIX restrictions a bit different?
Namely, 128 bytes per path element and 4096 bytes for the file name?

> If you need access to UTF-16 filenames with more characters, you can
> switch to a one-byte charset temporarily, e.g.

>   $ LC_ALL=ru_RU your_app

> to switch to iso-8859-5 or

>   $ LC_ALL=ru_RU.CP1251

> to switch to Windows codepage 1251.  See
> https://cygwin.com/cygwin-ug-net/setup-locale.html


> HTH,
> Corinna



-- 
With best regards,
Andrey Repin
Friday, December 25, 2015 03:03:51

Sorry for my terrible english...


* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Denis Corbin @ 2015-12-26 21:57 UTC
  To: cygwin

On 25/12/2015 01:04, Andrey Repin wrote:
> Greetings, Corinna Vinschen!
> 
>>> First, I have read the FAQ and this mailing archive :)
>>> 
[..]
> 
>> NAME_MAX is 255.  On Windows this is the number of UTF-16 chars 
>> unfortunately.  On POSIX systems (as on Cygwin) this is the 
>> number of bytes.  Long UTF-16 strings in cyrillic take twice as 
>> much UTF-8 chars as it has UTF-16 chars, so NAME_MAX in utf-8 
>> cyrillics translates into a maximum of 127 UTF-16 chars.

Ok, I understand. Thanks for your explanation.

> 
> Aren't POSIX restrictions are a bit different? Namely 128 bytes
> per path element and 4096 bytes for file name?

Judging from the sample file name, the limit seems to be near 256 bytes
(~128 UTF-16 chars) rather than 4096 bytes...

> 
>> If you need access to UTF-16 filenames with more characters, you 
>> can switch to a one-byte charset temporarily, e.g.
> 
>> $ LC_ALL=ru_RU your_app
> 
>> to switch to iso-8859-5 or
> 
>> $ LC_ALL=ru_RU.CP1251
> 
>> to switch to Windows codepage 1251.  See 
>> https://cygwin.com/cygwin-ug-net/setup-locale.html
> 
> 
>> HTH, Corinna
> 
> 
> 


* Re: stat() lstat() not able to read long filename with cyrillic chars?
From: Corinna Vinschen @ 2016-01-07 12:48 UTC
  To: cygwin


On Dec 26 22:57, Denis Corbin wrote:
> On 25/12/2015 01:04, Andrey Repin wrote:
> > Greetings, Corinna Vinschen!
> > 
> >>> First, I have read the FAQ and this mailing archive :)
> >>> 
> [..]
> > 
> >> NAME_MAX is 255.  On Windows this is the number of UTF-16 chars 
> >> unfortunately.  On POSIX systems (as on Cygwin) this is the 
> >> number of bytes.  Long UTF-16 strings in cyrillic take twice as 
> >> much UTF-8 chars as it has UTF-16 chars, so NAME_MAX in utf-8 
> >> cyrillics translates into a maximum of 127 UTF-16 chars.
> 
> Ok, I understand. Thanks for your explanation.
> 
> > 
> > Aren't POSIX restrictions are a bit different? Namely 128 bytes
> > per path element and 4096 bytes for file name?
> 
> Seen the sample file name it seems truncated rather near 256 bytes (~
> 128 UTF-16 chars) than 4096 bytes...

NAME_MAX/_POSIX_NAME_MAX are 255, PATH_MAX/_POSIX_PATH_MAX are 4096.
NAME_MAX defines the maximum length of a path component, PATH_MAX the
maximum length of an entire path.  In bytes.
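
Rather than hard-coding these values, a program can ask for the limits
that apply to a given directory with pathconf(). A minimal sketch
follows; the wrapper names name_max_for and path_max_for are
hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Maximum number of bytes in one path component under `dir`.
 * Returns -1 on error or if the limit is indeterminate. */
long name_max_for(const char *dir)
{
    return pathconf(dir, _PC_NAME_MAX);
}

/* Maximum number of bytes in a relative path when `dir` is the
 * working directory. Returns -1 on error or if indeterminate. */
long path_max_for(const char *dir)
{
    return pathconf(dir, _PC_PATH_MAX);
}
```

Both results are byte counts, so a name read from readdir() can be
compared against name_max_for() using strlen().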


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
