From: Vadim
To: cygwin@cygwin.com
Subject: Cygwin, Unicode and "long" path names
Date: Sat, 26 Jun 2021 03:53:29 +0200
Message-ID: <952ad3ba-34f4-c3a4-450c-263b16795c8d@syping.de>

Ah, this beautiful topic. Windows 7 x64.

This is the summary, written as a postscript; tests and findings are below:

1) Cygwin limits individual names to 255 bytes, whereas Windows seems to count UTF-16 characters and works fine: 256 bytes in 108 characters works. Basically, this becomes a bytes vs characters story.

2) Bash file name auto-expansion detects the file of that name, but it gets truncated to 255 bytes.
find's behaviour is the same ("No such file or directory", because it tries to access a non-existent truncated name).

2.1) If you try to correct the above mistake by adding the truncated characters back, then the program (cat) complains with "File name too long".

2.2) If a folder exists with a 255-byte name equal to the truncated name, then "find ." lists that folder twice (effectively hiding the long-named folder from tools without leaving an error message).

3) UNC paths get the same treatment: File name too long.

I expected Cygwin to handle these names without problems, just like Windows, Explorer, cmd etc. do. Is this particular problem new or known? All I could find on the mailing list dates from around the time when Cygwin hadn't yet implemented Unicode support (UTF-8?), ~2004-2008.

These names were created by youtube-dl.exe executed from within Cygwin.

- Vadim

---

This file name is 255 bytes long and works:

s123點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt

This one is 256 bytes and works perfectly normally in Windows (Explorer; I can paste it and "dir " in cmd despite it showing [] block chars), but not in the Cygwin terminal (I used s123/s1234 as a prefix for easy auto-expansion):

s1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt

If I try to use tab-expansion in the terminal (mintty, bash) the problem becomes apparent ("xt" missing at the end):

$ cat s1234點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.t
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.t': No such file or directory

However, with one fewer byte it expands properly:

$ cat s123點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃\ 民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
hello

MAX_PATH? Yes, 255 bytes. Why then does the full file/folder name work in Windows?
This is the full name (a folder), 257 bytes:

20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞

And it can get longer! In fact, I can bump the total path to 396 bytes, or "Column 249" as Notepad++ counts the characters (the individual folder name is 359 bytes or 211 chars, "column 212"):

D:/abcdefgh/Local_TEMP/cygwinunicode/1_123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789020210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞

NTFS allows up to 255 UTF-16 code units for an individual path segment, and this seems to align with the Windows tooling (cmd and Explorer can browse these fine, but the file inside the folder spills beyond the limit and you run into the usual 'total path too long' problem).

Whether you manually add the missing "xt" to the tab-completion or use UNC paths, the result is the same:

$ cat s1234點半蘋果新聞報道\ 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場\ O記轉介律政司︱新巴車長被判不小心駕駛罪成 ︱深圳賽格大樓離奇劇晃\ 民眾慌忙逃走︱蘋果日報\ Apple\ Daily\ #香港新聞.txt
cat: 's1234點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long

$ cat '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt'
cat: '\\?\D:\abcdefgh\Local_TEMP\cygwinunicode\20210518_9點半蘋果新聞報道 字幕版重溫(2021年5月18日)︱蔡展鵬光顧賣淫骨場 O記轉介律政司︱新巴車長被判不小心駕駛罪成︱深圳賽格大樓離奇劇晃 民眾慌忙逃走︱蘋果日報 Apple Daily #香港新聞.txt': File name too long