From: Brian Inglis
Reply-To: Brian.Inglis@SystematicSw.ab.ca
Organization: Systematic Software
To: cygwin@cygwin.com, cygwin-apps@cygwin.com
Subject: Re: cygutils cygstart displays PUA code points in messages when wild cards not found
Date: Mon, 1 Mar 2021 21:27:21 -0700
Message-ID: <0ad0c6fe-c5ea-51dc-f612-c0563d928547@SystematicSw.ab.ca>

On 2021-03-01 08:06, Brian Inglis wrote:
> On 2021-03-01 04:17, John Vincent via Cygwin wrote:
>> I'm running cygwin on Windows 10, using UTF8 in English. I run cygwin bash
>> inside a cygwin mintty terminal. I've noticed a minor problem when using
>> cygstart with wildcard parameters.
>> I type:
>>     $ cygstart *.??p
>> If there is a matching file then everything works as I expect. However if
>> there is no matching file I get an error message as follows:
>> Unable to start '.p': The specified file was not found.
>> When I look at this using the "od" command I see the following:
>> $ cygstart *.??p 2>&1 | od -tx1 -c
>> 0000000  55  6e  61  62  6c  65  20  74  6f  20  73  74  61  72  74  20
>>           U   n   a   b   l   e       t   o       s   t   a   r   t
>> 0000020  27  ef  80  aa  2e  ef  80  bf  ef  80  bf  70  27  3a  20  54
>>           ' 357 200 252   . 357 200 277 357 200 277   p   '   :       T
>> 0000040  68  65  20  73  70  65  63  69  66  69  65  64  20  66  69  6c
>>           h   e       s   p   e   c   i   f   i   e   d       f   i   l
>> 0000060  65  20  77  61  73  20  6e  6f  74  20  66  6f  75  6e  64  2e
>>           e       w   a   s       n   o   t       f   o   u   n   d   .
>> 0000100  0a
>>          \n
>> It looks to me like cygstart is not outputting the correct UTF-8 for either
>> the * character or the ? character. I think this is a bug.
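
The bytes inside the quotes in that dump are valid UTF-8; they just encode Private Use Area code points rather than '*' and '?'. A minimal Python sketch to check this, using only the byte values copied by hand from the od output above. Note that unmap_special is a hypothetical helper, not anything in Cygwin or cygutils, and it assumes the remapping is a plain offset of 0xF000 applied to ASCII characters:

```python
# Bytes between the single quotes in the od dump, copied by hand.
dumped = bytes.fromhex("ef80aa 2e ef80bf ef80bf 70")


def unmap_special(text: str) -> str:
    """Hypothetical helper: undo an assumed "add 0xF000" remapping.

    Only code points in U+F000..U+F07F (ASCII shifted by 0xF000) are
    touched; this range is an assumption made for illustration.
    """
    return "".join(
        chr(ord(ch) - 0xF000) if 0xF000 <= ord(ch) <= 0xF07F else ch
        for ch in text
    )


decoded = dumped.decode("utf-8")          # succeeds: the bytes are valid UTF-8
code_points = [f"U+{ord(ch):04X}" for ch in decoded]

print(code_points)             # ['U+F02A', 'U+002E', 'U+F03F', 'U+F03F', 'U+0070']
print(unmap_special(decoded))  # *.??p
```

So the output is well-formed UTF-8 for U+F02A and U+F03F, and subtracting 0xF000 gives back the pattern that was typed.
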
> To support POSIX path names, Cygwin allows any characters other than \0 and /,
> so it maps Windows special characters into the Unicode BMP PUA:
>
> https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-specialchars
>
> http://www.unicode.org/faq/private_use.html
>
> https://en.wikipedia.org/wiki/Private_Use_Areas
>
> It may also prefix unsupported codes in a code page with CAN/0x18.
>
> The bug is in displaying in the error message the remapped string with
> undisplayable PUA characters, rather than either the reverse mapped string or
> the original input path name.

As above and:

$ cygstart ?*?.log
Unable to start '.log': The specified file was not found.

$ cygstart ?*?.log |& xxd -g1
00000000: 55 6e 61 62 6c 65 20 74 6f 20 73 74 61 72 74 20  Unable to start 
00000010: 27 ef 80 bf ef 80 aa ef 80 bf 2e 6c 6f 67 27 3a  '..........log':
00000020: 20 54 68 65 20 73 70 65 63 69 66 69 65 64 20 66   The specified f
00000030: 69 6c 65 20 77 61 73 20 6e 6f 74 20 66 6f 75 6e  ile was not foun
00000040: 64 2e 0a                                         d..

?*? 0x3f 0x2a 0x3f
--> 0xf03f 0xf02a 0xf03f (PUA code points)
 -> 0xef 0x80 0xbf 0xef 0x80 0xaa 0xef 0x80 0xbf (UTF-8 bytes)

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]