Subject: Re: getclip and putclip garble unicode characters
From: Mark Geisert
To: "cygwin@cygwin.com"
Date: Mon, 5 Jul 2021 03:04:21 -0700

Replying to myself...
Mark Geisert wrote:
> Hi Leonid (?),
>
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters: non-latin
>> characters copied to clipboard in windows are replaced with question marks when
>> retrieved with getclip in cygwin, and non-latin characters copied to clipboard
>> using putclip are pasted in windows looking like utf-8 displayed in cp1252
>> but can be retrieved with getclip exactly as pasted, so it looks like the
>> problem is not in the way the data is copied but in the way cygwin and windows
>> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI
>> codepage is set to cp1251 - 1251, not 1252.
>
> Thanks for the report.  I will investigate.

I believe I have a local test case that reproduces your report: if I select a region of text in a message from the Cygwin mailing list digest that contains Cyrillic characters, getclip outputs '?' in place of those characters.

Thomas suggested an alternative, 'cat < /dev/clipboard', so I tried that as well. Here it outputs UTF-8 and the Cyrillic characters are intact.

So I've modified getclip to read the clipboard format Microsoft calls CF_UNICODETEXT and convert it to UTF-8 for output. My new getclip can thus produce the same result as the alternative. (Previously, getclip understood only CF_TEXT ("normal" ANSI characters) and CYGWIN_NATIVE, an internal Cygwin format that is what makes your putclip + getclip round trip work.)

How about I build a test version of the cygutils package with this updated getclip, and you can see whether it solves your issue?

Stay tuned,

..mark
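For illustration only, the CF_UNICODETEXT to UTF-8 conversion described above might look roughly like the following. This is a hypothetical sketch, not the actual getclip change: it hand-rolls the conversion rather than calling a Windows API such as WideCharToMultiByte(CP_UTF8, ...), so it builds on any platform, and the function name utf16_to_utf8 is made up for this example. It assumes the clipboard data has already been fetched as a NUL-terminated array of 16-bit UTF-16 code units, which is how CF_UNICODETEXT stores text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string (16-bit code
   units, as CF_UNICODETEXT holds them) to NUL-terminated UTF-8.
   Writes at most outsize bytes including the terminating NUL.
   Returns the number of bytes written (excluding the NUL), or (size_t)-1
   on an unpaired surrogate or if out is too small. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return (size_t)-1;

    while (*in) {
        uint32_t cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* stray low surrogate */
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        unsigned char buf[4];
        size_t len;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        /* Keep room for the terminating NUL. */
        if (n + len + 1 > outsize)
            return (size_t)-1;
        for (size_t i = 0; i < len; i++)
            out[n++] = (char)buf[i];
    }

    out[n] = '\0';
    return n;
}
```

Each Cyrillic letter (U+0400 to U+04FF) becomes two UTF-8 bytes, which is why a UTF-8 terminal shows it intact, whereas a CF_TEXT conversion through an ANSI code page that lacks the character yields '?'. Rejecting unpaired surrogates is one design choice; a real tool might substitute U+FFFD instead.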