public inbox for cygwin@cygwin.com
* getclip and putclip garble unicode characters
@ 2021-06-23 13:45 Миронов Леонид Владимирович
  2021-06-23 22:27 ` Mark Geisert
  2021-06-24  6:35 ` Andrey Repin
  0 siblings, 2 replies; 7+ messages in thread
From: Миронов Леонид Владимирович @ 2021-06-23 13:45 UTC (permalink / raw)
  To: cygwin

getclip and putclip from cygutils-extra garble Unicode characters. Non-Latin characters copied to the clipboard in Windows are replaced with question marks when retrieved with getclip in Cygwin. Non-Latin characters copied to the clipboard with putclip look like UTF-8 displayed as cp1252 when pasted in Windows, but getclip retrieves them exactly as they were originally copied. So the problem seems to be not in the way the data is copied but in the way Cygwin and Windows communicate the text encoding to each other. LC_CTYPE=en_US.UTF-8; the Windows ANSI codepage is set to cp1251 (that is 1251, not 1252).
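
The putclip symptom described above (UTF-8 text that looks as if it were displayed in cp1252) can be reproduced without the clipboard at all. The following Python sketch only illustrates that mis-decoding; it does not claim to show what putclip or Windows do internally, and the sample string is an arbitrary choice.

```python
# Illustration only: UTF-8 bytes misread as cp1252 produce the kind of
# mojibake described in the report. The sample text is an assumption.
original = "Привет"

utf8_bytes = original.encode("utf-8")

# A consumer that wrongly assumes cp1252 turns every byte into one character.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# The bytes themselves are intact, so undoing the wrong step restores the text.
recovered = mojibake.encode("cp1252").decode("utf-8")
print(recovered == original)
```

Because the underlying bytes are unchanged, a round trip through the same wrong assumption is lossless, which fits the observation that getclip returns the text that putclip stored.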



* Re: getclip and putclip garble unicode characters
  2021-06-23 13:45 getclip and putclip garble unicode characters Миронов Леонид Владимирович
@ 2021-06-23 22:27 ` Mark Geisert
  2021-07-05 10:04   ` Mark Geisert
  2021-06-24  6:35 ` Andrey Repin
  1 sibling, 1 reply; 7+ messages in thread
From: Mark Geisert @ 2021-06-23 22:27 UTC (permalink / raw)
  To: cygwin

Hi Leonid (?),

Миронов Леонид Владимирович via Cygwin wrote:
> getclip and putclip from cygutils-extra garble unicode characters: non-latin characters copied to clipboard in windows are replaced with question marks when retrieved with getclip in cygwin, and non-latin characters copied to clipboard using putclip are pasted it in windows looking like utf-8 displayed in cp1252 but can be retrieved with getclip exactly as pasted, so it looks like the problem is not in the way the data is copied but in the way cygwin and windows communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 - 1251, not 1252.

Thanks for the report.  I will investigate.

..mark


* Re: getclip and putclip garble unicode characters
  2021-06-23 13:45 getclip and putclip garble unicode characters Миронов Леонид Владимирович
  2021-06-23 22:27 ` Mark Geisert
@ 2021-06-24  6:35 ` Andrey Repin
  2021-06-25  9:00   ` Миронов Леонид Владимирович
  2021-06-25 18:01   ` Thomas Wolff
  1 sibling, 2 replies; 7+ messages in thread
From: Andrey Repin @ 2021-06-24  6:35 UTC (permalink / raw)
  To: Миронов Леонид Владимирович, cygwin

Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble unicode characters:
> non-latin characters copied to clipboard in windows are replaced with
> question marks when retrieved with getclip in cygwin, and non-latin
> characters copied to clipboard using putclip are pasted it in windows
> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
> exactly as pasted, so it looks like the problem is not in the way the data
> is copied but in the way cygwin and windows communicate text encoding to
> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 - 1251, not 1252.

It looks like you are using a program that cannot handle the Unicode
clipboard. For better results, switch your input language/keyboard to the
matching language before copying text from the application. I.e. switch to
Russian, then copy the text, then check what getclip returns.
But then, why is LC_CTYPE en_US?


-- 
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible english...


* RE: getclip and putclip garble unicode characters
  2021-06-24  6:35 ` Andrey Repin
@ 2021-06-25  9:00   ` Миронов Леонид Владимирович
  2021-06-25 18:01   ` Thomas Wolff
  1 sibling, 0 replies; 7+ messages in thread
From: Миронов Леонид Владимирович @ 2021-06-25  9:00 UTC (permalink / raw)
  To: cygwin

As far as copying from Cygwin to Windows is concerned, it happens in exactly the same way in every Windows program I tried pasting into: Word, Outlook, Chrome, the console, you name it. Changing the Windows keyboard language has no effect either; Windows still stubbornly treats the clipboard contents as cp1252. (I don't quite see how that is supposed to help anyway, since data on the clipboard is not limited to one single-byte codepage.)

At first I missed that, when copying from Windows to Cygwin, getclip actually returns the data in cp1251 (the Windows ANSI codepage), so Cyrillic characters can at least be recovered with iconv. Non-Cyrillic, non-Latin characters (e.g. Greek), however, are replaced with question marks and are lost, although in Windows everything can be pasted back without issues, again regardless of the program and keyboard language.

So, in a nutshell: when copy-pasting from Cygwin putclip to Windows, Unicode text is treated as cp1252, while when copy-pasting from Windows to Cygwin getclip, Unicode text is treated as cp1251.
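
The Windows-to-Cygwin direction can be modelled in a few lines of Python. This sketch assumes that the clipboard text is squeezed into the ANSI codepage (cp1251 here) before getclip sees it, and it uses Python's `errors="replace"` as a stand-in for whatever substitution Windows actually performs; the sample string is made up.

```python
# Assumption: the Unicode clipboard text is converted to the ANSI codepage
# (cp1251 on this system) and unmappable characters become '?'.
clipboard_text = "Привет αβγ"  # Cyrillic plus Greek, hypothetical sample

ansi_bytes = clipboard_text.encode("cp1251", errors="replace")
print(ansi_bytes)  # Cyrillic survives as single bytes, Greek is already b'?'

# Roughly what `iconv -f CP1251 -t UTF-8` does to such output:
recovered = ansi_bytes.decode("cp1251")

# Cyrillic comes back, but the Greek letters cannot be recovered.
print(recovered == "Привет ???")
```

Once a character has been replaced by a literal question mark, no later conversion can bring it back, which is why iconv helps for Cyrillic but not for Greek.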

Sorry for top-posting.



* Re: getclip and putclip garble unicode characters
  2021-06-24  6:35 ` Andrey Repin
  2021-06-25  9:00   ` Миронов Леонид Владимирович
@ 2021-06-25 18:01   ` Thomas Wolff
  2021-06-25 18:54     ` Brian Inglis
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Wolff @ 2021-06-25 18:01 UTC (permalink / raw)
  To: cygwin



On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
> Greetings, Миронов Леонид Владимирович!
>
>> getclip and putclip from cygutils-extra garble unicode characters:
>> non-latin characters copied to clipboard in windows are replaced with
>> question marks when retrieved with getclip in cygwin, and non-latin
>> characters copied to clipboard using putclip are pasted it in windows
>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>> exactly as pasted, so it looks like the problem is not in the way the data
>> is copied but in the way cygwin and windows communicate text encoding to
>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to cp1251 - 1251, not 1252.
> This looks like you are using a program incapable of dealing with unicode
> clipboard. To achieve better results, switch your input language/keyboard to
> matching language before copying text from application. I.e. switch to
> Russian then copy text, then check what is returned by getclip.
> But then, why LC_CTYPE is en_US?
getclip and putclip are just broken; they don't even work in a pure
UTF-8 environment.
This was already noticed 9 years ago:
https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
That message includes a script-based replacement.
Thomas


* Re: getclip and putclip garble unicode characters
  2021-06-25 18:01   ` Thomas Wolff
@ 2021-06-25 18:54     ` Brian Inglis
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Inglis @ 2021-06-25 18:54 UTC (permalink / raw)
  To: cygwin

On 2021-06-25 12:01, Thomas Wolff wrote:
> On 24.06.2021 at 08:35, Andrey Repin via Cygwin wrote:
>> Greetings, Миронов Леонид Владимирович!
>>> getclip and putclip from cygutils-extra garble unicode characters:
>>> non-latin characters copied to clipboard in windows are replaced with
>>> question marks when retrieved with getclip in cygwin, and non-latin
>>> characters copied to clipboard using putclip are pasted it in windows
>>> looking like utf-8 displayed in cp1252 but can be retrieved with getclip
>>> exactly as pasted, so it looks like the problem is not in the way the 
>>> data
>>> is copied but in the way cygwin and windows communicate text encoding to
>>> each other. LC_CTYPE=en_US.UTF-8, windows ANSI codepage is set to 
>>> cp1251 - 1251, not 1252.
>> This looks like you are using a program incapable of dealing with unicode
>> clipboard. To achieve better results, switch your input 
>> language/keyboard to
>> matching language before copying text from application. I.e. switch to
>> Russian then copy text, then check what is returned by getclip.
>> But then, why LC_CTYPE is en_US?
> getclip and putclip are just broken, they don't even work in a pure 
> UTF-8 environment.
> Already noticed 9 years ago... 
> https://sourceware.org/legacy-ml/cygwin/2012-03/msg00648.html
> including a script-based replacement.

Just use cat with /dev/clipboard (cat < /dev/clipboard to read,
cat > /dev/clipboard to write). Recent Windows changes may have affected
Windows<->X copy and paste transparency.
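
The same approach can be scripted. This is a minimal sketch, assuming a Cygwin build of Python running in a UTF-8 locale, so that /dev/clipboard can be opened like a regular file and carries UTF-8 text; the helper names are hypothetical.

```python
# Sketch: read and write the clipboard via Cygwin's /dev/clipboard device.
# Assumes Cygwin Python and a UTF-8 locale; helper names are illustrative.
CLIPBOARD = "/dev/clipboard"


def read_clipboard(path=CLIPBOARD):
    """Return the clipboard contents as text (like `cat < /dev/clipboard`)."""
    with open(path, "rb") as dev:
        return dev.read().decode("utf-8", errors="replace")


def write_clipboard(text, path=CLIPBOARD):
    """Replace the clipboard contents (like `cat > /dev/clipboard`)."""
    with open(path, "wb") as dev:
        dev.write(text.encode("utf-8"))
```

The `path` parameter exists only so the helpers can be tried against an ordinary file on systems without /dev/clipboard.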

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* Re: getclip and putclip garble unicode characters
  2021-06-23 22:27 ` Mark Geisert
@ 2021-07-05 10:04   ` Mark Geisert
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Geisert @ 2021-07-05 10:04 UTC (permalink / raw)
  To: cygwin

Replying to myself...

Mark Geisert wrote:
> Hi Leonid (?),
> 
> Миронов Леонид Владимирович via Cygwin wrote:
>> getclip and putclip from cygutils-extra garble unicode characters: non-latin 
>> characters copied to clipboard in windows are replaced with question marks when 
>> retrieved with getclip in cygwin, and non-latin characters copied to clipboard 
>> using putclip are pasted it in windows looking like utf-8 displayed in cp1252 
>> but can be retrieved with getclip exactly as pasted, so it looks like the 
>> problem is not in the way the data is copied but in the way cygwin and windows 
>> communicate text encoding to each other. LC_CTYPE=en_US.UTF-8, windows ANSI 
>> codepage is set to cp1251 - 1251, not 1252.
> 
> Thanks for the report.  I will investigate.

I believe I have a local testcase similar to your report: If I select a region of 
text on a message displayed from the Cygwin mailing list digest, and that message 
has Cyrillic characters in it, getclip replaces those characters with '?' on output.

Since Thomas suggested an alternative, using 'cat < /dev/clipboard', I tried that 
as well and see that here UTF-8 is output and the Cyrillic characters are intact.

So I've modified getclip to understand what MS calls CF_UNICODETEXT from the 
clipboard and have it converted to UTF-8 for output.  Thus my new getclip can 
duplicate what the alternative does.  (What getclip could understand previously 
was CF_TEXT ("normal" ANSI characters) or CYGWIN_NATIVE (an internal Cygwin format 
that makes your putclip + getclip example work)).
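
As a rough illustration of the conversion step only (this is not the actual getclip change), CF_UNICODETEXT data is NUL-terminated UTF-16LE, so turning it into UTF-8 amounts to the following Python sketch. The function name and the sample payload are assumptions made for the example.

```python
# Sketch of the conversion step only, not the actual getclip change.
# CF_UNICODETEXT data is NUL-terminated UTF-16LE; `raw` stands in for the
# bytes a program would obtain from the clipboard.
def unicodetext_to_utf8(raw: bytes) -> bytes:
    # Decode whole 16-bit code units; drop a trailing odd byte if present.
    text = raw[: len(raw) - len(raw) % 2].decode("utf-16-le", errors="replace")
    # The clipboard buffer may be larger than the string; stop at the NUL.
    text = text.split("\x00", 1)[0]
    return text.encode("utf-8")


sample = "Привет αβγ\x00".encode("utf-16-le")  # hypothetical clipboard payload
print(unicodetext_to_utf8(sample) == "Привет αβγ".encode("utf-8"))
```

Unlike the CF_TEXT path, nothing here passes through a single-byte codepage, so Cyrillic and Greek both survive.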

How about I generate a test version of the cygutils package with this updated 
getclip and you can see if it solves your issue?
Stay tuned,

..mark

