* non-BMP character width
From: Thomas Wolff @ 2009-09-16 11:48 UTC
To: cygwin

Hi,
I see one small remaining glitch with Unicode display; non-BMP characters
(those with Unicode value > 0xFFFF) are displayed as two boxes.
The reason is probably related to their representation as two
surrogates at some point.
I do not expect to have visible display of non-BMP in the cygwin
console, esp. as the two available console fonts, Raster and Lucida,
don't support them anyway. But they should at least have the proper
width, i.e. one box instead of two.

Kind regards,
Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

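For reference, a minimal sketch of the "two surrogates" representation mentioned above. It only illustrates the standard UTF-16 arithmetic and assumes nothing about Cygwin's internals; the function name `to_utf16` is hypothetical.

```c
#include <stdint.h>

/* Convert one Unicode code point to UTF-16.
 * Returns 2 if a surrogate pair was written (non-BMP),
 * 1 if the code point fits in a single unit (BMP),
 * 0 if cp is not a valid Unicode scalar value. */
int to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

Any code point above 0xFFFF comes out as two 16-bit units, so a renderer that draws one cell per unit will show two boxes for a single character.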
* Re: non-BMP character width
From: Corinna Vinschen @ 2009-09-21 16:34 UTC
To: cygwin

On Sep 16 13:48, Thomas Wolff wrote:
> Hi,
> I see one small remaining glitch with Unicode display; non-BMP characters
> (those with Unicode value > 0xFFFF) are displayed as two boxes.
> The reason is probably related to their representation as two
> surrogates at some point.
> I do not expect to have visible display of non-BMP in the cygwin
> console, esp. as the two available console fonts, Raster and Lucida,
> don't support them anyway. But they should at least have the proper
> width, i.e. one box instead of two.

Can you please create a simple self-contained testcase?  I'm not exactly
sure how this is supposed to work and if a solution exists.  Is that a
problem for the non-UTF-8 case, too, or for UTF-8 only?


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

* Re: non-BMP character width
From: Lapo Luchini @ 2009-09-21 16:53 UTC
To: cygwin

Corinna Vinschen wrote:
> On Sep 16 13:48, Thomas Wolff wrote:
>> Hi,
>> I see one small remaining glitch with Unicode display; non-BMP characters
>> (those with Unicode value > 0xFFFF) are displayed as two boxes.
>
> Can you please create a simple self-contained testcase?  I'm not exactly
> sure how this is supposed to work and if a solution exists.  Is that a
> problem for the non-UTF-8 case, too, or for UTF-8 only?

I guess he meant anything like U+10001, which seems to be assigned to
the Linear B script in the DecodeUnicode database:

𐀁 = http://www.decodeunicode.org/U+10001
UTF-8 as F0 90 80 81

Or this example (I guess that's traditional Chinese?) taken from en.wiki:

𤭢 = http://www.decodeunicode.org/U+24B62
UTF-8 as F0 A4 AD A2

--
Lapo Luchini - http://lapo.it/

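The two byte sequences quoted above can be checked with a small encoder. This is a sketch of the standard UTF-8 bit layout only; `utf8_encode` is a hypothetical helper name, not a function from Cygwin or newlib.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
 * Writes up to 4 bytes to out and returns the byte count,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* Non-BMP: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Calling it with 0x10001 and 0x24B62 yields the four-byte sequences F0 90 80 81 and F0 A4 AD A2 respectively; every non-BMP character takes exactly four bytes in UTF-8.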
* Re: non-BMP character width
From: Corinna Vinschen @ 2009-09-21 17:58 UTC
To: cygwin

On Sep 21 18:52, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > On Sep 16 13:48, Thomas Wolff wrote:
> >> Hi,
> >> I see one small remaining glitch with Unicode display; non-BMP characters
> >> (those with Unicode value > 0xFFFF) are displayed as two boxes.
> >
> > Can you please create a simple self-contained testcase?  I'm not exactly
> > sure how this is supposed to work and if a solution exists.  Is that a
> > problem for the non-UTF-8 case, too, or for UTF-8 only?
>
> I guess he meant anything like U+10001, which seems to be assigned to
> the Linear B script in the DecodeUnicode database:

Sure.  I was specifically asking for a testcase, preferably in
plain C, which allows reproducing this under a debugger.


Corinna

* Re: non-BMP character width
From: Lapo Luchini @ 2009-09-22 4:57 UTC
To: [ML] CygWin, Thomas Wolff

Corinna Vinschen wrote:
> Sure.  I was specifically asking for a testcase, preferably in
> plain C, which allows reproducing this under a debugger.

Actually, I can't reproduce that, but I guess it's a problem of the
specific console he's using (Thomas, which one is that?): on mintty it
works ok (I'm not really sure it outputs U+10001, but it surely shows a
single box) and on rxvt it just shows as four ISO-8859-1 chars
(as expected, since native rxvt doesn't support Unicode):

mintty% echo "-\xF0\x90\x80\x81-"
-�-
rxvt% echo "-\xF0\x90\x80\x81-"
-ðÂâ¬Â-

Also ok on `ls`:

% cat s.c
#include <stdio.h>

int main() {
    /* create a file whose name contains U+10001 encoded as UTF-8 */
    fopen("a-\xF0\x90\x80\x81", "w");
    return 0;
}
% ./s
% ls -l|fgrep a-
-rw-r--r-- 1 lapo None 0 22 Sep 06:50 a-�

--
Lapo Luchini - http://lapo.it/

“The future is not google-able.” (William Gibson, 2004-02-05)

* Re: non-BMP character width
From: Corinna Vinschen @ 2009-09-22 9:57 UTC
To: cygwin

On Sep 22 06:57, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > Sure.  I was specifically asking for a testcase, preferably in
> > plain C, which allows reproducing this under a debugger.
>
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box) and on rxvt it just shows as four ISO-8859-1 chars
> (as expected, since native rxvt doesn't support Unicode):
>
> mintty% echo "-\xF0\x90\x80\x81-"
> -???-
> rxvt% echo "-\xF0\x90\x80\x81-"
> -ðÂ???Â-
>
> Also ok on `ls`:
>
> % cat s.c
> #include <stdio.h>
>
> int main() {
>     fopen("a-\xF0\x90\x80\x81", "w");
>     return 0;
> }
> % ./s
> % ls -l|fgrep a-
> -rw-r--r-- 1 lapo None 0 22 Sep 06:50 a-???

Uh, I see.  That occurs in the normal Windows console.  This is not
Cygwin's fault.  Cygwin's console code converts the multibyte string to
the WCHAR representation and prints it to the console using the
WriteConsoleW function.  That function prints two blocks/question marks
for a surrogate pair.  Look at the file in a cmd shell; it will also
print two blocks/question marks for the surrogate pair.


Corinna

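The two-block effect described above follows from the unit count after conversion. The following is a hedged sketch in portable C, not Cygwin's actual console code: it counts how many WCHAR (UTF-16) units a UTF-8 string turns into and compares that with its number of code points. `count_units` is a hypothetical helper, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* Count code points and UTF-16 code units in a well-formed,
 * NUL-terminated UTF-8 string.
 * Continuation bytes (10xxxxxx) belong to a character already counted.
 * A lead byte of the form 11110xxx starts a 4-byte sequence, i.e. a
 * non-BMP character, which needs a surrogate pair (2 units) in UTF-16. */
void count_units(const char *s, size_t *code_points, size_t *utf16_units)
{
    size_t cps = 0, units = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p != 0; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                            /* continuation byte */
        cps++;
        units += ((*p & 0xF8) == 0xF0) ? 2 : 1;  /* non-BMP needs a pair */
    }
    *code_points = cps;
    *utf16_units = units;
}
```

For the file name from the test case, "a-\xF0\x90\x80\x81", this reports 3 code points but 4 UTF-16 units. An output routine that draws one cell per unit therefore uses four cells where three are expected.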
* Re: non-BMP character width
From: Thomas Wolff @ 2009-09-24 15:14 UTC
To: cygwin

Corinna Vinschen wrote:
> Can you please create a simple self-contained testcase?  I'm not exactly
> sure how this is supposed to work and if a solution exists.  Is that a
> problem for the non-UTF-8 case, too, or for UTF-8 only?

Sorry for the late response; I see you reproduced the case meanwhile -
anyway, here is a test case, to be used with gcc or just with cat:

#include <stdio.h>

/* print U+20000 𠀀 */
int main ()
{
  printf ("<U+20000> is <𠀀>\n");
  return 0;
}

where you could enter the character in mined with
Control-V #20000 Enter
:)

About non-UTF-8, I tried to test in Big5, using character 0x8750 which is
U+242BF, and the test suggests it's OK (in cygwin console, mintty, and
rxvt-unicode); however, that may not be significant: although its
Unicode code point is non-BMP, the Big5 character is only 16 bits, and
Windows, having supported CJK before Unicode, probably doesn't handle this
via Unicode.
I also tried to test eucJP, but that doesn't seem to work at all and mintty crashes...

See my other comment below, please.

On Sep 22 06:57, Lapo Luchini wrote:
> ...
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box)...

The problem used to be in mintty as well until I pointed it out and
Andy was ambitious enough to find a workaround - maybe he could supply a
code snippet which would fix this in the cygwin console too, despite
the bug origin being in the Windows API...

> and on rxvt it just shows as four ISO-8859-1 chars
> (as expected, since native rxvt doesn't support Unicode):

You would have to test this with rxvt-unicode (urxvt in cygwin), where
the test case passes (one box).
(Not very relevant maybe, if reports are true that rxvt is not
maintained anymore.)

Corinna wrote:
> > ...
> Uh, I see.  That occurs in the normal Windows console.  This is not
> Cygwin's fault.  Cygwin's console code converts the multibyte string to
> the WCHAR representation and prints it to the console using the
> WriteConsoleW function.  That function prints two blocks/question marks
> for a surrogate pair.  Look at the file in a cmd shell; it will also
> print two blocks/question marks for the surrogate pair.

I was assuming that, like for mintty, the fault was not in the cygwin
domain; however, as there is a workaround, I thought it would be nice
to have it for the cygwin console as well.

Kind regards,
Thomas

* Re: non-BMP character width
From: andy.koppe @ 2009-09-24 15:33 UTC
To: cygwin

2009/9/24 Thomas Wolff:
> I also tried to test eucJP, but that doesn't seem to work at all and mintty crashes...

Ouch. Details?

> The problem used to be in mintty as well until I pointed it out and
> Andy was ambitious enough to find a workaround

Yep, given a font that actually supports them, e.g. SimSunExtB,
non-BMP chars should display correctly in mintty 0.5.

> - maybe he could supply a
> code snippet which would fix this in the cygwin console too, despite
> the bug origin being in the Windows API...

'fraid not. Mintty uses the Win32 GUI function ExtTextOut to paint
characters in its window, and that function does support surrogates.
The Cygwin DLL uses WriteConsole, which apparently doesn't support
them, and only MS can change that.

Andy

Thread overview: 8+ messages
2009-09-16 11:48 non-BMP character width Thomas Wolff
2009-09-21 16:34 ` Corinna Vinschen
2009-09-21 16:53   ` Lapo Luchini
2009-09-21 17:58     ` Corinna Vinschen
2009-09-22  4:57       ` Lapo Luchini
2009-09-22  9:57         ` Corinna Vinschen
2009-09-24 15:14         ` Thomas Wolff
2009-09-24 15:33           ` andy.koppe