public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
@ 2016-11-25 16:17 Mike Frysinger
  2016-11-25 16:26 ` Florian Weimer
  2016-11-28  9:23 ` Andreas Schwab
  0 siblings, 2 replies; 7+ messages in thread
From: Mike Frysinger @ 2016-11-25 16:17 UTC (permalink / raw)
  To: libc-alpha

Microsoft long ago added a mapping for 0x80 to the Euro sign to their
CP936.  While GBK 1.0 doesn't include this mapping, it is compatible,
and Microsoft and glibc alias the two codepages.  We could split them
apart so GBK wouldn't include the mapping, but that seems like a lot
of work for little gain.

2016-11-25  Mike Frysinger  <vapier@gentoo.org>

	[BZ #20864]
	* localedata/GBK: Map 0x80 to U20AC.
---
 localedata/charmaps/GBK | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/localedata/charmaps/GBK b/localedata/charmaps/GBK
index 4c35f1426c64..4186896f2a9b 100644
--- a/localedata/charmaps/GBK
+++ b/localedata/charmaps/GBK
@@ -3,10 +3,15 @@
 <mb_cur_min> 1
 <comment_char> %
 <escape_char> /
+% https://en.wikipedia.org/wiki/GBK
 
+% https://www-01.ibm.com/software/globalization/ccsid/ccsid936.html
 % alias CP936
+
+% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit
 % alias MS936
 % alias WINDOWS-936
+
 CHARMAP
 <U0000>     /x00         NULL (NUL)
 <U0001>     /x01         START OF HEADING (SOH)
@@ -136,6 +141,8 @@ CHARMAP
 <U007D>     /x7d         RIGHT CURLY BRACKET
 <U007E>     /x7e         TILDE
 <U007F>     /x7f         DELETE (DEL)
+% This isn't in GBK 1.0, but Microsoft added it to their CP936 page.
+<U20AC>     /x80         EURO SIGN
 
 <U4E02>     /x81/x40     <CJK>
 <U4E04>     /x81/x41     <CJK>
-- 
2.10.2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-25 16:17 [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864] Mike Frysinger
@ 2016-11-25 16:26 ` Florian Weimer
  2016-11-28  9:23 ` Andreas Schwab
  1 sibling, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2016-11-25 16:26 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha

On 11/25/2016 05:17 PM, Mike Frysinger wrote:
> Microsoft long ago added a mapping for 0x80 to the Euro sign to their
> CP936.  While GBK 1.0 doesn't include this mapping, it is compatible,
> and Microsoft and glibc alias the two codepages.  We could split them
> apart so GBK wouldn't include the mapping, but that seems like a lot
> of work for little gain.
>
> 2016-11-25  Mike Frysinger  <vapier@gentoo.org>
>
> 	[BZ #20864]
> 	* localedata/GBK: Map 0x80 to U20AC.

Looks okay to me.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-25 16:17 [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864] Mike Frysinger
  2016-11-25 16:26 ` Florian Weimer
@ 2016-11-28  9:23 ` Andreas Schwab
  2016-11-29 10:09   ` Florian Weimer
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2016-11-28  9:23 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: libc-alpha

FAIL: iconvdata/tst-tables
failed: ./tst-table.sh /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/ /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/ GBK 

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-28  9:23 ` Andreas Schwab
@ 2016-11-29 10:09   ` Florian Weimer
  2016-11-29 14:28     ` Zack Weinberg
  2016-11-29 17:04     ` Mike Frysinger
  0 siblings, 2 replies; 7+ messages in thread
From: Florian Weimer @ 2016-11-29 10:09 UTC (permalink / raw)
  To: Andreas Schwab, Mike Frysinger; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]

On 11/28/2016 10:23 AM, Andreas Schwab wrote:
> FAIL: iconvdata/tst-tables
> failed: ./tst-table.sh /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/ /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/ GBK

Here's one way to fix it.  Doesn't look really pretty, but it seems that 
this new 0x80 mapping is quite an outlier.

Thanks,
Florian


[-- Attachment #2: gbk.patch --]
[-- Type: text/x-patch, Size: 1571 bytes --]

gconv: Adjust GBK to support the Euro sign

Commit aa4d00ca39e604ac4e9fead401ccd4483e11a281 only updated the
data used by locales.

2016-11-29  Florian Weimer  <fweimer@redhat.com>

	* iconvdata/gbk.c (BODY): Add Euro sign support (both directions).

diff --git a/iconvdata/gbk.c b/iconvdata/gbk.c
index fc32a50..d39e398 100644
--- a/iconvdata/gbk.c
+++ b/iconvdata/gbk.c
@@ -13148,8 +13148,17 @@ static const char __gbk_from_ucs4_tab12[][2] =
       if (__builtin_expect (ch <= 0x80, 0)				      \
 	  || __builtin_expect (ch > 0xfe, 0))				      \
 	{								      \
-	  /* This is illegal.  */					      \
-	  STANDARD_FROM_LOOP_ERR_HANDLER (1);				      \
+	  if (__glibc_likely (ch == 0x80))				      \
+	    {								      \
+	      /* Exception for the Euro sign (see CP936).  */		      \
+	      ch = 0x20AC;						      \
+	      ++inptr;							      \
+	    }								      \
+	  else								      \
+	    {								      \
+	      /* This is illegal.  */					      \
+	      STANDARD_FROM_LOOP_ERR_HANDLER (1);			      \
+	    }								      \
 	}								      \
       else								      \
 	{								      \
@@ -13292,6 +13301,10 @@ static const char __gbk_from_ucs4_tab12[][2] =
 	case 0x2010 ... 0x203b:						      \
 	  cp = __gbk_from_ucs4_tab4[ch - 0x2010];			      \
 	  break;							      \
+	case 0x20AC:							      \
+	  /* Exception for the Euro sign (see CP396).  */		      \
+	  cp = "\x80";							      \
+	  break;							      \
 	case 0x2103 ... 0x22bf:						      \
 	  cp = __gbk_from_ucs4_tab5[ch - 0x2103];			      \
 	  break;							      \

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-29 10:09   ` Florian Weimer
@ 2016-11-29 14:28     ` Zack Weinberg
  2016-11-29 17:04     ` Mike Frysinger
  1 sibling, 0 replies; 7+ messages in thread
From: Zack Weinberg @ 2016-11-29 14:28 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Andreas Schwab, Mike Frysinger, GNU C Library

On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.

+1 from me - this is consistent with the behavior specified for GBK in
the web "Encoding Standard"
(https://encoding.spec.whatwg.org/#gbk-decoder) which has been
obsessively tuned for maximum compatibility with existing content.

zw

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-29 10:09   ` Florian Weimer
  2016-11-29 14:28     ` Zack Weinberg
@ 2016-11-29 17:04     ` Mike Frysinger
  2016-11-29 17:38       ` Florian Weimer
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Frysinger @ 2016-11-29 17:04 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Andreas Schwab, GNU C Library

On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>> FAIL: iconvdata/tst-tables
>> failed: ./tst-table.sh
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>> GBK
>
> Here's one way to fix it.  Doesn't look really pretty, but it seems that
> this new 0x80 mapping is quite an outlier.

i'm not super familiar with the gconv modules, but this seems to make sense

should use lowercase hex constants though to match surrounding code style

my dev box is offline pending a move, so i won't be able to push
anything for a while
-mike

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864]
  2016-11-29 17:04     ` Mike Frysinger
@ 2016-11-29 17:38       ` Florian Weimer
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2016-11-29 17:38 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Andreas Schwab, GNU C Library

On 11/29/2016 06:04 PM, Mike Frysinger wrote:
> On Tue, Nov 29, 2016 at 5:08 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 11/28/2016 10:23 AM, Andreas Schwab wrote:
>>> FAIL: iconvdata/tst-tables
>>> failed: ./tst-table.sh
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/
>>> /home/abuild/rpmbuild/BUILD/glibc-2.24.90.20161127.gb964e06/cc-base/iconvdata/
>>> GBK
>>
>> Here's one way to fix it.  Doesn't look really pretty, but it seems that
>> this new 0x80 mapping is quite an outlier.
>
> i'm not super familiar with the gconv modules, but this seems to make sense
>
> should use lowercase hex constants though to match surrounding code style

Right, I'll fix this before committing it.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-11-29 17:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-25 16:17 [PATCH] localedata: GBK: add mapping for 0x80->Euro sign [BZ #20864] Mike Frysinger
2016-11-25 16:26 ` Florian Weimer
2016-11-28  9:23 ` Andreas Schwab
2016-11-29 10:09   ` Florian Weimer
2016-11-29 14:28     ` Zack Weinberg
2016-11-29 17:04     ` Mike Frysinger
2016-11-29 17:38       ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).