public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
@ 2020-06-16  8:24 Mike FABIAN
  2020-06-23  9:30 ` [PATCH v3] " Mike FABIAN
  0 siblings, 1 reply; 8+ messages in thread
From: Mike FABIAN @ 2020-06-16  8:24 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: 0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch --]
[-- Type: text/x-patch, Size: 1417 bytes --]

From 94aa93f64d88dd52353480f3d425c6cc064b0d81 Mon Sep 17 00:00:00 2001
From: Mike FABIAN <mfabian@redhat.com>
Date: Tue, 16 Jun 2020 08:29:40 +0200
Subject: [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to
 UD7FB to 0 [BZ #26120]

---
 localedata/charmaps/UTF-8          | 1 +
 localedata/unicode-gen/utf8_gen.py | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..56327e7830 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,7 @@ WIDTH
 <UABE8>	0
 <UABED>	0
 <UAC00>...<UD7A3>	2
+<UD7B0>...<UD7FB>	0
 <UF900>...<UFA6D>	2
 <UFA70>...<UFAD9>	2
 <UFB1E>	0
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..f83cb3c5e5 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,6 +258,10 @@ def process_width(outfile, ulines, elines, plines):
         if key in width_dict:
             del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
+        # Hangul jungseong and jongseong:
+        width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD7FC)):
+        # Hangul jungseong and jongseong:
         width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-16  8:24 [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120] Mike FABIAN
@ 2020-06-23  9:30 ` Mike FABIAN
  2020-06-23 15:17   ` Carlos O'Donell
  0 siblings, 1 reply; 8+ messages in thread
From: Mike FABIAN @ 2020-06-23  9:30 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

I skipped unassigned characters and ended the range at U+D7FF even
though U+D7FC .. U+D7FF are currently unassigned. But because
the script now skips the unassigned characters it is OK to end the range
for the Hangul Jamo at U+D7FF, if these characters ever happen to get
assigned in future, they will probably be Hangul Jamo because of
Block.txt.

After each Unicode update, manual checking is good anyway, but ending
the range in the script at U+D7FF seems more likely to do the right
thing already if these characters ever get assigned.


[-- Attachment #2: 0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch --]
[-- Type: text/x-patch, Size: 2055 bytes --]

From 2c4ad3b5c7d6ffa0190a2c60bffdf1203e10b6c8 Mon Sep 17 00:00:00 2001
From: Mike FABIAN <mfabian@redhat.com>
Date: Tue, 16 Jun 2020 08:29:40 +0200
Subject: [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to
 UD7FB to 0 [BZ #26120]

---
 localedata/charmaps/UTF-8          | 2 ++
 localedata/unicode-gen/utf8_gen.py | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8>	0
 <UABED>	0
 <UAC00>...<UD7A3>	2
+<UD7B0>...<UD7C6>	0
+<UD7CB>...<UD7FB>	0
 <UF900>...<UFA6D>	2
 <UFA70>...<UFAD9>	2
 <UFB1E>	0
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..11c906b92f 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
         if key in width_dict:
             del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
-        width_dict[key] = 0
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD800)):
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
         # as “W” or “N” based on context:
@@ -327,6 +333,7 @@ if __name__ == "__main__":
         help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
+    unicode_utils.fill_attributes(ARGS.unicode_data_file)
     with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
     with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-23  9:30 ` [PATCH v3] " Mike FABIAN
@ 2020-06-23 15:17   ` Carlos O'Donell
  2020-06-25  8:32     ` Mike FABIAN
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2020-06-23 15:17 UTC (permalink / raw)
  To: Mike FABIAN, libc-alpha

On 6/23/20 5:30 AM, Mike FABIAN via Libc-alpha wrote:
> I skipped unassigned characters and ended the range at U+D7FF even
> though U+D7FC .. U+D7FF are currently unassigned. But because
> the script now skips the unassigned characters it is OK to end the range
> for the Hangul Jamo at U+D7FF, if these characters ever happen to get
> assigned in future, they will probably be Hangul Jamo because of
> Block.txt.
> 
> After each Unicode update, manual checking is good anyway, but ending
> the range in the script at U+D7FF seems more likely to do the right
> thing already if these characters ever get assigned.
> 

You change the generator but all the files that are generated by the
generator do not appear regenerated in your patch.

Can you please post exactly what you plan to commit, that way we can
review the results?

I'm expecting:
- generator change.
- all files updated with date changes.
- some files have more than date changes.

This way we keep the generated files consistent.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-23 15:17   ` Carlos O'Donell
@ 2020-06-25  8:32     ` Mike FABIAN
  2020-06-25 12:35       ` Carlos O'Donell
  0 siblings, 1 reply; 8+ messages in thread
From: Mike FABIAN @ 2020-06-25  8:32 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

Carlos O'Donell <carlos@redhat.com> さんはかきました:

> On 6/23/20 5:30 AM, Mike FABIAN via Libc-alpha wrote:
>> I skipped unassigned characters and ended the range at U+D7FF even
>> though U+D7FC .. U+D7FF are currently unassigned. But because
>> the script now skips the unassigned characters it is OK to end the range
>> for the Hangul Jamo at U+D7FF, if these characters ever happen to get
>> assigned in future, they will probably be Hangul Jamo because of
>> Block.txt.
>> 
>> After each Unicode update, manual checking is good anyway, but ending
>> the range in the script at U+D7FF seems more likely to do the right
>> thing already if these characters ever get assigned.
>> 
>
> You change the generator but all the files that are generated by the
> generator do not appear regenerated in your patch.

> Can you please post exactly what you plan to commit, that way we can
> review the results?

The patch did contain everything.

> I'm expecting:
> - generator change.

This part is the generator change:

diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..11c906b92f 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
         if key in width_dict:
             del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
-        width_dict[key] = 0
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD800)):
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
         # as “W” or “N” based on context:
@@ -327,6 +333,7 @@ if __name__ == "__main__":
         help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
+    unicode_utils.fill_attributes(ARGS.unicode_data_file)
     with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
     with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:

> - all files updated with date changes.

And the UTF-8 file in charmaps is the only file which changed, only in
the WIDTH section:

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8>        0
 <UABED>        0
 <UAC00>...<UD7A3>      2
+<UD7B0>...<UD7C6>      0
+<UD7CB>...<UD7FB>      0
 <UF900>...<UFA6D>      2
 <UFA70>...<UFAD9>      2
 <UFB1E>        0


> - some files have more than date changes.

No other files are changed and the UTF-8 file in charmaps does not
contain a generation date.

> This way we keep the generated files consistent.

-- 
Mike FABIAN <mfabian@redhat.com>
睡眠不足はいい仕事の敵だ。


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-25  8:32     ` Mike FABIAN
@ 2020-06-25 12:35       ` Carlos O'Donell
  2020-06-25 13:00         ` Mike FABIAN
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2020-06-25 12:35 UTC (permalink / raw)
  To: Mike FABIAN; +Cc: libc-alpha

On 6/25/20 4:32 AM, Mike FABIAN wrote:
> Carlos O'Donell <carlos@redhat.com> さんはかきました:
> 
>> On 6/23/20 5:30 AM, Mike FABIAN via Libc-alpha wrote:
>>> I skipped unassigned characters and ended the range at U+D7FF even
>>> though U+D7FC .. U+D7FF are currently unassigned. But because
>>> the script now skips the unassigned characters it is OK to end the range
>>> for the Hangul Jamo at U+D7FF, if these characters ever happen to get
>>> assigned in future, they will probably be Hangul Jamo because of
>>> Block.txt.
>>>
>>> After each Unicode update, manual checking is good anyway, but ending
>>> the range in the script at U+D7FF seems more likely to do the right
>>> thing already if these characters ever get assigned.
>>>
>>
>> You change the generator but all the files that are generated by the
>> generator do not appear regenerated in your patch.
> 
>> Can you please post exactly what you plan to commit, that way we can
>> review the results?
> 
> The patch did contain everything.

My expectation was that we do this:

(a) Make all your source changes.
(b) Regenerate.
cd ~/src/glibc/localedata/unicode-gen
make
make-intstall

When you do it this way *all* files get updates.

You're cheating here because you know apriori which files need updating
and you appear to have carried out that updated by hand. I dislike doing
this kind of "by hand" process and was expecting the more general update
as I describe above. This ensures that all files are at the same "regenerated"
state and the date for all of them in their datestamps matches the date
you regenerated everything that is generated by unicode-gen. That means a
reviewer doesn't have to verify that no other files might have been changed
by your code change.

I'm requesting that we follow a simpler procedure that regenerates the files
using the expected process, and post patches based on that.

Does that clarify my position?

Please feel free to justify a different position.
 
>> I'm expecting:
>> - generator change.
> 
> This part is the generator change:
> 
> diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
> index 17b99ee88d..11c906b92f 100755
> --- a/localedata/unicode-gen/utf8_gen.py
> +++ b/localedata/unicode-gen/utf8_gen.py
> @@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
>          if key in width_dict:
>              del width_dict[key] # default width is 1
>      for key in list(range(0x1160, 0x1200)):
> -        width_dict[key] = 0
> +        # Hangul jungseong and jongseong:
> +        if key in unicode_utils.UNICODE_ATTRIBUTES:
> +            width_dict[key] = 0
> +    for key in list(range(0xD7B0, 0xD800)):
> +        # Hangul jungseong and jongseong:
> +        if key in unicode_utils.UNICODE_ATTRIBUTES:
> +            width_dict[key] = 0

OK.

>      for key in list(range(0x3248, 0x3250)):
>          # These are “A” which means we can decide whether to treat them
>          # as “W” or “N” based on context:
> @@ -327,6 +333,7 @@ if __name__ == "__main__":
>          help='The Unicode version of the input files used.')
>      ARGS = PARSER.parse_args()
>  
> +    unicode_utils.fill_attributes(ARGS.unicode_data_file)

OK.

>      with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
>          UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
>      with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
> 
>> - all files updated with date changes.
> 
> And the UTF-8 file in charmaps is the only file which changed, only in
> the WIDTH section:

If you ran make; make install; from unicode-gen all the other files would have
changed their timestamps to show that they were updated and had no functional
change. Those patches are missing. Without them I don't know that the generator
actually generated "zero functional changes" for those files.

> diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
> index 14c5d4fa33..8cce47cd97 100644
> --- a/localedata/charmaps/UTF-8
> +++ b/localedata/charmaps/UTF-8
> @@ -48920,6 +48920,8 @@ WIDTH
>  <UABE8>        0
>  <UABED>        0
>  <UAC00>...<UD7A3>      2
> +<UD7B0>...<UD7C6>      0
> +<UD7CB>...<UD7FB>      0

OK.

>  <UF900>...<UFA6D>      2
>  <UFA70>...<UFAD9>      2
>  <UFB1E>        0
> 
> 
>> - some files have more than date changes.
> 
> No other files are changed and the UTF-8 file in charmaps does not
> contain a generation date.
> 
>> This way we keep the generated files consistent.
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-25 12:35       ` Carlos O'Donell
@ 2020-06-25 13:00         ` Mike FABIAN
  2020-06-25 13:07           ` Mike FABIAN
  0 siblings, 1 reply; 8+ messages in thread
From: Mike FABIAN @ 2020-06-25 13:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

Carlos O'Donell <carlos@redhat.com> さんはかきました:

> My expectation was that we do this:
>
> (a) Make all your source changes.
> (b) Regenerate.
> cd ~/src/glibc/localedata/unicode-gen
> make
> make-intstall
>
> When you do it this way *all* files get updates.
>
> You're cheating here because you know apriori which files need updating
> and you appear to have carried out that updated by hand. I dislike doing
> this kind of "by hand" process and was expecting the more general update
> as I describe above. This ensures that all files are at the same "regenerated"
> state and the date for all of them in their datestamps matches the date
> you regenerated everything that is generated by unicode-gen. That means a
> reviewer doesn't have to verify that no other files might have been changed
> by your code change.
>
> I'm requesting that we follow a simpler procedure that regenerates the files
> using the expected process, and post patches based on that.
>
> Does that clarify my position?

Yes, now I understand what you mean.

[...]

> If you ran make; make install; from unicode-gen all the other files would have
> changed their timestamps to show that they were updated and had no functional
> change. Those patches are missing. Without them I don't know that the generator
> actually generated "zero functional changes" for those files.

I knew because I only changed utf8_gen.py and this generator script
changes only the UTF-8 file.

I'll do it the "make install" way ...

-- 
Mike FABIAN <mfabian@redhat.com>
睡眠不足はいい仕事の敵だ。


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-25 13:00         ` Mike FABIAN
@ 2020-06-25 13:07           ` Mike FABIAN
  2020-06-25 18:57             ` Carlos O'Donell
  0 siblings, 1 reply; 8+ messages in thread
From: Mike FABIAN @ 2020-06-25 13:07 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 101 bytes --]

Mike FABIAN <mfabian@redhat.com> さんはかきました:

> I'll do it the "make install" way ...


[-- Attachment #2: 0001-Set-width-of-JUNGSEONG-JONGSEONG-characters-from-UD7.patch --]
[-- Type: text/x-patch, Size: 6720 bytes --]

From 32ee1eb4037ce5c777322a5043a48b8627610d8d Mon Sep 17 00:00:00 2001
From: Mike FABIAN <mfabian@redhat.com>
Date: Tue, 16 Jun 2020 08:29:40 +0200
Subject: [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to
 UD7FB to 0 [BZ #26120]

---
 localedata/charmaps/UTF-8              | 2 ++
 localedata/locales/i18n_ctype          | 2 +-
 localedata/locales/tr_TR               | 2 +-
 localedata/locales/translit_circle     | 2 +-
 localedata/locales/translit_cjk_compat | 2 +-
 localedata/locales/translit_combining  | 2 +-
 localedata/locales/translit_compat     | 2 +-
 localedata/locales/translit_font       | 2 +-
 localedata/locales/translit_fraction   | 2 +-
 localedata/unicode-gen/utf8_gen.py     | 9 ++++++++-
 10 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8>	0
 <UABED>	0
 <UAC00>...<UD7A3>	2
+<UD7B0>...<UD7C6>	0
+<UD7CB>...<UD7FB>	0
 <UF900>...<UFA6D>	2
 <UFA70>...<UFAD9>	2
 <UFB1E>	0
diff --git a/localedata/locales/i18n_ctype b/localedata/locales/i18n_ctype
index 6f078a101d..c63e0790fc 100644
--- a/localedata/locales/i18n_ctype
+++ b/localedata/locales/i18n_ctype
@@ -26,7 +26,7 @@ fax       ""
 language  ""
 territory "Earth"
 revision  "13.0.0"
-date      "2020-04-14"
+date      "2020-06-25"
 category  "i18n:2012";LC_CTYPE
 END LC_IDENTIFICATION
 
diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index d5785ceca1..7dbb923228 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -43,7 +43,7 @@ fax        ""
 language   "Turkish"
 territory  "Turkey"
 revision   "1.0"
-date       "2020-04-14"
+date       "2020-06-25"
 
 category "i18n:2012";LC_IDENTIFICATION
 category "i18n:2012";LC_CTYPE
diff --git a/localedata/locales/translit_circle b/localedata/locales/translit_circle
index 0f1e81541c..5c07b44532 100644
--- a/localedata/locales/translit_circle
+++ b/localedata/locales/translit_circle
@@ -9,7 +9,7 @@ comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of encircled characters.
-% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2020-06-25 for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_cjk_compat b/localedata/locales/translit_cjk_compat
index 17b74134fc..ee0d7f83c6 100644
--- a/localedata/locales/translit_cjk_compat
+++ b/localedata/locales/translit_cjk_compat
@@ -9,7 +9,7 @@ comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of CJK compatibility characters.
-% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2020-06-25 for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining
index d5c8bbfe8f..36128f097a 100644
--- a/localedata/locales/translit_combining
+++ b/localedata/locales/translit_combining
@@ -10,7 +10,7 @@ comment_char %
 
 % Transliterations that remove all combining characters (accents,
 % pronounciation marks, etc.).
-% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2020-06-25 for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_compat b/localedata/locales/translit_compat
index ff18b02ea3..ac24c4e938 100644
--- a/localedata/locales/translit_compat
+++ b/localedata/locales/translit_compat
@@ -9,7 +9,7 @@ comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of compatibility characters and ligatures.
-% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2020-06-25 for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_font b/localedata/locales/translit_font
index e79b0d83f5..680c4ed426 100644
--- a/localedata/locales/translit_font
+++ b/localedata/locales/translit_font
@@ -9,7 +9,7 @@ comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of font equivalents.
-% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2020-06-25 for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_fraction b/localedata/locales/translit_fraction
index 197d57a644..b52244969e 100644
--- a/localedata/locales/translit_fraction
+++ b/localedata/locales/translit_fraction
@@ -9,7 +9,7 @@ comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of fractions.
-% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2020-04-14 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2020-06-25 for Unicode 13.0.0.
 % The replacements have been surrounded with spaces, because fractions are
 % often preceded by a decimal number and followed by a unit or a math symbol.
 
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..11c906b92f 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
         if key in width_dict:
             del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
-        width_dict[key] = 0
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD800)):
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
         # as “W” or “N” based on context:
@@ -327,6 +333,7 @@ if __name__ == "__main__":
         help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
+    unicode_utils.fill_attributes(ARGS.unicode_data_file)
     with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
     with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
  2020-06-25 13:07           ` Mike FABIAN
@ 2020-06-25 18:57             ` Carlos O'Donell
  0 siblings, 0 replies; 8+ messages in thread
From: Carlos O'Donell @ 2020-06-25 18:57 UTC (permalink / raw)
  To: Mike FABIAN; +Cc: libc-alpha

On 6/25/20 9:07 AM, Mike FABIAN wrote:
> From 32ee1eb4037ce5c777322a5043a48b8627610d8d Mon Sep 17 00:00:00 2001
> From: Mike FABIAN <mfabian@redhat.com>
> Date: Tue, 16 Jun 2020 08:29:40 +0200
> Subject: [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to
>  UD7FB to 0 [BZ #26120]

OK for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
 
> ---
>  localedata/charmaps/UTF-8              | 2 ++
>  localedata/locales/i18n_ctype          | 2 +-
>  localedata/locales/tr_TR               | 2 +-
>  localedata/locales/translit_circle     | 2 +-
>  localedata/locales/translit_cjk_compat | 2 +-
>  localedata/locales/translit_combining  | 2 +-
>  localedata/locales/translit_compat     | 2 +-
>  localedata/locales/translit_font       | 2 +-
>  localedata/locales/translit_fraction   | 2 +-
>  localedata/unicode-gen/utf8_gen.py     | 9 ++++++++-
>  10 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
> index 14c5d4fa33..8cce47cd97 100644
> --- a/localedata/charmaps/UTF-8
> +++ b/localedata/charmaps/UTF-8
> @@ -48920,6 +48920,8 @@ WIDTH
>  <UABE8>	0
>  <UABED>	0
>  <UAC00>...<UD7A3>	2
> +<UD7B0>...<UD7C6>	0
> +<UD7CB>...<UD7FB>	0

OK. Expected output by generator.

>  <UF900>...<UFA6D>	2
>  <UFA70>...<UFAD9>	2
>  <UFB1E>	0
> diff --git a/localedata/locales/i18n_ctype b/localedata/locales/i18n_ctype
> index 6f078a101d..c63e0790fc 100644
> --- a/localedata/locales/i18n_ctype
> +++ b/localedata/locales/i18n_ctype
> @@ -26,7 +26,7 @@ fax       ""
>  language  ""
>  territory "Earth"
>  revision  "13.0.0"
> -date      "2020-04-14"
> +date      "2020-06-25"

OK. No change. Expected.

>  category  "i18n:2012";LC_CTYPE
>  END LC_IDENTIFICATION
>  
> diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
> index d5785ceca1..7dbb923228 100644
> --- a/localedata/locales/tr_TR
> +++ b/localedata/locales/tr_TR
> @@ -43,7 +43,7 @@ fax        ""
>  language   "Turkish"
>  territory  "Turkey"
>  revision   "1.0"
> -date       "2020-04-14"
> +date       "2020-06-25"

OK. No change. Expected.

>  
>  category "i18n:2012";LC_IDENTIFICATION
>  category "i18n:2012";LC_CTYPE
> diff --git a/localedata/locales/translit_circle b/localedata/locales/translit_circle
> index 0f1e81541c..5c07b44532 100644
> --- a/localedata/locales/translit_circle
> +++ b/localedata/locales/translit_circle
> @@ -9,7 +9,7 @@ comment_char %
>  % otherwise be governed by that license.
>  
>  % Transliterations of encircled characters.
> -% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2020-04-14 for Unicode 13.0.0.
> +% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  
>  LC_CTYPE
>  
> diff --git a/localedata/locales/translit_cjk_compat b/localedata/locales/translit_cjk_compat
> index 17b74134fc..ee0d7f83c6 100644
> --- a/localedata/locales/translit_cjk_compat
> +++ b/localedata/locales/translit_cjk_compat
> @@ -9,7 +9,7 @@ comment_char %
>  % otherwise be governed by that license.
>  
>  % Transliterations of CJK compatibility characters.
> -% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2020-04-14 for Unicode 13.0.0.
> +% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  
>  LC_CTYPE
>  
> diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining
> index d5c8bbfe8f..36128f097a 100644
> --- a/localedata/locales/translit_combining
> +++ b/localedata/locales/translit_combining
> @@ -10,7 +10,7 @@ comment_char %
>  
>  % Transliterations that remove all combining characters (accents,
>  % pronounciation marks, etc.).
> -% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2020-04-14 for Unicode 13.0.0.
> +% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  
>  LC_CTYPE
>  
> diff --git a/localedata/locales/translit_compat b/localedata/locales/translit_compat
> index ff18b02ea3..ac24c4e938 100644
> --- a/localedata/locales/translit_compat
> +++ b/localedata/locales/translit_compat
> @@ -9,7 +9,7 @@ comment_char %
>  % otherwise be governed by that license.
>  
>  % Transliterations of compatibility characters and ligatures.
> -% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2020-04-14 for Unicode 13.0.0.
> +% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  
>  LC_CTYPE
>  
> diff --git a/localedata/locales/translit_font b/localedata/locales/translit_font
> index e79b0d83f5..680c4ed426 100644
> --- a/localedata/locales/translit_font
> +++ b/localedata/locales/translit_font
> @@ -9,7 +9,7 @@ comment_char %
>  % otherwise be governed by that license.
>  
>  % Transliterations of font equivalents.
> -% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2020-04-14 for Unicode 13.0.0.
> +% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  
>  LC_CTYPE
>  
> diff --git a/localedata/locales/translit_fraction b/localedata/locales/translit_fraction
> index 197d57a644..b52244969e 100644
> --- a/localedata/locales/translit_fraction
> +++ b/localedata/locales/translit_fraction
> @@ -9,7 +9,7 @@ comment_char %
>  % otherwise be governed by that license.
>  
>  % Transliterations of fractions.
> -% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2020-04-14 for Unicode 13.0.0.> +% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2020-06-25 for Unicode 13.0.0.

OK. No change. Expected.

>  % The replacements have been surrounded with spaces, because fractions are
>  % often preceded by a decimal number and followed by a unit or a math symbol.
>  
> diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
> index 17b99ee88d..11c906b92f 100755
> --- a/localedata/unicode-gen/utf8_gen.py
> +++ b/localedata/unicode-gen/utf8_gen.py
> @@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
>          if key in width_dict:
>              del width_dict[key] # default width is 1
>      for key in list(range(0x1160, 0x1200)):
> -        width_dict[key] = 0
> +        # Hangul jungseong and jongseong:
> +        if key in unicode_utils.UNICODE_ATTRIBUTES:
> +            width_dict[key] = 0
> +    for key in list(range(0xD7B0, 0xD800)):
> +        # Hangul jungseong and jongseong:
> +        if key in unicode_utils.UNICODE_ATTRIBUTES:
> +            width_dict[key] = 0

OK. Expected per bug.

>      for key in list(range(0x3248, 0x3250)):
>          # These are “A” which means we can decide whether to treat them
>          # as “W” or “N” based on context:
> @@ -327,6 +333,7 @@ if __name__ == "__main__":
>          help='The Unicode version of the input files used.')
>      ARGS = PARSER.parse_args()
>  
> +    unicode_utils.fill_attributes(ARGS.unicode_data_file)

OK.

>      with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
>          UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
>      with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
> -- 2.26.2


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-06-25 18:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16  8:24 [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120] Mike FABIAN
2020-06-23  9:30 ` [PATCH v3] " Mike FABIAN
2020-06-23 15:17   ` Carlos O'Donell
2020-06-25  8:32     ` Mike FABIAN
2020-06-25 12:35       ` Carlos O'Donell
2020-06-25 13:00         ` Mike FABIAN
2020-06-25 13:07           ` Mike FABIAN
2020-06-25 18:57             ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).