From: Mike FABIAN
To: libc-alpha@sourceware.org
Date: Tue, 23 Jun 2020
Subject: Re: [PATCH v3] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
Organization: Red Hat
In-Reply-To: Mike FABIAN's message of "Tue, 16 Jun 2020 10:24:59 +0200"

I skipped unassigned characters and ended the range at U+D7FF, even
though U+D7FC..U+D7FF are currently unassigned. Because the script now
skips unassigned characters, it is OK to end the range for the Hangul
Jamo at U+D7FF: if these code points are ever assigned in the future,
they will probably be Hangul Jamo, because Blocks.txt places
U+D7B0..U+D7FF in the Hangul Jamo Extended-B block.
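The reasoning about ending the range at U+D7FF can be illustrated with a small sketch. It assumes two hypothetical helpers that are not part of utf8_gen.py or unicode_utils: `block_range()`, which reads lines in the Blocks.txt format, and `zero_width_jamo()`, which mirrors the patched loop by giving width 0 only to code points present in a set of assigned characters. The sample data is a single Blocks.txt line plus the currently assigned jungseong and jongseong ranges.

```python
def block_range(blocks_lines, block_name):
    """Return (first, last) code points of a block given Blocks.txt-style lines."""
    for line in blocks_lines:
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        cp_range, name = (field.strip() for field in line.split(';', 1))
        if name == block_name:
            first, last = cp_range.split('..')
            return int(first, 16), int(last, 16)
    raise KeyError(block_name)


def zero_width_jamo(assigned, first, last):
    """Give width 0 only to assigned code points in first..last (inclusive)."""
    width_dict = {}
    for key in range(first, last + 1):
        if key in assigned:  # unassigned code points are skipped
            width_dict[key] = 0
    return width_dict


# Sample data: one Blocks.txt line and the currently assigned
# jungseong (U+D7B0..U+D7C6) and jongseong (U+D7CB..U+D7FB).
BLOCKS = ["D7B0..D7FF; Hangul Jamo Extended-B\n"]
ASSIGNED = set(range(0xD7B0, 0xD7C7)) | set(range(0xD7CB, 0xD7FC))

first, last = block_range(BLOCKS, "Hangul Jamo Extended-B")
widths = zero_width_jamo(ASSIGNED, first, last)
print(len(widths), hex(min(widths)), hex(max(widths)))  # 72 0xd7b0 0xd7fb

# If a future Unicode version assigned U+D7FC, it would get width 0
# without any change to the range used in the script.
future = zero_width_jamo(ASSIGNED | {0xD7FC}, first, last)
print(0xD7FC in future)  # True
```

The gaps U+D7C7..U+D7CA and U+D7FC..U+D7FF stay out of the result only because they are absent from the assigned set, not because the range excludes them.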
Manual checking after each Unicode update is a good idea anyway, but
ending the range in the script at U+D7FF seems more likely to do the
right thing automatically if these characters are ever assigned.

From 2c4ad3b5c7d6ffa0190a2c60bffdf1203e10b6c8 Mon Sep 17 00:00:00 2001
From: Mike FABIAN
Date: Tue, 16 Jun 2020 08:29:40 +0200
Subject: [PATCH] Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]

---
 localedata/charmaps/UTF-8          | 2 ++
 localedata/unicode-gen/utf8_gen.py | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 0
 0
 ... 2
+... 0
+... 0
 ... 2
 ... 2
 0
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 17b99ee88d..11c906b92f 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -258,7 +258,13 @@ def process_width(outfile, ulines, elines, plines):
         if key in width_dict:
             del width_dict[key] # default width is 1
     for key in list(range(0x1160, 0x1200)):
-        width_dict[key] = 0
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
+    for key in list(range(0xD7B0, 0xD800)):
+        # Hangul jungseong and jongseong:
+        if key in unicode_utils.UNICODE_ATTRIBUTES:
+            width_dict[key] = 0
     for key in list(range(0x3248, 0x3250)):
         # These are “A” which means we can decide whether to treat them
         # as “W” or “N” based on context:
@@ -327,6 +333,7 @@ if __name__ == "__main__":
                         help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
+    unicode_utils.fill_attributes(ARGS.unicode_data_file)
     with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
     with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
-- 
2.26.2
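In the patch, `unicode_utils.fill_attributes(ARGS.unicode_data_file)` is called right after the arguments are parsed, so that the `key in unicode_utils.UNICODE_ATTRIBUTES` checks in `process_width()` have data to consult. The sketch below is a simplified, hypothetical stand-in (`load_assigned()` is an illustrative name, not the real unicode_utils code, which stores full per-character attributes) that only records which code points UnicodeData.txt lists. It also expands the `<..., First>`/`<..., Last>` pairs that UnicodeData.txt uses to compress large ranges such as the precomposed Hangul syllables.

```python
def load_assigned(unicode_data_lines):
    """Return the set of code points listed in UnicodeData.txt-style lines."""
    assigned = set()
    range_start = None
    for line in unicode_data_lines:
        fields = line.strip().split(';')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        code_point = int(fields[0], 16)
        name = fields[1]
        if name.endswith(', First>'):
            range_start = code_point  # start of a compressed range
        elif name.endswith(', Last>') and range_start is not None:
            assigned.update(range(range_start, code_point + 1))
            range_start = None
        else:
            assigned.add(code_point)
    return assigned


# A few sample lines in UnicodeData.txt format.
SAMPLE = [
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;\n",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;\n",
    "D7B0;HANGUL JUNGSEONG O-YEO;Lo;0;L;;;;;N;;;;;\n",
    "D7FB;HANGUL JONGSEONG PHIEUPH-THIEUTH;Lo;0;L;;;;;N;;;;;\n",
]

assigned = load_assigned(SAMPLE)
print(len(assigned))  # 11174
print(0xD7B0 in assigned, 0xD7FC in assigned)  # True False
```

Loading the data once up front, before the per-range loops run, keeps the membership test in each loop a cheap set lookup.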