public inbox for gcc-patches@gcc.gnu.org
From: Jonathan Wakely <jwakely@redhat.com>
To: Lewis Hyatt <lhyatt@gmail.com>
Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v2] libstdc++: Add Unicode-aware width estimation for std::format
Date: Sat, 6 Jan 2024 21:11:29 +0000
Message-ID: <CACb0b4nYbDPFdMWc55ckKxXMr-9adThok9UiaqygeZ4Sy=atLw@mail.gmail.com>
In-Reply-To: <CACb0b4kjvnWYnn41ZFTX8GBu4v4V2uXmCiZm=CVHvA01Yy9OAQ@mail.gmail.com>

On Sat, 6 Jan 2024 at 17:03, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Sat, 6 Jan 2024 at 16:57, Lewis Hyatt <lhyatt@gmail.com> wrote:
> >
> > On Sat, Jan 6, 2024 at 11:40 AM Jonathan Wakely <jwakely@redhat.com> wrote:
> > >
> > > Here's a V2 patch which addresses the two things I mentioned: the new
> > > Python script now generates a complete file that can just be included by
> > > <bits/unicode.h>, and the full Unicode 15.1.0 grapheme cluster break
> > > rules are supported (I think ... more testing needed for some of the
> > > complex rules).
> > >
> > > -- >8 --
> >
> > Thanks, by the way, for fixing the typo in gen_wcwidth.py.
> > One thing I wanted to point out, the file contrib/unicode/README
> > contains a list of steps to follow in order to update to a new Unicode
> > version. There are 10 or so steps to generate everything libcpp and
> > diagnostics care about. Do you think it's worth adding something for
> > the new libstdc++ parts there too?
>
> Ah, thanks for pointing that out. Yes, I should add to that.

Here's what I suggest adding to the README:

--- a/contrib/unicode/README
+++ b/contrib/unicode/README
@@ -16,7 +16,12 @@
ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
ftp://ftp.unicode.org/Public/UNIDATA/NameAliases.txt

-These files have been added to source control in this directory;
+Two additional files are needed for lookup tables in libstdc++:
+
+ftp://ftp.unicode.org/Public/UNIDATA/auxiliary/GraphemeBreakProperty.txt
+ftp://ftp.unicode.org/Public/UNIDATA/emoji/emoji-data.txt
+
+All these files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

In order to keep in sync with glibc's wcwidth as much as possible, it is
@@ -24,7 +29,7 @@ desirable for the logic that processes the Unicode data to be the same as
glibc's.  To that end, we also put in this directory, in the from_glibc/
directory, the glibc python code that implements their logic.  This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
-source code repository.  The files copied from that respository are:
+source code repository.  The files copied from that repository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py
@@ -71,3 +76,6 @@ The procedure to update GCC's Unicode support is the following:
9:  Generate uname2c.h as follows:
      ../../libcpp/makeuname2c UnicodeData.txt NameAliases.txt \
       > ../../libcpp/uname2c.h
+
+See gen_libstdcxx_unicode_data.py for instructions on updating the lookup
+tables in libstdc++.


That refers to gen_libstdcxx_unicode_data.py, which I think is a better
name than gen_std_format_width.py, so I've renamed the new script in my
local tree.


Thread overview: 11+ messages
2024-01-05 14:36 [PATCH] " Jonathan Wakely
2024-01-05 15:33 ` Jonathan Wakely
2024-01-06 15:17   ` [PATCH v2] " Jonathan Wakely
2024-01-06 16:57     ` Lewis Hyatt
2024-01-06 17:03       ` Jonathan Wakely
2024-01-06 21:11         ` Jonathan Wakely [this message]
2024-01-08  1:13     ` Jonathan Wakely
2024-01-08  1:17       ` [committed V3] " Jonathan Wakely
2024-01-08 22:56         ` Jonathan Wakely
2024-01-08  1:22       ` [PATCH v2] " Jonathan Wakely
2024-01-08  1:25         ` Jonathan Wakely
