From: Paul Eggert <eggert@cs.ucla.edu>
To: Joseph Myers <joseph@codesourcery.com>,
GNU C Library <libc-alpha@sourceware.org>
Subject: Re: UTF-8 in glibc commit messages
Date: Wed, 14 Apr 2021 11:08:47 -0700
Message-ID: <4878f4cd-529d-fa8a-6394-d7ae6a69c824@cs.ucla.edu>
In-Reply-To: <YHcDsTB/pSUnb2l0@vapier>
On 4/14/21 8:01 AM, Mike Frysinger wrote:
> can't we be proactive ? let's go all-in on UTF-8.
A problem with "all-in" is that UTF-8 has weird characters that can mess
things up. The commit message check was originally put in because
someone copy-pasted U+2069 POP DIRECTIONAL ISOLATE into a commit message
without realizing it. That invisible character breaks simple searches
like 'grep -w'.
glibc's current check isn't quite right either, as it allows lines like
this:
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
in which each "space" is actually U+00A0 NO-BREAK SPACE. Although that's
valid ISO-8859-15, U+00A0 is another weird character that we arguably
shouldn't allow as it can also mess up searches (it's even blacklisted
in URLs by some browsers because of the potential for phishing).
It'd be better to come up with an exact list of acceptable Unicode
characters (probably a set of categories with some exceptions). This
would be better than the current approach which is either too-generous
or (mostly) too-restrictive. But it'd be some work.
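
As a rough illustration of the "categories with some exceptions" idea,
here is a minimal Python sketch. It is not glibc's actual commit hook.
The accepted category prefixes, the extra-allowed whitespace, the deny
list, and the name rejected_chars are all assumptions made up for this
example; the real list would need discussion.

```python
import unicodedata

# Assumption: accept letters, marks, numbers, punctuation and symbols.
# Separators (Z*) and control/format characters (C*) are rejected,
# which covers both U+00A0 (Zs) and U+2069 (Cf).
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "N", "P", "S")

# Assumption: the only whitespace accepted is ASCII space and newline.
ALLOWED_EXTRA = {" ", "\n"}

# Hypothetical exception: a character in an accepted category that we
# still want to refuse (U+FFFD REPLACEMENT CHARACTER is category So).
DENIED = {"\ufffd"}


def rejected_chars(message):
    """Return (line, column, code point, name) for each refused character."""
    problems = []
    for lineno, line in enumerate(message.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED_EXTRA:
                continue
            category = unicodedata.category(ch)
            if ch in DENIED or not category.startswith(ALLOWED_CATEGORY_PREFIXES):
                name = unicodedata.name(ch, "<unnamed>")
                problems.append((lineno, col, f"U+{ord(ch):04X}", name))
    return problems
```

With these assumed rules, a trailer whose "space" is really U+00A0 is
reported with its line and column, while ordinary non-ASCII letters in
names still pass.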