From: Jakub Jelinek <jakub@redhat.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Initial implementation of -Whomoglyph [PR preprocessor/103027]
Date: Tue, 2 Nov 2021 13:06:05 +0100
Message-ID: <20211102120605.GB3230972@tucnak>
In-Reply-To: <20211102115652.GD304296@tucnak>

On Tue, Nov 02, 2021 at 12:56:53PM +0100, Jakub Jelinek wrote:
> Consider attached testcases Whomoglyph1.C and Whomoglyph2.C.
> On the Whomoglyph1.C testcase, I'd expect a warning, because there is a clear
> confusion for the reader, something that isn't visible in any of emacs, vim,
> joe editors or on the terminal: when f3 uses the identifier scope, the casual
> reader will expect that it uses N1::N2::scope, but there is no such
> variable, only N1::N2::ѕсоре, which visually looks the same but has
> different UTF-8 chars in it.  So, name lookup will instead find N1::scope
> and use that.
> But Whomoglyph2.C will emit warnings that are IMHO not appropriate,
> I believe there is no confusion at all there, e.g. for both C and C++,
> the f5/f6 case, it doesn't really matter how each of the function names its
> own parameter, one can never access another function's parameter.
> Ditto for different namespaces, provided that both namespaces aren't searched
> in the same name lookup, or similarly for classes etc.
> So, IMNSHO that warning belongs to name-lookup (cp/name-lookup.c for the C++
> FE).
> And, another important thing is that most users don't really use Unicode in
> identifiers, I bet over 99.9% of identifiers don't have any >= 0x80
> characters in them and even when people do use them, confusable identifiers
> during the same lookup are far more unlikely still.
> So, I think we should optimize for the common case, ASCII only identifiers
> and spend as little compile time as possible on this stuff.
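
One way to picture the ASCII-only fast path is a cheap byte scan that runs
before any confusable handling.  This is only a sketch: ascii_only_p is a
hypothetical helper, not a function from the GCC sources, and it assumes the
identifier spelling is already in UTF-8 form.

```cpp
#include <string>

// Hypothetical helper (not part of GCC): returns true when no byte of
// the UTF-8 spelling is >= 0x80, i.e. the identifier is pure ASCII.
bool
ascii_only_p (const std::string &id)
{
  for (unsigned char ch : id)
    if (ch >= 0x80)
      return false;
  return true;
}
```

A caller would do the more expensive work only when ascii_only_p returns
false for an identifier involved in the lookup; what that slower path looks
like is left open here.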

If we keep doing it in the stringpool, then e.g. one couldn't
#include <zlib.h>
in a program with Russian/Ukrainian/Serbian etc. identifiers without getting
the warning whenever some parameter or automatic variable in some function
in that file is called с (Cyrillic small letter es), just because in zlib.h
one of the parameters in one of the function prototypes is called
c (Latin small letter c).
I'd be afraid most of the users that actually want to use UTF-8 or UCNs in
their identifiers would then just need to disable this warning...
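
A minimal sketch of that situation, with made-up names: put_byte stands in
for a prototype from a third-party header and is not zlib's actual API, and
twice is hypothetical user code.  The Cyrillic letter is written as a UCN so
the difference is visible in any editor.

```cpp
// Stand-in for a prototype from a third-party header (hypothetical,
// not taken from zlib.h).  The parameter is Latin small letter c (U+0063).
int put_byte (int c);

// User code in the same translation unit.  The parameter is Cyrillic
// small letter es (U+0441), spelled as a UCN.
int
twice (int \u0441)
{
  return 2 * \u0441;
}

// Definition of the stand-in, keeping only the low 8 bits.
int
put_byte (int c)
{
  return c & 0xff;
}
```

The two parameters live in separate function scopes, so no name lookup can
ever pick one when the other was meant; a check keyed on all identifiers seen
in the translation unit would presumably still see both spellings.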

	Jakub



