From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <polacek@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id C86DE385840A
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 03:05:25 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C86DE385840A
Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com
 [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-259-jVb0pe2vMVCg8ogUMlVQzw-1; Tue, 16 Nov 2021 22:05:24 -0500
X-MC-Unique: jVb0pe2vMVCg8ogUMlVQzw-1
Received: by mail-qt1-f198.google.com with SMTP id
 u14-20020a05622a198e00b002b2f35a6dcfso944957qtc.21
 for <gcc-patches@gcc.gnu.org>; Tue, 16 Nov 2021 19:05:24 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:content-transfer-encoding
 :in-reply-to:user-agent;
 bh=UWIZol8J67uaPD/M4MNtUBvJIo0Wa2JiBV+kD7/2aR0=;
 b=CJKH3YXziLrKMcQgX02MVu+3qLfagoz94mINVU6O7J9/UIJ79Am6YMuhu+vIsssgtu
 TPBY7joHaQ5pBa+K0e4JehseuKaNVw74LNS0M//M4TY/B7m+zy01nXz1fU4D6ctZ/+f1
 RjXUkmPFg+mBNurVJeYuzLgjRdHJiXmTDLBIzB3T8lMctYFC1ovHDU4JMSD0PMBYmoHH
 kwU7teBjLmpq3UJ+0Ek97o7oyKibp0VnzDgd9Nxb5czsbZTNYmeQ2pJzhWxaMqlMlvwy
 6iiWWf8UW1vkZoTJz3R4bCFmadwB4YTJfcBXtoNQRoj2gvkrvNpDdTDNaeh3zsipa4JH
 Jhyw==
X-Gm-Message-State: AOAM532loZ+047URk0xtBN2TGzfwNHjj47pvOfI/UT1Wu/5nlfBUYS9s
 0drIV/Rua/2uamA+em+iLPt0pT3Q9i1Wg7XTAS+mZSequTrNR/vKmxIJMMRJgbqa3/oosiPADaf
 cE/y0AKZaDJEqHoOBhw==
X-Received: by 2002:ac8:5e46:: with SMTP id i6mr12893082qtx.7.1637118323863;
 Tue, 16 Nov 2021 19:05:23 -0800 (PST)
X-Google-Smtp-Source: ABdhPJxK/eAqK32ojNu8ZC8OxuOMrtR1OsPRGkY6aWOVqKgJ2pjtAe+zYw/vNN7do+NgRaLQChJ3vQ==
X-Received: by 2002:ac8:5e46:: with SMTP id i6mr12893057qtx.7.1637118323626;
 Tue, 16 Nov 2021 19:05:23 -0800 (PST)
Received: from redhat.com ([2601:184:4780:4310::aac2])
 by smtp.gmail.com with ESMTPSA id bq43sm7502025qkb.32.2021.11.16.19.05.22
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 16 Nov 2021 19:05:23 -0800 (PST)
Date: Tue, 16 Nov 2021 22:05:21 -0500
From: Marek Polacek <polacek@redhat.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>,
 Jakub Jelinek <jakub@redhat.com>, Martin Sebor <msebor@redhat.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v3] libcpp: Implement -Wbidi-chars for CVE-2021-42574
 [PR103026]
Message-ID: <YZRxcWcQgVRRZz6p@redhat.com>
References: <YYFzA3Z4rPwTo927@redhat.com>
 <49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com>
 <YYGW5eiZhSxq6aHl@redhat.com> <YYmXt6jB3Qq3qPIW@redhat.com>
 <YZKY2b8utf/5/f6o@redhat.com>
 <a4463efe9cd50165a6fcd1ca05470a03595792f3.camel@redhat.com>
 <YZQLlHfq740IOtje@redhat.com>
 <5422a6074f9a8877158b733c5acb2a9eecd4c000.camel@redhat.com>
 <YZROxBnxZQCTYdzl@redhat.com>
 <e06ba686a6439dc04a5a48efd424c45f3335b8f8.camel@redhat.com>
MIME-Version: 1.0
In-Reply-To: <e06ba686a6439dc04a5a48efd424c45f3335b8f8.camel@redhat.com>
User-Agent: Mutt/2.1.3 (2021-09-10)
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF,
 RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Nov 2021 03:05:27 -0000

On Tue, Nov 16, 2021 at 09:28:21PM -0500, David Malcolm wrote:
> On Tue, 2021-11-16 at 19:37 -0500, Marek Polacek wrote:
> > Sorry for a dumb question, but is this what you have in mind?
> > 
> > /* LRE
> >    PDF */
> > /* FSI
> >    PDI */
> > and check that we warn for these?
> 
> I mean something like the following multiline comments in which lines
> within them at the start, middle and end have unpaired constructs
> within a given line:
> 
> 
> /* RLI
>  *
>  */
> 
> /*
>  * RLI
>  */
> 
> /*
>  *  
>  * RLI */
> 
> and that we should warn for each case at the line containing the
> unpaired control character.
> 
> (the above lines don't have the actual chars, just "RLI")
> 
> Mostly this is just me trying to think about it from a black-box
> testing perspective, or in case we ever touch this code in the future
> (perhaps it's obviously correct by inspection of the implementation
> now, but let's have regression tests for these cases).
> 
> Sorry to add more work, but here's an idea for another test case:
> multiple comments on one line:
> 
>   /* RLI */  /* PDF */
> 
> where the closure of a comment should trigger closing a "context", so
> we should complain about the above.

No problem, I've added these.
 
> > 
> > > > > > @@ -1505,13 +1855,17 @@ lex_identifier (cpp_reader *pfile,
> > > > > > const uchar *base, bool starts_ucn,
> > > > > >      {
> > > > > >        /* Slower version for identifiers containing UCNs
> > > > > >          or extended chars (including $).  */
> > > > > > -      do {
> > > > > > -       while (ISIDNUM (*pfile->buffer->cur))
> > > > > > -         {
> > > > > > -           NORMALIZE_STATE_UPDATE_IDNUM (nst, *pfile->buffer-
> > > > > > >cur);
> > > > > > -           pfile->buffer->cur++;
> > > > > > -         }
> > > > > > -      } while (forms_identifier_p (pfile, false, nst));
> > > > > > +      do
> > > > > > +       {
> > > > > > +         while (ISIDNUM (*pfile->buffer->cur))
> > > > > > +           {
> > > > > > +             NORMALIZE_STATE_UPDATE_IDNUM (nst, *pfile-
> > > > > > >buffer->cur);
> > > > > > +             pfile->buffer->cur++;
> > > > > > +           }
> > > > > > +       }
> > > > > > +      while (forms_identifier_p (pfile, false, nst));
> > > > > 
> > > > > Is the above purely a whitespace change?
> > > > 
> > > > Yes.
> > > 
> > > If I'm reading things correctly, these lines in the existing code
> > > were
> > > correctly indented, so is there a purpose to this change?  If not,
> > > please can you remove this change from the patch (to minimize the
> > > change to the history).
> > 
> > I dropped that change then.  Sometimes it's hard to resist fixing
> > formatting.  ;)
> 
> Thanks.  But I don't think the existing formatting in the code *was*
> broken; I thought the patch was taking correct formatting and breaking
> it (hence my objection to a whitespace change).  If I misread this,
> sorry.

I think it was, we're supposed to format do-while as

  do
    {
    }
  while (...);

but it's obviously not a big deal.

> Hopefully the above makes sense and is constructive; let me know when
> you push your patch so that I can work on my followup.

Pushed now.  Thanks!

Marek