From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id C86DE385840A for ; Wed, 17 Nov 2021 03:05:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C86DE385840A Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-259-jVb0pe2vMVCg8ogUMlVQzw-1; Tue, 16 Nov 2021 22:05:24 -0500 X-MC-Unique: jVb0pe2vMVCg8ogUMlVQzw-1 Received: by mail-qt1-f198.google.com with SMTP id u14-20020a05622a198e00b002b2f35a6dcfso944957qtc.21 for ; Tue, 16 Nov 2021 19:05:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=UWIZol8J67uaPD/M4MNtUBvJIo0Wa2JiBV+kD7/2aR0=; b=CJKH3YXziLrKMcQgX02MVu+3qLfagoz94mINVU6O7J9/UIJ79Am6YMuhu+vIsssgtu TPBY7joHaQ5pBa+K0e4JehseuKaNVw74LNS0M//M4TY/B7m+zy01nXz1fU4D6ctZ/+f1 RjXUkmPFg+mBNurVJeYuzLgjRdHJiXmTDLBIzB3T8lMctYFC1ovHDU4JMSD0PMBYmoHH kwU7teBjLmpq3UJ+0Ek97o7oyKibp0VnzDgd9Nxb5czsbZTNYmeQ2pJzhWxaMqlMlvwy 6iiWWf8UW1vkZoTJz3R4bCFmadwB4YTJfcBXtoNQRoj2gvkrvNpDdTDNaeh3zsipa4JH Jhyw== X-Gm-Message-State: AOAM532loZ+047URk0xtBN2TGzfwNHjj47pvOfI/UT1Wu/5nlfBUYS9s 0drIV/Rua/2uamA+em+iLPt0pT3Q9i1Wg7XTAS+mZSequTrNR/vKmxIJMMRJgbqa3/oosiPADaf cE/y0AKZaDJEqHoOBhw== X-Received: by 2002:ac8:5e46:: with SMTP id i6mr12893082qtx.7.1637118323863; Tue, 16 Nov 2021 19:05:23 -0800 (PST) X-Google-Smtp-Source: ABdhPJxK/eAqK32ojNu8ZC8OxuOMrtR1OsPRGkY6aWOVqKgJ2pjtAe+zYw/vNN7do+NgRaLQChJ3vQ== X-Received: by 2002:ac8:5e46:: with SMTP id i6mr12893057qtx.7.1637118323626; Tue, 16 Nov 2021 19:05:23 -0800 (PST) Received: from redhat.com ([2601:184:4780:4310::aac2]) by smtp.gmail.com with ESMTPSA id bq43sm7502025qkb.32.2021.11.16.19.05.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Nov 2021 19:05:23 -0800 (PST) Date: Tue, 16 Nov 2021 22:05:21 -0500 From: Marek Polacek To: David Malcolm Cc: Joseph Myers , Jakub Jelinek , Martin Sebor , GCC Patches Subject: Re: [PATCH v3] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] Message-ID: References: <49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com> <5422a6074f9a8877158b733c5acb2a9eecd4c000.camel@redhat.com> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/2.1.3 (2021-09-10) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Nov 2021 03:05:27 -0000 On Tue, Nov 16, 2021 at 09:28:21PM -0500, David Malcolm wrote: > On Tue, 2021-11-16 at 19:37 -0500, Marek Polacek wrote: > > Sorry for a dumb question, but is this what you have in mind? > > > > /* LRE > >    PDF */ > > /* FSI > >    PDI */ > > and check that we warn for these? > > I mean something like the following multiline comments in which lines > within them at the start, middle and end have unpaired constructs > within a given line: > > > /* RLI > * > */ > > /* > * RLI > */ > > /* > *   > * RLI */ > > and that we should warn for each case at the line containing the > unpaired control character. > > (the above lines don't have the actual chars, just "RLI") > > Mostly this is just me trying to think about it from a black-box > testing perspective, or in case we ever touch this code in the future > (perhaps it's obviously correct by inspection of the implementation > now, but let's have regression tests for these cases). > > Sorry to add more work, but here's an idea for another test case: > multiple comments on one line: > > /* RLI */ /* PDF */ > > where the closure of a comment should trigger closing a "context", so > we should complain about the above. No problem, I've added these. > > > > > > > > @@ -1505,13 +1855,17 @@ lex_identifier (cpp_reader *pfile, > > > > > > const uchar *base, bool starts_ucn, > > > > > >      { > > > > > >        /* Slower version for identifiers containing UCNs > > > > > >          or extended chars (including $).  */ > > > > > > -      do { > > > > > > -       while (ISIDNUM (*pfile->buffer->cur)) > > > > > > -         { > > > > > > -           NORMALIZE_STATE_UPDATE_IDNUM (nst, *pfile->buffer- > > > > > > >cur); > > > > > > -           pfile->buffer->cur++; > > > > > > -         } > > > > > > -      } while (forms_identifier_p (pfile, false, nst)); > > > > > > +      do > > > > > > +       { > > > > > > +         while (ISIDNUM (*pfile->buffer->cur)) > > > > > > +           { > > > > > > +             NORMALIZE_STATE_UPDATE_IDNUM (nst, *pfile- > > > > > > >buffer->cur); > > > > > > +             pfile->buffer->cur++; > > > > > > +           } > > > > > > +       } > > > > > > +      while (forms_identifier_p (pfile, false, nst)); > > > > > > > > > > Is the above purely a whitespace change? > > > > > > > > Yes. > > > > > > If I'm reading things correctly, these lines in the existing code > > > were > > > correctly indented, so is there a purpose to this change?  If not, > > > please can you remove this change from the patch (to minimize the > > > change to the history). > > > > I dropped that change then.  Sometimes it's hard to resist fixing > > formatting.  ;) > > Thanks. But I don't think the existing formatting in the code *was* > broken; I thought the patch was taking correct formatting and breaking > it (hence my objection to a whitespace change). If I misread this, > sorry. I think it was, we're supposed to format do-while as do { } while (...); but it's obviously not a big deal. > Hopefully the above makes sense and is constructive; let me know when > you push your patch so that I can work on my followup. Pushed now. Thanks! Marek