From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 89EA1385AC36 for ; Tue, 30 Nov 2021 15:27:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 89EA1385AC36 Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-555-fk1Ec1R1MhSqAVL83lzhBw-1; Tue, 30 Nov 2021 10:27:30 -0500 X-MC-Unique: fk1Ec1R1MhSqAVL83lzhBw-1 Received: by mail-qk1-f200.google.com with SMTP id bp17-20020a05620a459100b0045e893f2ed8so28890912qkb.11 for ; Tue, 30 Nov 2021 07:27:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=SjKirRlSJy60EtUVQMWMBPLFmONDFkhygUvzcbmcykE=; b=xNQPeTuzsnr4QbydpFKWw3Kvdud2xqGDOu1E7UNROXA3s2Olfo1cWCu+bTYd2XUIyt SWSVO4uXi6aAOy0k+z3qesyApcVYOQLALXDUG+Ktg9IQgvS3lq2TCkh2zizyBC6cAoBg m9KTg7aHzxSBzkf2SXXOWjEffRA4tS5SHxn70/MsZQlAsDotKNnJW9iu+1teQ3oihT13 E3cN5Kd812nm+KnkgmE9ud2nlhri6iyEGjeIGrsS+gP9ov0n/uMWHH1D6oTHb+lmQf9z ydEYYYdkn1J+rGG2GlBvzDaY8ZmHJUzO8U0U2DcG46d0fZztKlwH7z+UPyYV6EgT7mTY ADBA== X-Gm-Message-State: AOAM530Hw0qG46GOi/y8pyJiiLYXJHApkKzNLB30m/ATZiEE+6ujwZFI +i26zWes4z33xY9WvpqAmJ8MWl1bq5kJWZD+BPlUJ+oCAhFByErGPfzom/VkhmRbTiy5naAm1QE dekJdFs/ToaL9FhUG+g== X-Received: by 2002:a05:622a:454:: with SMTP id o20mr42740618qtx.633.1638286049257; Tue, 30 Nov 2021 07:27:29 -0800 (PST) X-Google-Smtp-Source: ABdhPJx1hsrzLVGSmVTZ6UH7vPAdEejB6ycVH5wpaA5cy8GISpjd/BLJusRvpjqLXvyOwwvD3dkTlA== X-Received: by 2002:a05:622a:454:: with SMTP id o20mr42740584qtx.633.1638286049013; Tue, 30 Nov 2021 07:27:29 -0800 (PST) Received: from redhat.com ([2601:184:4780:4310::aac2]) by smtp.gmail.com with ESMTPSA id u18sm10573150qki.69.2021.11.30.07.27.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Nov 2021 07:27:27 -0800 (PST) Date: Tue, 30 Nov 2021 10:27:26 -0500 From: Marek Polacek To: Stephan Bergmann Cc: Jakub Jelinek , Martin Sebor , GCC Patches , Joseph Myers Subject: Re: [PATCH] libcpp: Implement -Wbidi-chars for CVE-2021-42574 [PR103026] Message-ID: References: <20211101163652.36794-1-polacek@redhat.com> <49fc8d97-4cc7-5c78-d65a-02ea1419f483@gmail.com> <1ab9d323-af24-c2a0-0c6a-596b3f68c605@redhat.com> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/2.1.3 (2021-09-10) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Nov 2021 15:27:36 -0000 On Tue, Nov 30, 2021 at 04:00:01PM +0100, Stephan Bergmann wrote: > On 30/11/2021 14:26, Marek Polacek wrote: > > On Tue, Nov 30, 2021 at 09:38:57AM +0100, Stephan Bergmann wrote: > > > On 15/11/2021 18:28, Marek Polacek via Gcc-patches wrote: > > > > On Mon, Nov 08, 2021 at 04:33:43PM -0500, Marek Polacek wrote: > > > > > Ping, can we conclude on the name? IMHO, -Wbidirectional is just fine, > > > > > but changing the name is a trivial operation. > > > > > > > > Here's a patch with a better name (suggested by Jonathan W.). Otherwise no > > > > changes. > > > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > > > > > -- >8 -- > > > > From a link below: > > > > "An issue was discovered in the Bidirectional Algorithm in the Unicode > > > > Specification through 14.0. It permits the visual reordering of > > > > characters via control sequences, which can be used to craft source code > > > > that renders different logic than the logical ordering of tokens > > > > ingested by compilers and interpreters. Adversaries can leverage this to > > > > encode source code for compilers accepting Unicode such that targeted > > > > vulnerabilities are introduced invisibly to human reviewers." > > > > > > > > More info: > > > > https://nvd.nist.gov/vuln/detail/CVE-2021-42574 > > > > https://trojansource.codes/ > > > > > > > > This is not a compiler bug. However, to mitigate the problem, this patch > > > > implements -Wbidi-chars=[none|unpaired|any] to warn about possibly > > > > misleading Unicode bidirectional characters the preprocessor may encounter. > > > > > > > > The default is =unpaired, which warns about improperly terminated > > > > bidirectional characters; e.g. a LRE without its appertaining PDF. The > > > > level =any warns about any use of bidirectional characters. > > > > > > > > This patch handles both UCNs and UTF-8 characters. UCNs designating > > > > bidi characters in identifiers are accepted since r204886. Then r217144 > > > > enabled -fextended-identifiers by default. Extended characters in C/C++ > > > > identifiers have been accepted since r275979. However, this patch still > > > > warns about mixing UTF-8 and UCN bidi characters; there seems to be no > > > > good reason to allow mixing them. > > > > > > I wonder what the rationale is to warn about UCNs, like in > > > > > > > aText = u"\u202D" + aText; > > > > > > (as found in the LibreOffice source code). > > > > Is this line mixing a UCN and a UTF-8? Or is it just that you're > > prepending a LRO to aText? We warn because the LRO is not "closed" > > in the context of its string literal, which was part of the Trojan > > source attack. So "\u202D ... \u202C" would not warn. > > > > I'm not sure what workaround I could offer. Maybe provide an option not to > > warn about UCNs at all, though even that is potentially dangerous -- while > > you can see UCNs in the source code, if you print strings containing them, > > they won't be visible anymore. > > I'm not sure what you mean with "mixing a UCN and a UTF-8", but what the > code apparently does is programmatically constructing a larger piece of text > by prepending LRO to an existing piece of text. > > My understanding is that Trojan Source is concerned with presentation of > program source code and not with properties of Unicode text constructed > during the execution of such a program, and from the documentation quoted > above I understand that -Wbidi-chars is meant to address Trojan Source, so I > don't understand why you're concerned here with what happens "if you print > strings containing [UCNs in the source code]". > > Short of a source code viewer that interprets UCNs in C/C++ source code and > renders them in the same way as their corresponding Unicode characters, I > don't think that UCNs are relevant for Trojan Source, and don't understand > why -Wbidi-chars would warn about them. I guess we were concerned with programs that generate other programs. Maybe UCNs should be ignored by default. There's still time to adjust the behavior. > (Also, I noticed that it doesn't work to silence -Werror=bidi-chars= with a > > > #pragma GCC diagnostic ignored "-Wbidi-chars" Yeah, it doesn't work with C++, it's https://gcc.gnu.org/PR53431 :( Marek