From: Marc
To: Mark Wielaard
Cc: gcc-rust@gcc.gnu.org
Subject: Re: UTF-8 BOM handling
Date: Tue, 06 Jul 2021 10:31:28 +0200

Mark Wielaard writes:

> Hi,
>
> A rust source file can start with a UTF-8 BOM sequence (EF BB
> BF). This simply indicates that the file is encoded as UTF-8 (all rust
> input is interpreted as a sequence of Unicode code points encoded in
> UTF-8) so can be skipped before starting real lexing.
>
> It isn't necessary to keep track of the BOM in the AST or HIR Crate
> classes. So I removed the has_utf8bom flag.
>
> Also included are a couple of simple tests to show we handle the BOM
> correctly now.

Merged: https://github.com/Rust-GCC/gccrs/pull/552

Marc
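
For anyone reading this in the archive, the skipping described in the quoted message can be sketched roughly as below. This is only an illustration of the idea and not the code from the merged patch; the names `UTF8_BOM` and `skip_utf8_bom` are made up for the example, and it assumes the whole source file is already available as a byte slice.

```rust
/// The three bytes that encode U+FEFF (the byte order mark) in UTF-8.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: returns the input with a leading UTF-8 BOM removed.
/// If the input does not start with the full three-byte sequence, it is
/// returned unchanged, so a truncated or absent BOM is left for the lexer.
fn skip_utf8_bom(input: &[u8]) -> &[u8] {
    if input.starts_with(&UTF8_BOM) {
        // Drop exactly the BOM bytes; everything after them is real source.
        &input[UTF8_BOM.len()..]
    } else {
        input
    }
}
```

Since the result is just the remaining bytes, nothing downstream needs to remember whether a BOM was there, which matches the point about not needing a flag in the AST or HIR.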