From: Mark Wielaard
To: gcc-rust@gcc.gnu.org
Subject: UTF-8 BOM handling
Date: Mon, 5 Jul 2021 21:37:46 +0200
Message-Id: <20210705193748.124938-1-mark@klomp.org>

Hi,

A Rust source file can start with a UTF-8 BOM sequence (EF BB BF).
This simply indicates that the file is encoded as UTF-8 (all Rust
input is interpreted as a sequence of Unicode code points encoded in
UTF-8), so it can be skipped before real lexing starts.

It isn't necessary to keep track of the BOM in the AST or HIR Crate
classes, so I removed the has_utf8bom flag. Also included are a couple
of simple tests to show that we handle the BOM correctly now.

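For illustration only: the sketch below is not the code from these patches (the gccrs lexer is written in C++). It shows the general idea of dropping a leading EF BB BF sequence before lexing. The helper name skip_utf8_bom and the byte-slice interface are assumptions made for this example.

```rust
/// The UTF-8 encoding of U+FEFF, the byte order mark.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper (not part of gccrs): return `input` without a
/// leading UTF-8 BOM. Only a BOM at the very start is skipped; input
/// that does not begin with the full three-byte sequence is returned
/// unchanged.
fn skip_utf8_bom(input: &[u8]) -> &[u8] {
    input.strip_prefix(&UTF8_BOM).unwrap_or(input)
}
```

A lexer built this way would call the helper once on the raw file contents and start tokenizing from the returned slice, so nothing downstream needs to record whether a BOM was present.
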
[PATCH 1/2] Handle UTF-8 BOM in lexer
[PATCH 2/2] Remove has_utf8bom flag from AST and HIR Crate classes

Cheers,

Mark