Date: Sat, 25 Sep 2021 13:53:08 +0200
From: Mark Wielaard
To: Philip Herron
Cc: Arthur Cohen, gcc-rust@gcc.gnu.org, Thomas Schwinge
Subject: Re: byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code)
References: <20210921225430.166550-1-mark@klomp.org> <87k0j9ym7r.fsf@euler.schwinge.homeip.net>
List-Id: gcc-rust mailing list

Hi Philip,

On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> This is really useful information, will this mean that the lexer token will
> need to represent strings differently as well? Or is the std::string in the
> lexer still ok?

I think the representation as std::string is fine, as long as we don't
mix std::strings between different types (byte strings may contain
sequences of chars that aren't valid utf-8 sequences).

> The change you made above has the problem that reference types like, arrays
> are forms of what rust calls covariant types since they might contain an
> inference variable, so they require lookup to determine the base type. Its
> likely there is a reference cycle here. Though this change will not be
> correct for type checking purposes. The design of the type system is purely
> about rust type checking and inferring types.

OK, so how do I represent a reference to an array type that doesn't
contain any inference variables?

When we see a b"hello" byte string, that is the same as seeing
&[b'h', b'e', b'l', b'l', b'o'], which is the same as seeing
&[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8].

So we know this is &[u8;5], and if we write:

  let a = b"hello";

we want to infer that a has type &[u8;5].

> So for example this change will break the case of:
>
> ```
> let a:str = "test";
> ```
>
> Since the TypePath of str can't know the size of the expected array at
> compilation time. And the error message will end up with something like
> "expected str got [i8, 4]";

Right, but that is for "proper strings". It is somewhat unfortunate
that Rust also calls byte strings "strings", but they really aren't.
b"abc" is a static array of u8 (a &[u8;3]), not a &str (containing utf-8).

I have to think about the slicing of "proper strings", which sounds
more complicated than slicing of byte strings, because I don't think
you want to chop up a utf-8 sequence.

For now I would simply try to get the type of byte strings like
b"test" correct.

Cheers,

Mark
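
For reference, the typing described above can be checked with a small
standalone program. This is only a sketch of how rustc treats these
literals, not gccrs code; the helper names byte_string_len,
is_valid_utf8 and safe_prefix are made up for illustration.

```rust
// Illustration only: shows rustc's view of byte strings vs. "proper" strings.
// All helper names here are hypothetical and not part of gccrs.

/// Only accepts a reference to a fixed-size u8 array, so a call compiles
/// only when the argument really has type &[u8; N]. Returns N.
fn byte_string_len<const N: usize>(_bytes: &[u8; N]) -> usize {
    N
}

/// True when the bytes form a valid utf-8 sequence.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Slices a proper string up to byte offset `end`, refusing (None) to cut
/// through the middle of a multi-byte utf-8 sequence.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // No annotation on `a`: the literal alone fixes the type as &[u8; 5].
    let a = b"hello";
    let _check: &[u8; 5] = a;
    assert_eq!(a, &[b'h', b'e', b'l', b'l', b'o']);
    assert_eq!(a, &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8]);
    assert_eq!(byte_string_len(a), 5);

    // A byte string may contain bytes that are not valid utf-8.
    assert!(!is_valid_utf8(b"\xff\xfe"));

    // A proper string is a &str; U+00E9 occupies bytes 1 and 2, so
    // byte offset 2 is not a character boundary.
    let s: &str = "h\u{e9}llo";
    assert!(!s.is_char_boundary(2));
    assert_eq!(safe_prefix(s, 2), None);
    assert_eq!(safe_prefix(s, 3), Some("h\u{e9}"));
}
```

The byte string case needs no lookup beyond the literal's length, while
the &str case has to respect character boundaries when slicing.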