From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <philip.herron@embecosm.com>
Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com
 [IPv6:2a00:1450:4864:20::52d])
 by sourceware.org (Postfix) with ESMTPS id 0C6DC3858C2C
 for <gcc-rust@gcc.gnu.org>; Thu, 30 Sep 2021 10:46:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0C6DC3858C2C
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=embecosm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=embecosm.com
Received: by mail-ed1-x52d.google.com with SMTP id l8so20681378edw.2
 for <gcc-rust@gcc.gnu.org>; Thu, 30 Sep 2021 03:46:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=/QZ+7/2zOkD9mQPd4MV8yMGQmOijX3bgDAv/rmNVImw=;
 b=XzymgbFOpQiUdsIX+c+Ei0R6TakptaFZHv/D9S2YZKPIw3UsdcGnQ95T9JcdAa7S/i
 7feeKC6gZLfXZb0xqri4tAqdnahZ8Z5jA+bDcvsdczYWXnUGPeto6lU33RME+rkguTmc
 ZI6xNmqr/sYCsNFTJe+z1opuHLM61wYt3xQvAeEtu1FB19jcrgboinbTsoYQfg7bq6UF
 j2XlKiSNRv9/saacIiVyljZnR8l6n9AXJWivhTV1gyd+SLkLKNt6f6OVqy1QLVUtW89c
 Ja2yLEJ3Q+w4fC553ZsYuzgC6ccSBYqMP4nrO8yNPkKITeh8Hb4Q8hqTt+m7H/AGjlFc
 GckQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=/QZ+7/2zOkD9mQPd4MV8yMGQmOijX3bgDAv/rmNVImw=;
 b=vWRxWJPNTZ3/IR055kL6/u2PulNR8sH2KCCJFVU82WfntsxNjS5HvqNZhmtb56jOIW
 AxrpuGBpSeT0VKrXr7xg6JGjj0/SwkhQg755TaoM5Fq9smhO+yJinEcH+1jHfXr+9Y7k
 XxGNQuO8H+L80zZU1ytCVQG402VBEtUIW5a2VflG/IPT3rhGxuqRkk5fBuM+6IzzvYVC
 d62P3O50FjbACHmeBDlE9vRtHHyYqRkT73CgvOoqhj8I7ctQOWzvoGlxy3byFUt5U/dY
 G9VZszyXTtf7Zf+AM16CcnUHzc3P7hTr7sNyRZzXz2w4RbJf9j1anEHzYi689CMr1HvA
 47bQ==
X-Gm-Message-State: AOAM530D2oVMijG3Xew433UwdW7+AznSWIgDba72qemSvAXDtmYXO491
 zCVcalzzmDSlyo52e8QZxn9hSPtd+4j7bgDTO6H+Cg==
X-Google-Smtp-Source: ABdhPJydhHf7ZsCQCfG0sWZGzInonBtqUxgcq4V3+OfL8hgsfObDgFI8xrj0On4bQp7sGrDINQRT3Wt5vnJg2REROPM=
X-Received: by 2002:a17:906:60c7:: with SMTP id
 f7mr5920576ejk.57.1632998801038; 
 Thu, 30 Sep 2021 03:46:41 -0700 (PDT)
MIME-Version: 1.0
References: <20210921225430.166550-1-mark@klomp.org>
 <87k0j9ym7r.fsf@euler.schwinge.homeip.net>
 <YUuUEnzWnUMmjFhg@wildebeest.org>
 <CAB2u+n2vGM32BKmb0yJnZMJirxPtKFzximSH8YU4M_SiKpHOnw@mail.gmail.com>
 <CAKNo5ARVKRpr378OsNRP5xU4QxpL9f98qtEbdGLSikUALfTQ+Q@mail.gmail.com>
 <YUzpSop/pE8TVKlh@wildebeest.org>
 <CAB2u+n11kpd0KwsZZu6cXCsqHcALmFRfQOiQA=L1NRW8-faCjQ@mail.gmail.com>
 <YU8NpNfIgnJRoxbi@wildebeest.org>
In-Reply-To: <YU8NpNfIgnJRoxbi@wildebeest.org>
From: Philip Herron <philip.herron@embecosm.com>
Date: Thu, 30 Sep 2021 11:46:30 +0100
Message-ID: <CAB2u+n31F2RdraZsytXG-Fu9+w3cg2bKrjqsrJ8BqVe8PXJZrQ@mail.gmail.com>
Subject: Re: byte/char string representation (Was: [PATCH] Fix byte char and
 byte string lexing code)
To: Mark Wielaard <mark@klomp.org>
Cc: Arthur Cohen <cohenarthur.dev@gmail.com>, gcc-rust@gcc.gnu.org, 
 Thomas Schwinge <thomas@codesourcery.com>
Content-Type: multipart/alternative; boundary="00000000000074761b05cd34298e"
X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-rust@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: gcc-rust mailing list <gcc-rust.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-rust>,
 <mailto:gcc-rust-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-rust/>
List-Post: <mailto:gcc-rust@gcc.gnu.org>
List-Help: <mailto:gcc-rust-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-rust>,
 <mailto:gcc-rust-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2021 10:46:43 -0000

--00000000000074761b05cd34298e
Content-Type: text/plain; charset="UTF-8"

Hi Mark,

Thanks for clarifying this; I was getting mixed up between normal strs and
byte strings. Your patch got 99% of the way to fixing the type resolution,
so I finished it off for you:

https://github.com/Rust-GCC/gccrs/pull/698/files
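
For anyone following along, here is the distinction I had mixed up, as a
small sketch in plain Rust (ordinary user code, not gccrs internals; the
function names are made up for illustration):

```rust
// A byte string literal is a reference to a fixed-size array of u8.
fn byte_string() -> &'static [u8; 5] {
    b"hello"
}

// A normal string literal is a &str and is always valid UTF-8.
fn proper_string() -> &'static str {
    "hello"
}

fn main() {
    let a = byte_string();
    // Same value as spelling out the bytes one by one.
    assert_eq!(a, &[b'h', b'e', b'l', b'l', b'o']);
    assert_eq!(a, &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8]);

    // Byte strings may hold bytes that are not valid UTF-8.
    let raw: &[u8; 2] = b"\xff\xfe";
    assert!(std::str::from_utf8(raw).is_err());

    // The str happens to have the same bytes, but it is a different type.
    assert_eq!(proper_string().as_bytes(), &a[..]);
}
```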

The missing piece was that reference and array types are both covariant
types. An array type can look like [_; capacity], where the element type is
an inference variable, so that element type needs its own implicit mapping
id. You just needed to create one more mapping to get that implicit id, so
that the reference type similarly doesn't get into a loop of looking itself
up. Creating implicit types like this could be made easier, so we should
probably add some helpers for this scenario.
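
To illustrate the idea, here is a very simplified sketch. It is written in
Rust with made-up names (MappingId, Ty, TypeTable, insert_implicit,
resolve_base), so it is not the actual gccrs C++ code, just a toy model of
why each layer of &[u8; N] wants its own id:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; none of these names come from gccrs.
type MappingId = u32;

#[derive(Debug, PartialEq)]
enum Ty {
    U8,
    Array { element: MappingId, capacity: usize },
    Ref { base: MappingId },
}

struct TypeTable {
    next_id: MappingId,
    types: HashMap<MappingId, Ty>,
}

impl TypeTable {
    fn new() -> Self {
        TypeTable { next_id: 0, types: HashMap::new() }
    }

    // Hand out a fresh implicit id and register the type under it.
    fn insert_implicit(&mut self, ty: Ty) -> MappingId {
        let id = self.next_id;
        self.next_id += 1;
        self.types.insert(id, ty);
        id
    }

    // Follow references and arrays down to the base type. Give up if the
    // walk takes more steps than there are entries, which means a cycle.
    fn resolve_base(&self, mut id: MappingId) -> Option<&Ty> {
        for _ in 0..=self.types.len() {
            match self.types.get(&id)? {
                Ty::Ref { base } => id = *base,
                Ty::Array { element, .. } => id = *element,
                other => return Some(other),
            }
        }
        None
    }
}

// Type of a byte string of the given length: element, array and
// reference each get their own implicit id.
fn byte_string_type(table: &mut TypeTable, len: usize) -> MappingId {
    let element = table.insert_implicit(Ty::U8);
    let array = table.insert_implicit(Ty::Array { element, capacity: len });
    table.insert_implicit(Ty::Ref { base: array })
}

fn main() {
    let mut table = TypeTable::new();
    let id = byte_string_type(&mut table, 5);
    assert_eq!(table.resolve_base(id), Some(&Ty::U8));

    // If a reference is registered under the same id as its base, the
    // lookup just finds the reference again and never reaches u8.
    let mut broken = TypeTable::new();
    broken.types.insert(0, Ty::Ref { base: 0 });
    assert_eq!(broken.resolve_base(0), None);
}
```

A helper along the lines of insert_implicit is the kind of thing I mean by
making this easier.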

Let me know what you think.

Thanks

--Phil

On Sat, 25 Sept 2021 at 12:53, Mark Wielaard <mark@klomp.org> wrote:

> Hi Philip,
>
> On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> > This is really useful information. Will this mean that the lexer token
> > will need to represent strings differently as well? Or is the
> > std::string in the lexer still ok?
>
> I think the representation as std::string is fine, as long as we
> don't mix std::strings between different types (byte strings may
> contain sequences of chars that aren't valid utf-8 sequences).
>
> > The change you made above has the problem that reference types, like
> > arrays, are forms of what rust calls covariant types since they might
> > contain an inference variable, so they require lookup to determine the
> > base type. It's likely there is a reference cycle here. Though this
> > change will not be correct for type checking purposes. The design of the
> > type system is purely about rust type checking and inferring types.
>
> OK, so how do I represent a reference to an array type that doesn't
> contain any inference variables? When we see a b"hello" byte string
> that is the same as seeing &[b'h', b'e', b'l', b'l', b'o'] which is
> the same as seeing &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8];
>
> So we know this is &[u8;5] and if we write:
>
> let a = b"hello";
>
> We want to infer that a has type &[u8;5].
>
> > So for example this change will break the case of:
> >
> > ```
> >   let a:str = "test";
> > ```
> >
> > Since the TypePath of str can't know the size of the expected array at
> > compilation time. And the error message will end up with something like
> > "expected str got [i8, 4]";
>
> Right, but that is for "proper strings". It is somewhat unfortunate
> that Rust also calls byte strings "strings", but they really
> aren't. b"abc" is a static array of u8, not a &str (containing utf-8).
>
> I have to think about the slicing of "proper strings", which sounds
> more complicated than slicing of byte strings, because I don't think
> you want to chop up a utf-8 sequence. For now I would simply try to
> get the type of byte strings like b"test" correct.
>
> Cheers,
>
> Mark
>
>

--00000000000074761b05cd34298e--