public inbox for gcc-rust@gcc.gnu.org
From: Philip Herron <philip.herron@embecosm.com>
To: Mark Wielaard <mark@klomp.org>
Cc: Arthur Cohen <cohenarthur.dev@gmail.com>,
	gcc-rust@gcc.gnu.org,  Thomas Schwinge <thomas@codesourcery.com>
Subject: Re: byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code)
Date: Thu, 30 Sep 2021 11:46:30 +0100
Message-ID: <CAB2u+n31F2RdraZsytXG-Fu9+w3cg2bKrjqsrJ8BqVe8PXJZrQ@mail.gmail.com>
In-Reply-To: <YU8NpNfIgnJRoxbi@wildebeest.org>


Hi Mark,

Thanks for clarifying this; I was getting mixed up between normal strs and
byte strings. Your patch was 99% of the way there to fix the type
resolution, so I finished it off for you:

https://github.com/Rust-GCC/gccrs/pull/698/files

The missing piece was that References and Arrays are covariant types,
since they can contain other types. An array type can look like this:
[_; capacity], where the element type _ is an inference variable, so we
need to make sure that variable has its own implicit mapping id. You just
needed to create one more mapping to get that implicit id, so that the
reference type similarly doesn't get into a loop of looking itself up.
Creating implicit types like this could be made easier, so we should
probably add some helpers for this scenario.
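
To make the point about implicit mapping ids concrete, here is a toy model
in Rust. It is only a sketch under stated assumptions: the names TypeTable,
Ty, implicit and render are hypothetical and are not the gccrs TyTy
classes. Compound types refer to their element type by mapping id, and a
simple depth limit stands in for real cycle detection.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model (not the gccrs API): every type lives in a
// table keyed by a mapping id, and compound types refer to inner types by id.
#[derive(Debug)]
enum Ty {
    U8,
    Infer,                                // an inference variable `_`
    Array { elem: u32, capacity: usize }, // [elem; capacity]
    Ref { base: u32 },                    // &base
}

struct TypeTable {
    next_id: u32,
    types: HashMap<u32, Ty>,
}

impl TypeTable {
    fn new() -> Self {
        TypeTable { next_id: 0, types: HashMap::new() }
    }

    // Allocate a fresh implicit mapping id for a type with no source node of its own.
    fn implicit(&mut self, ty: Ty) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.types.insert(id, ty);
        id
    }

    // Render a type by following element ids; the depth limit reports cycles.
    fn render(&self, id: u32, depth: usize) -> Result<String, String> {
        if depth > 16 {
            return Err(format!("cycle while resolving id {id}"));
        }
        match self.types.get(&id) {
            Some(Ty::U8) => Ok("u8".to_string()),
            Some(Ty::Infer) => Ok("_".to_string()),
            Some(Ty::Array { elem, capacity }) => {
                Ok(format!("[{}; {}]", self.render(*elem, depth + 1)?, capacity))
            }
            Some(Ty::Ref { base }) => Ok(format!("&{}", self.render(*base, depth + 1)?)),
            None => Err(format!("unknown id {id}")),
        }
    }
}

fn main() {
    let mut table = TypeTable::new();

    // b"hello": each nested type gets its own implicit id.
    let elem = table.implicit(Ty::U8);
    let array = table.implicit(Ty::Array { elem, capacity: 5 });
    let reference = table.implicit(Ty::Ref { base: array });
    println!("{}", table.render(reference, 0).unwrap()); // &[u8; 5]

    // An array whose element type is still an inference variable.
    let infer = table.implicit(Ty::Infer);
    let open_array = table.implicit(Ty::Array { elem: infer, capacity: 5 });
    println!("{}", table.render(open_array, 0).unwrap()); // [_; 5]

    // Buggy variant: the reference reuses its own id as its base, so lookup loops.
    let bad = table.next_id;
    table.next_id += 1;
    table.types.insert(bad, Ty::Ref { base: bad });
    println!("{:?}", table.render(bad, 0)); // Err("cycle while resolving id 5")
}
```

The design choice being illustrated is simply that a reference or array
never shares an id with the type it contains, so resolving the outer type
always terminates at a distinct inner entry.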

Let me know what you think.

Thanks

--Phil

On Sat, 25 Sept 2021 at 12:53, Mark Wielaard <mark@klomp.org> wrote:

> Hi Philip,
>
> On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> > This is really useful information. Will this mean that the lexer token
> > will need to represent strings differently as well? Or is the
> > std::string in the lexer still ok?
>
> I think the representation as std::string is fine, as long as we
> don't mix std::strings between different types (byte strings may
> contain sequences of chars that aren't valid utf-8 sequences).
>
> > The change you made above has the problem that reference types, like
> > arrays, are forms of what rust calls covariant types, since they might
> > contain an inference variable, so they require a lookup to determine the
> > base type. It's likely there is a reference cycle here. In any case,
> > this change will not be correct for type checking purposes. The design
> > of the type system is purely about rust type checking and inferring
> > types.
>
> OK, so how do I represent a reference to an array type that doesn't
> contain any inference variables? When we see a b"hello" byte string
> that is the same as seeing &[b'h', b'e', b'l', b'l', b'o'] which is
> the same as seeing &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8];
>
> So we know this is &[u8;5] and if we write:
>
> let a = b"hello";
>
> We want to infer that a has type &[u8;5].
>
> > So for example this change will break the case of:
> >
> > ```
> >   let a:str = "test";
> > ```
> >
> > That's because the TypePath of str can't know the size of the expected
> > array at compile time, and the error message will end up with something
> > like "expected str got [i8, 4]".
>
> Right, but that is for "proper strings". It is somewhat unfortunate
> that Rust also calls byte strings "strings", because they really
> aren't. b"abc" is a reference to a static array of u8, not a &str
> (containing utf-8).
>
> I have to think about the slicing of "proper strings", which sounds
> more complicated than slicing of byte strings, because I don't think
> you want to chop up a utf-8 sequence. For now I would simply try to
> get the type of byte strings like b"test" correct.
>
> Cheers,
>
> Mark
>
>
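
For reference, rustc already types byte strings the way Mark describes.
The following small program is plain Rust, nothing gccrs-specific, and is
only meant as a sanity check: it confirms that b"hello" is a &[u8; 5] equal
to the explicit byte arrays in the quoted mail, and that byte strings may
hold bytes that are not valid UTF-8, which is why the two string kinds
shouldn't be mixed in one std::string representation.

```rust
fn main() {
    // A byte string literal has type &'static [u8; N], not &str.
    let a = b"hello";
    let typed: &[u8; 5] = a; // compiles because the inferred type is &[u8; 5]
    assert_eq!(typed, &[b'h', b'e', b'l', b'l', b'o']);
    assert_eq!(typed, &[0x68u8, 0x65, 0x6c, 0x6c, 0x6f]);

    // Byte strings may contain bytes that are not valid UTF-8...
    let raw = b"\xffabc";
    assert_eq!(raw.len(), 4);

    // ...so converting a byte string to a &str can fail.
    assert!(std::str::from_utf8(raw).is_err());
    assert_eq!(std::str::from_utf8(a), Ok("hello"));
}
```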

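On the slicing question, a short plain-Rust sketch of the difference: a
&str can only be sliced at UTF-8 character boundaries, whereas a byte
string is an ordinary u8 array and any byte range is a valid slice. The
literal "héllo" is an illustrative choice, not something from the thread.
As for the quoted `let a:str = "test";` case, rustc rejects it as well,
since str is unsized and a string literal is a &str.

```rust
fn main() {
    // A proper string literal is a &str holding UTF-8.
    let s: &str = "héllo";
    assert_eq!(s.len(), 6); // 'é' takes two bytes in UTF-8

    // Slicing a str must respect char boundaries: byte 2 is inside 'é'.
    assert!(!s.is_char_boundary(2));
    assert_eq!(s.get(0..2), None); // &s[0..2] would panic
    assert_eq!(s.get(0..3), Some("hé"));

    // Byte strings are plain u8 arrays, so any byte range is a valid slice.
    let b: &[u8] = b"hello";
    assert_eq!(&b[1..3], b"el");
}
```

This is why slicing "proper strings" needs more care in the compiler than
slicing byte strings.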


Thread overview: 10+ messages
2021-09-21 22:54 [PATCH] Fix byte char and byte string lexing code Mark Wielaard
2021-09-22  9:48 ` Thomas Schwinge
2021-09-22 20:37   ` Mark Wielaard
2021-09-23 11:43     ` Philip Herron
2021-09-23 14:10       ` Arthur Cohen
2021-09-23 20:53         ` byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code) Mark Wielaard
2021-09-24 11:01           ` Philip Herron
2021-09-25 11:53             ` Mark Wielaard
2021-09-30 10:46               ` Philip Herron [this message]
2021-10-03 22:04                 ` Mark Wielaard
