public inbox for gcc-rust@gcc.gnu.org
From: Mark Wielaard <mark@klomp.org>
To: Philip Herron <philip.herron@embecosm.com>
Cc: Arthur Cohen <cohenarthur.dev@gmail.com>,
	gcc-rust@gcc.gnu.org, Thomas Schwinge <thomas@codesourcery.com>
Subject: Re: byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code)
Date: Sat, 25 Sep 2021 13:53:08 +0200
Message-ID: <YU8NpNfIgnJRoxbi@wildebeest.org>
In-Reply-To: <CAB2u+n11kpd0KwsZZu6cXCsqHcALmFRfQOiQA=L1NRW8-faCjQ@mail.gmail.com>

Hi Philip,

On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> This is really useful information, will this mean that the lexer token will
> need to represent strings differently as well? Or is the std::string in the
> lexer still ok?

I think the representation as std::string is fine, as long as we
don't mix std::strings between different types (byte strings may
contain sequences of chars that aren't valid utf-8 sequences).
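
To illustrate the point about byte strings and utf-8, here is a
minimal Rust sketch. It is plain rustc-level Rust, not gccrs lexer
code, and the helper name is_valid_utf8 is made up for this example:

```rust
/// Hypothetical helper: true if the bytes form a valid UTF-8 sequence.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // A perfectly legal byte string literal; 0xff never occurs in UTF-8.
    let raw: &[u8; 2] = b"\xff\xfe";
    assert!(!is_valid_utf8(raw));

    // An ASCII-only byte string happens to be valid UTF-8 as well.
    let ascii: &[u8; 5] = b"hello";
    assert!(is_valid_utf8(ascii));
}
```

So the same container can hold both kinds of contents, but only one
of them may be treated as a "proper string".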

> The change you made above has the problem that reference types, like arrays,
> are forms of what rust calls covariant types since they might contain an
> inference variable, so they require lookup to determine the base type. It's
> likely there is a reference cycle here. Though this change will not be
> correct for type checking purposes. The design of the type system is purely
> about rust type checking and inferring types.

OK, so how do I represent a reference to an array type that doesn't
contain any inference variables? When we see a b"hello" byte string,
that is the same as seeing &[b'h', b'e', b'l', b'l', b'o'], which is
the same as seeing &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8].

So we know this is &[u8;5] and if we write:

let a = b"hello";

We want to infer that a has type &[u8;5].
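
As a sanity check of the expected behaviour, the following sketch
shows the equivalence with ordinary Rust. The helper array_len is
hypothetical and exists only to expose the array length that was
inferred; none of this is gccrs internals:

```rust
/// Hypothetical helper: returns N for a reference to a [u8; N].
/// It only accepts references to fixed-size u8 arrays.
fn array_len<const N: usize>(_: &[u8; N]) -> usize {
    N
}

fn main() {
    let a = b"hello";

    // This annotation only type-checks if `a` was inferred as &[u8; 5].
    let b: &[u8; 5] = a;
    assert_eq!(b.len(), 5);

    // The three spellings from above denote the same value.
    assert_eq!(a, &[b'h', b'e', b'l', b'l', b'o']);
    assert_eq!(a, &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8]);

    // The length is part of the type, known at compile time.
    assert_eq!(array_len(a), 5);
}
```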

> So for example this change will break the case of:
> 
> ```
>   let a:str = "test";
> ```
> 
> Since the TypePath of str can't know the size of the expected array at
> compilation time. And the error message will end up with something like
> "expected str got [i8, 4]";

Right, but that is for "proper strings". It is somewhat unfortunate
that Rust calls byte strings also "strings", but they really
aren't. b"abc" is a reference to a static array of u8 (&[u8; 3]),
not a &str (containing utf-8).

I have to think about the slicing of "proper strings", which sounds
more complicated than slicing of byte strings, because I don't think
you want to chop up a utf-8 sequence. For now I would simply try to
get the type of byte strings like b"test" correct.
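
To make the difference concrete, here is a small sketch of how
slicing behaves in ordinary Rust. The helper checked_slice is a
hypothetical name; it just wraps the standard str::get, which returns
None when a bound would land inside a utf-8 sequence:

```rust
/// Hypothetical helper: slice a str by byte offsets, returning None
/// if either bound falls inside a multi-byte UTF-8 sequence.
fn checked_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // A byte string can be cut at any index.
    let bytes = b"test";
    assert_eq!(&bytes[1..3], &b"es"[..]);

    // 'é' takes two bytes in UTF-8 (0xC3 0xA9), at offsets 1 and 2.
    let s = "héllo";
    assert_eq!(checked_slice(s, 0, 1), Some("h"));
    assert_eq!(checked_slice(s, 0, 2), None); // would split 'é'
    assert_eq!(checked_slice(s, 0, 3), Some("hé"));
    assert!(!s.is_char_boundary(2));
}
```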

Cheers,

Mark


Thread overview: 10+ messages
2021-09-21 22:54 [PATCH] Fix byte char and byte string lexing code Mark Wielaard
2021-09-22  9:48 ` Thomas Schwinge
2021-09-22 20:37   ` Mark Wielaard
2021-09-23 11:43     ` Philip Herron
2021-09-23 14:10       ` Arthur Cohen
2021-09-23 20:53         ` byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code) Mark Wielaard
2021-09-24 11:01           ` Philip Herron
2021-09-25 11:53             ` Mark Wielaard [this message]
2021-09-30 10:46               ` Philip Herron
2021-10-03 22:04                 ` Mark Wielaard