From: Mark Wielaard <mark@klomp.org>
To: Philip Herron <philip.herron@embecosm.com>
Cc: Arthur Cohen <cohenarthur.dev@gmail.com>,
gcc-rust@gcc.gnu.org, Thomas Schwinge <thomas@codesourcery.com>
Subject: Re: byte/char string representation (Was: [PATCH] Fix byte char and byte string lexing code)
Date: Sat, 25 Sep 2021 13:53:08 +0200
Message-ID: <YU8NpNfIgnJRoxbi@wildebeest.org>
In-Reply-To: <CAB2u+n11kpd0KwsZZu6cXCsqHcALmFRfQOiQA=L1NRW8-faCjQ@mail.gmail.com>
Hi Philip,
On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> This is really useful information, will this mean that the lexer token will
> need to represent strings differently as well? Or is the std::string in the
> lexer still ok?
I think the representation as std::string is fine, as long as we
don't mix std::strings between different types (byte strings may
contain sequences of chars that aren't valid utf-8 sequences).
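To illustrate the point in plain Rust (a sketch only, not gccrs code; the helper names is_valid_utf8 and demo_literals are made up for this example): a byte string literal may hold bytes that can never appear in a &str.

```rust
// Illustrative sketch, not gccrs code. `is_valid_utf8` and
// `demo_literals` are hypothetical names chosen for this example.

/// Returns true if `bytes` forms a valid UTF-8 sequence.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    // std::str::from_utf8 returns Err when the input is not valid UTF-8.
    std::str::from_utf8(bytes).is_ok()
}

/// Checks one ASCII-only byte string and one that is not valid UTF-8.
pub fn demo_literals() -> (bool, bool) {
    let ascii: &[u8; 5] = b"hello";
    // The byte 0xff never occurs in valid UTF-8, yet it is a perfectly
    // fine element of a byte string literal.
    let raw: &[u8; 2] = b"\xff\xfe";
    (is_valid_utf8(ascii), is_valid_utf8(raw))
}
```

So the contents of a byte string token cannot in general be treated as the contents of a string token, even if both are stored in a std::string in the lexer.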
> The change you made above has the problem that reference types like, arrays
> are forms of what rust calls covariant types since they might contain an
> inference variable, so they require lookup to determine the base type. Its
> likely there is a reference cycle here. Though this change will not be
> correct for type checking purposes. The design of the type system is purely
> about rust type checking and inferring types.
OK, so how do I represent a reference to an array type that doesn't
contain any inference variables? When we see a b"hello" byte string
that is the same as seeing &[b'h', b'e', b'l', b'l', b'o'] which is
the same as seeing &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8];
So we know this is &[u8;5] and if we write:
let a = b"hello";
We want to infer that a has type &[u8;5].
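A small sketch of that equivalence as rustc sees it (the function names byte_string_forms and array_len are hypothetical, picked only for this example; it says nothing about how gccrs represents the type internally):

```rust
// Illustrative sketch, not gccrs code. Function names are hypothetical.

/// The three spellings from the text, all annotated as &[u8; 5].
pub fn byte_string_forms() -> [&'static [u8; 5]; 3] {
    let a: &[u8; 5] = b"hello";
    let b: &[u8; 5] = &[b'h', b'e', b'l', b'l', b'o'];
    let c: &[u8; 5] = &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8];
    [a, b, c]
}

/// Recovers the array length N from the type alone. This only compiles
/// when the argument is a reference to a fixed-size u8 array.
pub fn array_len<const N: usize>(_bytes: &[u8; N]) -> usize {
    N
}

/// No type annotation here: the type of `a` is inferred from the literal.
pub fn inferred_len() -> usize {
    let a = b"hello";
    array_len(a)
}
```

Calling inferred_len() yields 5 because `a` gets the type &[u8; 5] without any annotation, which is the inference result we want for byte strings.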
> So for example this change will break the case of:
>
> ```
> let a:str = "test";
> ```
>
> Since the TypePath of str can't know the size of the expected array at
> compilation time. And the error message will end up with something like
> "expected str got [i8, 4]";
Right, but that is for "proper strings". It is somewhat unfortunate
that Rust calls byte strings also "strings", but they really
aren't. b"abc" is a reference to a static array of u8 (&[u8; 3]), not a
&str (containing utf-8).
I have to think about the slicing of "proper strings", which sounds
more complicated than slicing of byte strings, because I don't think
you want to chop up a utf-8 sequence. For now I would simply try to
get the type of byte strings like b"test" correct.
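The slicing difference can be sketched in plain Rust as follows (not gccrs code; slice_str and slice_bytes are hypothetical helper names). Both use the non-panicking get method from the standard library:

```rust
// Illustrative sketch, not gccrs code. Helper names are hypothetical.

/// Slices a "proper string" by byte offsets. Returns None when an offset
/// is out of range or falls inside a multi-byte utf-8 sequence.
pub fn slice_str(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// Slices a byte string. Any in-range offsets are fine, because every
/// element is an independent u8.
pub fn slice_bytes(b: &[u8], start: usize, end: usize) -> Option<&[u8]> {
    b.get(start..end)
}
```

For example, in "h\u{e9}llo" the second character takes two bytes (offsets 1 and 2), so slice_str with the range 0..2 returns None while 0..3 succeeds. slice_bytes on b"test" with 0..2 simply returns the first two bytes.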
Cheers,
Mark