Hi Philip,
On Fri, Sep 24, 2021 at 12:01:42PM +0100, Philip Herron wrote:
> This is really useful information, will this mean that the lexer token will
> need to represent strings differently as well? Or is the std::string in the
> lexer still ok?
I think the representation as std::string is fine, as long as we
don't mix up std::strings that hold different kinds of literals (byte
strings may contain sequences of chars that aren't valid utf-8 sequences).
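
To make that concrete, here is a small sketch in plain Rust (nothing
gccrs specific; is_valid_utf8 is just a helper name made up for this
mail) showing a byte string literal that cannot be treated as a str:

```rust
// Hypothetical helper, only for illustration: reports whether the
// given bytes form a valid utf-8 sequence.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // Plain ASCII bytes are also valid utf-8.
    assert!(is_valid_utf8(b"hello"));
    // The byte 0xff never occurs in valid utf-8, yet it is a
    // perfectly fine byte string literal.
    assert!(!is_valid_utf8(b"\xff\xfe"));
}
```
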
> The change you made above has the problem that reference types like, arrays
> are forms of what rust calls covariant types since they might contain an
> inference variable, so they require lookup to determine the base type. Its
> likely there is a reference cycle here. Though this change will not be
> correct for type checking purposes. The design of the type system is purely
> about rust type checking and inferring types.
OK, so how do I represent a reference to an array type that doesn't
contain any inference variables? When we see a b"hello" byte string
that is the same as seeing &[b'h', b'e', b'l', b'l', b'o'] which is
the same as seeing &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8].
So we know this is &[u8;5] and if we write:
let a = b"hello";
We want to infer that a has type &[u8;5].
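
As a sanity check of what rustc infers here, a minimal sketch (the
array_len helper is made up for this mail; it only accepts a
reference to a fixed-size u8 array, so it would not compile if the
literal had any other type):

```rust
// Hypothetical helper: only accepts &[u8; N] and returns N, so a call
// that compiles shows the argument is a reference to a fixed-size array.
fn array_len<const N: usize>(_bytes: &[u8; N]) -> usize {
    N
}

fn main() {
    let a = b"hello";
    // All three spellings denote the same &[u8; 5] value.
    let b: &[u8; 5] = &[b'h', b'e', b'l', b'l', b'o'];
    let c: &[u8; 5] = &[0x68u8, 0x65u8, 0x6cu8, 0x6cu8, 0x6fu8];
    assert_eq!(array_len(a), 5);
    assert_eq!(a, b);
    assert_eq!(b, c);
}
```
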
> So for example this change will break the case of:
>
> ```
> let a:str = "test";
> ```
>
> Since the TypePath of str can't know the size of the expected array at
> compilation time. And the error message will end up with something like
> "expected str got [i8, 4]";
Right, but that is for "proper strings". It is somewhat unfortunate
that Rust calls byte strings also "strings", but they really
aren't. b"abc" is a reference to a static array of u8, not a &str
(containing utf-8).
I have to think about the slicing of "proper strings", which sounds
more complicated than slicing of byte strings, because I don't think
you want to chop up a utf-8 sequence. For now I would simply try to
get the type of byte strings like b"test" correct.
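
To illustrate why the two kinds of slicing differ, a rough sketch in
plain Rust (str_prefix and byte_prefix are names I made up here, this
says nothing about how gccrs should implement it):

```rust
// Hypothetical helper: the first n bytes of a str, or None when n is
// out of range or would cut a utf-8 sequence in half.
fn str_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

// Hypothetical helper: the first n bytes of a byte slice. Any n up to
// the length is fine, since there are no sequences to respect.
fn byte_prefix(bytes: &[u8], n: usize) -> &[u8] {
    &bytes[..n]
}

fn main() {
    // U+00E9 takes two bytes in utf-8 (0xc3 0xa9), so it occupies
    // byte offsets 1 and 2 of this string.
    let s = "h\u{e9}llo";
    assert_eq!(s.len(), 6);
    // Offset 2 is in the middle of that sequence.
    assert_eq!(str_prefix(s, 2), None);
    assert_eq!(str_prefix(s, 3), Some("h\u{e9}"));
    // The same cut on the raw bytes is unproblematic.
    assert_eq!(byte_prefix(s.as_bytes(), 2), &[0x68u8, 0xc3u8][..]);
}
```
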
Cheers,
Mark