Re: [PATCH] Allow non-ASCII characters in Rust identifiers

public inbox for gdb-patches@sourceware.org

From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>
Cc: Tom Tromey <tom@tromey.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
Date: Mon, 04 Apr 2022 10:10:18 +0100
Message-ID: <87zgl16wp1.fsf@redhat.com>
In-Reply-To: <875ynq8418.fsf@redhat.com>

Andrew Burgess <aburgess@redhat.com> writes:

> Tom Tromey <tom@tromey.com> writes:
>
>> Andrew> I'm seeing this test fail.
>>
>> Andrew>  $ rustc --version
>> Andrew>  rustc 1.59.0 (9d1b2106e 2022-02-23)
>>
>> I installed this version with "rustup toolchain install 1.59.0" and set
>> it to be my default.
>>
>> Andrew> I've tested with gdb commit a723766c0e2 and 5187219460c.
>>
>> I tried 552f1157c6262, a recent-ish git master.
>> It works fine for me.
>>
>> Andrew> Do these pass for you?  Any suggestions for where to start looking?
>>
>> I wonder if this line in the .exp isn't having the desired effect:
>>
>>     setenv LC_ALL C.UTF-8
>>
>> Is this happening interactively or in some kind of automation
>> environment?  Are the correct locales installed?  Do other
>> LC_ALL-setting tests fail?
>
> This is when I run under dejagnu.  If I run the test manually, and copy
> the commands from the .exp file by hand, pasting them into my GDB
> session, it all appears to work fine.
>
> I'm not sure how I'd check if the correct locales are installed (I mean,
> I'm not sure what I'd be looking for), but I guess as it passes when run
> manually, then I'm probably OK.
>
> Looking for scripts that set or mention LC_ALL, I found these:
>
>   gdb.base/utf8-identifiers.exp
>   gdb.python/py-source-styling.exp
>   gdb.ada/non-ascii-utf-8.exp
>   gdb.ada/non-ascii-latin-3.exp
>   gdb.ada/non-ascii-latin-1.exp
>
> These all run fine, except for 3 failures in
> gdb.ada/non-ascii-utf-8.exp, which look suspiciously similar:
>
>   print VAR_ð<U+0090><U+0090><U+0081>
>   No definition of "var_ð<U+0090><U+0090><U+0081>" in current context.
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print VAR_ð<U+0090><U+0090><U+0081>
>   print var_ð<U+0090><U+0090>©
>   No definition of "var_ð<U+0090><U+0090>©" in current context.
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print var_ð<U+0090><U+0090>©
>   ... snip ...
>   break FUNC_ð<U+0090><U+0090><U+0081>
>   Function "FUNC_ð<U+0090><U+0090><U+0081>" not defined.
>   Make breakpoint pending on future shared library load? (y or [n]) n
>   (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: setting breakpoint at FUNC_ð<U+0090><U+0090><U+0081>
>
>>
>> Andrew>  print "ð<U+009D><U+0095>¯"
>> Andrew>  $1 = "ð\302\235\302\225¯"
>>
>> One thing I'd suggest is checking by hand if either the 'print' line or
>> the '$1 = ' line has the correct byte values for the UTF-8 encoded form
>> of the character in question.
>
> So, this is weird.  When I look at the .exp file, I see the bytes of the
> Unicode character as 0xf0 0x9d 0x95 0xaf, which looks correct:
>
>   https://www.fileformat.info/info/unicode/char/1d56f/index.htm
>
> But, when I look at the gdb.log file, I see the following bytes 0xc3
> 0xb0 0xc2 0x9d 0xc2 0x95 0xc2 0xaf.
>
> Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
> each of the subsequent bytes gets a 0xc2 byte before it.
>
> Does any of this give any clues to what might be happening?

So I put this into a text file 'unicode.tcl':

  puts "print 𝕯"

(just in case that gets mangled in transit, that's the same Unicode
character as is used in the gdb.rust/unicode.exp test)

and then I did:

  $ tclsh unicode.tcl

and I get the same corrupted bytes as I see from the test script (c3 b0
c2 9d c2 95 c2 af).  So the problem appears to be with my build of Tcl.
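
For reference, that byte pattern is what results when each byte of the
UTF-8 sequence is treated as a separate Latin-1 character and then
re-encoded as UTF-8.  The following is only a sketch to illustrate the
pattern, not a diagnosis of where the conversion happens; it assumes
iconv and od are available, and the variable names are made up for
illustration:

```shell
# UTF-8 bytes of U+1D56F, written as octal escapes so that any POSIX
# printf handles them: f0 9d 95 af.
orig=$(printf '\360\235\225\257' | od -An -tx1 | tr -d ' \n')

# Treat each byte as one Latin-1 character and re-encode it as UTF-8.
mangled=$(printf '\360\235\225\257' \
          | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "original: $orig"
echo "mangled:  $mangled"
```

The second line printed should match the bytes seen in gdb.log.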

I'm currently running Tcl 8.6.  I wonder if you could compare this to
the behaviour of your tclsh.
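
To make the comparison independent of how this mail gets encoded, here
is a sketch that writes the script from explicit byte values and dumps
what tclsh prints.  The temporary directory and variable names are just
for illustration, and the tclsh part is skipped if tclsh is not on the
PATH:

```shell
# Write unicode.tcl with the character given as explicit octal escapes
# (f0 9d 95 af), so the file contents do not depend on mail transit.
dir=$(mktemp -d)
printf 'puts "print \360\235\225\257"\n' > "$dir/unicode.tcl"

# Hex dump of the script itself, to confirm the bytes on disk.
script_hex=$(od -An -tx1 "$dir/unicode.tcl" | tr -d ' \n')
echo "script: $script_hex"

if command -v tclsh >/dev/null 2>&1; then
    # Report the Tcl version and the encoding Tcl believes the system uses.
    echo 'puts "tcl [info patchlevel], system encoding [encoding system]"' | tclsh
    # Hex dump of what tclsh actually writes to stdout.
    tclsh "$dir/unicode.tcl" | od -An -tx1
else
    echo "tclsh not found"
fi
rm -rf "$dir"
```

If the last dump shows f0 9d 95 af, the character survived; if it shows
c3 b0 c2 9d c2 95 c2 af, that tclsh has the same problem as mine.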

Thanks,
Andrew
