From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>
Cc: Tom Tromey <tom@tromey.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
Date: Mon, 04 Apr 2022 10:10:18 +0100
Message-ID: <87zgl16wp1.fsf@redhat.com>
In-Reply-To: <875ynq8418.fsf@redhat.com>
Andrew Burgess <aburgess@redhat.com> writes:
> Tom Tromey <tom@tromey.com> writes:
>
>> Andrew> I'm seeing this test fail.
>>
>> Andrew> $ rustc --version
>> Andrew> rustc 1.59.0 (9d1b2106e 2022-02-23)
>>
>> I installed this version with "rustup toolchain install 1.59.0" and set
>> it to be my default.
>>
>> Andrew> I've tested with gdb commit a723766c0e2 and 5187219460c.
>>
>> I tried 552f1157c6262, a recent-ish git master.
>> It works fine for me.
>>
>> Andrew> Do these pass for you? Any suggestions for where to start looking?
>>
>> I wonder if this line in the .exp isn't having the desired effect:
>>
>> setenv LC_ALL C.UTF-8
>>
>> Is this happening interactively or in some kind of automation
>> environment? Are the correct locales installed? Do other
>> LC_ALL-setting tests fail?
>
> This is when I run under dejagnu. If I run the test manually, and copy
> the commands from the .exp file by hand, pasting them into my GDB
> session, it all appears to work fine.
>
> I'm not sure how I'd check if the correct locales are installed (I mean,
> I'm not sure what I'd be looking for), but I guess as it passes when run
> manually, then I'm probably OK.
>
> Looking for scripts that set or mention LC_ALL, I found these:
>
> gdb.base/utf8-identifiers.exp
> gdb.python/py-source-styling.exp
> gdb.ada/non-ascii-utf-8.exp
> gdb.ada/non-ascii-latin-3.exp
> gdb.ada/non-ascii-latin-1.exp
>
> These all run fine, except for 3 failures in
> gdb.ada/non-ascii-utf-8.exp, which look suspiciously similar:
>
> print VAR_ð<U+0090><U+0090><U+0081>
> No definition of "var_ð<U+0090><U+0090><U+0081>" in current context.
> (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print VAR_ð<U+0090><U+0090><U+0081>
> print var_ð<U+0090><U+0090>©
> No definition of "var_ð<U+0090><U+0090>©" in current context.
> (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print var_ð<U+0090><U+0090>©
> ... snip ...
> break FUNC_ð<U+0090><U+0090><U+0081>
> Function "FUNC_ð<U+0090><U+0090><U+0081>" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) n
> (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: setting breakpoint at FUNC_ð<U+0090><U+0090><U+0081>
>
>>
>> Andrew> print "ð<U+009D><U+0095>¯"
>> Andrew> $1 = "ð\302\235\302\225¯"
>>
>> One thing I'd suggest is checking by hand if either the 'print' line or
>> the '$1 = ' line has the correct byte values for the UTF-8 encoded form
>> of the character in question.
>
> So, this is weird. When I look at the .exp file, I see the bytes of the
> unicode character as 0xf0 0x9d 0x95 0xaf, which looks correct:
>
> https://www.fileformat.info/info/unicode/char/1d56f/index.htm
>
> But, when I look at the gdb.log file, I see the following bytes 0xc3
> 0xb0 0xc2 0x9d 0xc2 0x95 0xc2 0xaf.
>
> Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
> all the subsequent bytes get a 0xc2 byte before them.
>
> Does any of this give any clues to what might be happening?

So I put this into a text file 'unicode.tcl':

  puts "print 𝕯"

(just in case that gets mangled in transit, that's the same unicode
character as is used in the gdb.rust/unicode.exp test)

and then I did:

  $ tclsh unicode.tcl

and I get the same corrupted bytes as I see from the test script (c3 b0
c2 9d c2 95 c2 af).  So the problem appears to be with my build of tcl.
I'm currently running tcl 8.6.  I wonder if you could compare this to
the behaviour of your tclsh.
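
For reference, the corrupted byte sequence is what you would get if the
UTF-8 bytes of the character were read as ISO-8859-1 (one character per
byte) and then written back out as UTF-8.  That is only a guess at the
mechanism, nothing in this thread confirms it.  A minimal Python sketch
of the idea (Python is used purely for illustration, and the variable
names are made up):

```python
# U+1D56F, the character used in gdb.rust/unicode.exp.
char = "\U0001d56f"

# The correct UTF-8 encoding of that character.
utf8_bytes = char.encode("utf-8")

# Assumption: some layer treats each UTF-8 byte as a separate
# ISO-8859-1 character, then encodes those characters as UTF-8 again.
mangled = utf8_bytes.decode("latin-1").encode("utf-8")

print(utf8_bytes.hex(" "))  # f0 9d 95 af
print(mangled.hex(" "))     # c3 b0 c2 9d c2 95 c2 af
```

The second line of output matches the bytes seen in gdb.log: 0xf0
becomes 0xc3 0xb0, and every following byte gains a 0xc2 prefix.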
Thanks,
Andrew