public inbox for gdb-patches@sourceware.org
From: Tom Tromey <tom@tromey.com>
To: Andrew Burgess <aburgess@redhat.com>
Cc: Tom Tromey <tom@tromey.com>,  gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
Date: Mon, 04 Apr 2022 03:48:48 -0600
Message-ID: <87a6d16uwv.fsf@tromey.com>
In-Reply-To: <87zgl16wp1.fsf@redhat.com> (Andrew Burgess's message of "Mon, 04 Apr 2022 10:10:18 +0100")

Andrew> So I put this into a text file 'unicode.tcl':
Andrew>   puts "print 𝕯"
Andrew> (just in case that gets mangled in transit, that's the same unicode
Andrew> character as is used in the gdb.rust/unicode.exp test)

Andrew> I'm currently running tcl 8.6.  I wonder if you could compare this to
Andrew> the behaviour of your tclsh.

This works for me.  I'm using the system tclsh on Fedora 34.
tclsh doesn't seem to support --version, but:

prentzel. rpm -q tcl
tcl-8.6.10-5.fc34.x86_64

Andrew> Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
Andrew> all the subsequent bytes get a 0xc2 byte before them.

Unicode U+00f0 is represented as 0xc3 0xb0 in UTF-8.  So one idea is
that tclsh thinks the input is Latin-1, whose code points map
identically to the first 256 Unicode code points; converting from that
file encoding to UTF-8 would then produce exactly these bytes.

That is, tclsh reads 0xf0, but thinking it is reading a Latin-1
character, converts that to the corresponding Unicode character U+00f0,
and from there to the bytes that are seen.
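
The hypothesis can be checked outside of Tcl.  Here is a minimal sketch
in Python (not tclsh, and not something from the test suite), assuming
the character is U+1D56F and that the reader wrongly treats each UTF-8
byte as one Latin-1 character before writing UTF-8 back out:

```python
# Illustration only: this mimics the suspected recoding, it does not
# reproduce what tclsh actually does internally.
original = "\U0001D56F".encode("utf-8")   # the bytes stored in the file

# Wrong step: decode each byte as a separate Latin-1 character.
misread = original.decode("latin-1")

# Re-encode those four characters as UTF-8, as a UTF-8 writer would.
recoded = misread.encode("utf-8")

print(original.hex(" "))   # f0 9d 95 af
print(recoded.hex(" "))    # c3 b0 c2 9d c2 95 c2 af
```

The second line of output matches the pattern described: 0xf0 becomes
0xc3 0xb0, and each of the remaining bytes gains a 0xc2 prefix.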

I have LANG=en_US.UTF-8, which may explain why my default encoding is
UTF-8.  Perhaps this setting is the problem - if you are running in a
Latin-1 locale then the .exp files will be recoded incorrectly as they
are read by the interpreter.

I don't know if there's a way to set the file encoding in a way that Tcl
will recognize.  We could maybe try a UTF-8 BOM.

Tom


Thread overview: 7+ messages
2022-01-26 23:15 Tom Tromey
2022-02-06 20:23 ` Tom Tromey
2022-04-03 16:17   ` Andrew Burgess
2022-04-03 16:51     ` Tom Tromey
2022-04-03 17:34       ` Andrew Burgess
2022-04-04  9:10         ` Andrew Burgess
2022-04-04  9:48           ` Tom Tromey [this message]
