From: Andrew Burgess
To: Tom Tromey
Cc: Tom Tromey, gdb-patches@sourceware.org
Subject: Re: [PATCH] Allow non-ASCII characters in Rust identifiers
In-Reply-To: <87mth26rgo.fsf@tromey.com>
References: <20220126231501.1031201-1-tom@tromey.com> <87y22nwxqb.fsf@tromey.com> <87ee2e87l6.fsf@redhat.com> <87mth26rgo.fsf@tromey.com>
Date: Sun, 03 Apr 2022 18:34:11 +0100
Message-ID: <875ynq8418.fsf@redhat.com>

Tom Tromey writes:

> Andrew> I'm seeing this test fail.
>
> Andrew>   $ rustc --version
> Andrew>   rustc 1.59.0 (9d1b2106e 2022-02-23)
>
> I installed this version with "rustup toolchain install 1.59.0" and set
> it to be my default.
>
> Andrew> I've tested with gdb commit a723766c0e2 and 5187219460c.
>
> I tried 552f1157c6262, a recent-ish git master.
> It works fine for me.
>
> Andrew> Do these pass for you?  Any suggestions for where to start looking?
>
> I wonder if this line in the .exp isn't having the desired effect:
>
>     setenv LC_ALL C.UTF-8
>
> Is this happening interactively or in some kind of automation
> environment?  Are the correct locales installed?  Do other
> LC_ALL-setting tests fail?

This is when I run under dejagnu.  If I run the test manually, and copy
the commands from the .exp file by hand, pasting them into my GDB
session, it all appears to work fine.

I'm not sure how I'd check if the correct locales are installed (I mean,
I'm not sure what I'd be looking for), but I guess as it passes when run
manually, then I'm probably OK.

Looking for scripts that set or mention LC_ALL, I found these:

  gdb.base/utf8-identifiers.exp
  gdb.python/py-source-styling.exp
  gdb.ada/non-ascii-utf-8.exp
  gdb.ada/non-ascii-latin-3.exp
  gdb.ada/non-ascii-latin-1.exp

These all run fine, except for 3 failures in
gdb.ada/non-ascii-utf-8.exp, which look suspiciously similar:

  print VAR_ð
  No definition of "var_ð" in current context.
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print VAR_ð
  print var_ð©
  No definition of "var_ð©" in current context.
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: print var_ð©

  ... snip ...

  break FUNC_ð
  Function "FUNC_ð" not defined.
  Make breakpoint pending on future shared library load? (y or [n]) n
  (gdb) FAIL: gdb.ada/non-ascii-utf-8.exp: setting breakpoint at FUNC_ð

> Andrew>   print "ð¯"
> Andrew>   $1 = "ð\302\235\302\225¯"
>
> One thing I'd suggest is checking by hand if either the 'print' line or
> the '$1 = ' line has the correct byte values for the UTF-8 encoded form
> of the character in question.

So, this is weird.  When I look at the .exp file, I see the bytes of the
Unicode character as 0xf0 0x9d 0x95 0xaf, which looks correct:

  https://www.fileformat.info/info/unicode/char/1d56f/index.htm

But, when I look at the gdb.log file, I see the following bytes 0xc3
0xb0 0xc2 0x9d 0xc2 0x95 0xc2 0xaf.
Compared to the original, the first '0xf0' changes to '0xc3 0xb0', while
all the subsequent bytes get a 0xc2 byte before them.

Does any of this give any clues to what might be happening?

Thanks,
Andrew
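
For reference, the gdb.log bytes match what you get if each byte of the
UTF-8 sequence for U+1D56F is treated as a separate ISO-8859-1 character
and then encoded as UTF-8 again.  That is only a hypothesis about what
happens between the .exp file and gdb.log, not a confirmed diagnosis.
A minimal Python sketch of the transformation follows; the helper name
reencode_as_latin1 is made up for illustration and is not part of GDB or
the testsuite:

```python
# UTF-8 encoding of U+1D56F, the character linked in the message above.
original = bytes([0xF0, 0x9D, 0x95, 0xAF])


def reencode_as_latin1(data: bytes) -> bytes:
    """Treat every byte as one ISO-8859-1 character, then encode as UTF-8.

    Hypothetical helper: it only models the suspected double encoding.
    """
    return data.decode("latin-1").encode("utf-8")


mangled = reencode_as_latin1(original)

# Show both byte sequences in hex for comparison with the .exp and gdb.log.
print("original:", original.hex(" "))
print("mangled: ", mangled.hex(" "))
```

Under this assumption 0xf0 becomes 0xc3 0xb0 and every byte in the range
0x80 to 0xbf gains a leading 0xc2, which is the pattern seen in gdb.log.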