public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Eduard-Mihai Burtescu" <eddyb@lyken.rs>
To: "Mark Wielaard" <mark@klomp.org>
Cc: gcc-patches@gcc.gnu.org, "Nick Nethercote" <n.nethercote@gmail.com>
Subject: Re: [PATCH] libiberty rust-demangle, ignore .suffix
Date: Fri, 03 Dec 2021 01:14:36 +0200	[thread overview]
Message-ID: <df669de5-f783-4833-b5c1-d0aebfbaf5fd@www.fastmail.com> (raw)
In-Reply-To: <YalDhoFdp5UxFMcA@wildebeest.org>

On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote:
> Hi Eddy,
>
> On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu wrote:
>> On Thu, Dec 2, 2021, at 19:17, Mark Wielaard wrote:
>> > Rust v0 symbols can have a .suffix because if compiler transformations.
>> 
>> For some context, the suffix comes from LLVM (I believe as part of
>> its LTO).  If this were a semantic part of a Rust symbol, it would
>> have an encoding within v0 (as we already do for e.g. shims).
>
> The same is true for gccrs or the libgccjit backend, gcc might clone
> the symbol name and make it unique by appending some .suffix name.
>
>> That also means that for consistency, suffixes like these should be
>> handled uniformly for both v0 and legacy (as rustc-demangle does),
>> since LLVM doesn't distinguish.
>
> The problem with the legacy mangling is that dot '.' is a valid
> character. That is why the patch only handles the v0 mangling case
> (where dot '.' isn't valid).

Thought so, that's an annoying complication - however, see later down
why that's still not a blocker to the way rustc-demangle handles it.

>> You may even be able to get Clang to generate C++ mangled symbols
>> with ".llvm." suffixes, with enough application of LTO.  This is not
>> unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's a
>> way to handle both as "outside the symbol", without hardcoding
>> ".llvm." in the implementation.
>
> We could use the scheme used by c++ where the .suffix is added as "
> [clone .suffix]", it even handles multiple dots, where something like
> _Z3fooi.part.9.165493.constprop.775.31805
> demangles to
> foo(int) [clone .part.9.165493] [clone .constprop.775.31805]
>
> I just don't think that is very useful and a little confusing.

Calling it "clone" is a bit weird, but I just checked what rustc-demangle
does for printing suffixes back out and it's not great either:
- ".llvm." (and everything after it) is completely removed
- any left-over suffixes (after demangling), if they start with ".", are
  not considered errors, but printed out verbatim after the demangling

>> I don't recall the libiberty demangling API having any provisions
>> for the demangler deciding that a mangled symbol "stops early",
>> which would maybe allow for a more general solution.
>
> No, there indeed is no interface. We might introduce a new option flag
> for treating '.' as end of symbol. But do we really need that
> flexibility?

That's not what I meant - a v0 or legacy symbol is self-terminating in
its parsing (or at the very least there are not dots allowed outside
of a length-prefixed identifier), so that when you see the start of
a valid mangled symbol, you can always find its end in the string,
even when that end is half-way through (and is followed by suffixes
or any other unrelated noise).

What I was imagining is a way to return to the caller the number of
chars from the start of the original string, that were demangled,
letting the caller do something else with the rest of that string.
(see below for how rustc-demangle already does something similar)

>> Despite all that, if it helps in practice, I would still not mind
>> this patch landing in its current form, I just wanted to share my
>> perspective on the larger issue.
>
> Thanks for that. Do you happen to know what other rust demanglers do?

rustc-demangle's internal API returns a pair of the demangler and the
"leftover" parts of the original string, after the end of the symbol.
You can see here how that suffix is further checked, and kept:
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138

As mentioned above, ".llvm." is handled differently, just above the snippet
linked - perhaps it was deemed too common to let it pollute the output.

> Cheers,
>
> Mark

Thanks,
- Eddy B.

  reply	other threads:[~2021-12-02 23:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-02 17:17 Mark Wielaard
2021-12-02 17:35 ` Eduard-Mihai Burtescu
2021-12-02 22:07   ` Mark Wielaard
2021-12-02 23:14     ` Eduard-Mihai Burtescu [this message]
2021-12-07 19:16       ` Mark Wielaard
2021-12-20 11:50         ` Eduard-Mihai Burtescu
2022-01-16  0:27           ` Mark Wielaard
2021-12-02 19:58 ` Nicholas Nethercote
2021-12-02 22:54   ` Mark Wielaard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df669de5-f783-4833-b5c1-d0aebfbaf5fd@www.fastmail.com \
    --to=eddyb@lyken.rs \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=mark@klomp.org \
    --cc=n.nethercote@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).