public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libiberty rust-demangle, ignore .suffix
@ 2021-12-02 17:17 Mark Wielaard
  2021-12-02 17:35 ` Eduard-Mihai Burtescu
  2021-12-02 19:58 ` Nicholas Nethercote
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Wielaard @ 2021-12-02 17:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Eduard-Mihai Burtescu, Nick Nethercote, Mark Wielaard

Rust v0 symbols can have a .suffix because if compiler transformations.
These can be ignored it the demangled name. Which is what this patch
implements). But an alternative implementation could be to follow what
C++ does and represent these as [clone .suffix] tagged onto the
demangled name. But this seems somewhat confusing since it results in
a demangled name that cannot be mangled again. And it would mean
trying to decode compiler internal naming.

https://bugs.kde.org/show_bug.cgi?id=445916
https://github.com/rust-lang/rust/issues/60705

libiberty/Changelog

	* rust-demangle.c (rust_demangle_callback): Ignore everything
	after '.' char in sym for v0.
	* testsuite/rust-demangle-expected: Add new .suffix testcase.
---
 libiberty/rust-demangle.c                  | 4 ++++
 libiberty/testsuite/rust-demangle-expected | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 449941b56dc1..dee9eb64df79 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -1340,6 +1340,10 @@ rust_demangle_callback (const char *mangled, int options,
   /* Rust symbols (v0) use only [_0-9a-zA-Z] characters. */
   for (p = rdm.sym; *p; p++)
     {
+      /* Rust v0 symbols can have '.' suffixes, ignore those.  */
+      if (rdm.version == 0 && *p == '.')
+	break;
+
       rdm.sym_len++;
 
       if (*p == '_' || ISALNUM (*p))
diff --git a/libiberty/testsuite/rust-demangle-expected b/libiberty/testsuite/rust-demangle-expected
index 7dca315d0054..bc32ed85f882 100644
--- a/libiberty/testsuite/rust-demangle-expected
+++ b/libiberty/testsuite/rust-demangle-expected
@@ -295,3 +295,9 @@ _RMCs4fqI2P2rA04_13const_genericINtB0_4CharKc2202_E
 --format=auto
 _RNvNvMCs4fqI2P2rA04_13const_genericINtB4_3FooKpE3foo3FOO
 <const_generic::Foo<_>>::foo::FOO
+#
+# Suffixes
+#
+--format=rust
+_RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694
+<rustc_data_structures::obligation_forest::ObligationForest<rustc_trait_selection::traits::fulfill::PendingPredicateObligation>>::register_obligation_at
-- 
2.18.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 17:17 [PATCH] libiberty rust-demangle, ignore .suffix Mark Wielaard
@ 2021-12-02 17:35 ` Eduard-Mihai Burtescu
  2021-12-02 22:07   ` Mark Wielaard
  2021-12-02 19:58 ` Nicholas Nethercote
  1 sibling, 1 reply; 9+ messages in thread
From: Eduard-Mihai Burtescu @ 2021-12-02 17:35 UTC (permalink / raw)
  To: Mark Wielaard, gcc-patches; +Cc: Nick Nethercote

On Thu, Dec 2, 2021, at 19:17, Mark Wielaard wrote:
> Rust v0 symbols can have a .suffix because if compiler transformations.

For some context, the suffix comes from LLVM (I believe as part of its LTO).
If this were a semantic part of a Rust symbol, it would have an encoding within v0 (as we already do for e.g. shims).

That also means that for consistency, suffixes like these should be handled uniformly for both v0 and legacy (as rustc-demangle does), since LLVM doesn't distinguish.

You may even be able to get Clang to generate C++ mangled symbols with ".llvm." suffixes, with enough application of LTO.
This is not unlike GCC ".clone" suffixes, AFAIK.
Sadly I don't think there's a way to handle both as "outside the symbol", without hardcoding ".llvm." in the implementation.

I don't recall the libiberty demangling API having any provisions for the demangler deciding that a mangled symbol "stops early", which would maybe allow for a more general solution.

Despite all that, if it helps in practice, I would still not mind this patch landing in its current form, I just wanted to share my perspective on the larger issue.

Thanks,
- Eddy B.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 17:17 [PATCH] libiberty rust-demangle, ignore .suffix Mark Wielaard
  2021-12-02 17:35 ` Eduard-Mihai Burtescu
@ 2021-12-02 19:58 ` Nicholas Nethercote
  2021-12-02 22:54   ` Mark Wielaard
  1 sibling, 1 reply; 9+ messages in thread
From: Nicholas Nethercote @ 2021-12-02 19:58 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: gcc-patches, Eduard-Mihai Burtescu

On Fri, 3 Dec 2021 at 04:17, Mark Wielaard <mark@klomp.org> wrote:

>
>         * rust-demangle.c (rust_demangle_callback): Ignore everything
>         after '.' char in sym for v0.
>

I just applied this change to Valgrind's copy of rust-demangle.c and I can
confirm that it works -- the symbols that were failing to be demangled are
now demangled.


> +      /* Rust v0 symbols can have '.' suffixes, ignore those.  */
> +      if (rdm.version == 0 && *p == '.')
> +       break;
>

Note that the indent on the `break` is 1 char, it should be 2.

Thanks for doing this, Mark!

Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 17:35 ` Eduard-Mihai Burtescu
@ 2021-12-02 22:07   ` Mark Wielaard
  2021-12-02 23:14     ` Eduard-Mihai Burtescu
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2021-12-02 22:07 UTC (permalink / raw)
  To: Eduard-Mihai Burtescu; +Cc: gcc-patches, Nick Nethercote

Hi Eddy,

On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu wrote:
> On Thu, Dec 2, 2021, at 19:17, Mark Wielaard wrote:
> > Rust v0 symbols can have a .suffix because if compiler transformations.
> 
> For some context, the suffix comes from LLVM (I believe as part of
> its LTO).  If this were a semantic part of a Rust symbol, it would
> have an encoding within v0 (as we already do for e.g. shims).

The same is true for gccrs or the libgccjit backend, gcc might clone
the symbol name and make it unique by appending some .suffix name.

> That also means that for consistency, suffixes like these should be
> handled uniformly for both v0 and legacy (as rustc-demangle does),
> since LLVM doesn't distinguish.

The problem with the legacy mangling is that dot '.' is a valid
character. That is why the patch only handles the v0 mangling case
(where dot '.' isn't valid).

> You may even be able to get Clang to generate C++ mangled symbols
> with ".llvm." suffixes, with enough application of LTO.  This is not
> unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's a
> way to handle both as "outside the symbol", without hardcoding
> ".llvm." in the implementation.

We could use the scheme used by c++ where the .suffix is added as "
[clone .suffix]", it even handles multiple dots, where something like
_Z3fooi.part.9.165493.constprop.775.31805
demangles to
foo(int) [clone .part.9.165493] [clone .constprop.775.31805]

I just don't think that is very useful and a little confusing.

> I don't recall the libiberty demangling API having any provisions
> for the demangler deciding that a mangled symbol "stops early",
> which would maybe allow for a more general solution.

No, there indeed is no interface. We might introduce a new option flag
for treating '.' as end of symbol. But do we really need that
flexibility?

> Despite all that, if it helps in practice, I would still not mind
> this patch landing in its current form, I just wanted to share my
> perspective on the larger issue.

Thanks for that. Do you happen to know what other rust demanglers do?

Cheers,

Mark


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 19:58 ` Nicholas Nethercote
@ 2021-12-02 22:54   ` Mark Wielaard
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Wielaard @ 2021-12-02 22:54 UTC (permalink / raw)
  To: Nicholas Nethercote; +Cc: gcc-patches, Eduard-Mihai Burtescu

Hi,

On Fri, Dec 03, 2021 at 06:58:36AM +1100, Nicholas Nethercote wrote:
> On Fri, 3 Dec 2021 at 04:17, Mark Wielaard <mark@klomp.org> wrote:
> >
> >         * rust-demangle.c (rust_demangle_callback): Ignore everything
> >         after '.' char in sym for v0.
> >
> 
> I just applied this change to Valgrind's copy of rust-demangle.c and I can
> confirm that it works -- the symbols that were failing to be demangled are
> now demangled.

Thanks for testing.

> > +      /* Rust v0 symbols can have '.' suffixes, ignore those.  */
> > +      if (rdm.version == 0 && *p == '.')
> > +       break;
> >
> 
> Note that the indent on the `break` is 1 char, it should be 2.

Oops. That is actually because I used a tab for indentation, but this
file doesn't use tabs. Fixed in my local copy
https://code.wildebeest.org/git/user/mjw/gcc/commit/?h=rust-demangle-suffix
Will push with that fix if approved.

Thanks,

Mark


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 22:07   ` Mark Wielaard
@ 2021-12-02 23:14     ` Eduard-Mihai Burtescu
  2021-12-07 19:16       ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Eduard-Mihai Burtescu @ 2021-12-02 23:14 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: gcc-patches, Nick Nethercote

On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote:
> Hi Eddy,
>
> On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu wrote:
>> On Thu, Dec 2, 2021, at 19:17, Mark Wielaard wrote:
>> > Rust v0 symbols can have a .suffix because if compiler transformations.
>> 
>> For some context, the suffix comes from LLVM (I believe as part of
>> its LTO).  If this were a semantic part of a Rust symbol, it would
>> have an encoding within v0 (as we already do for e.g. shims).
>
> The same is true for gccrs or the libgccjit backend, gcc might clone
> the symbol name and make it unique by appending some .suffix name.
>
>> That also means that for consistency, suffixes like these should be
>> handled uniformly for both v0 and legacy (as rustc-demangle does),
>> since LLVM doesn't distinguish.
>
> The problem with the legacy mangling is that dot '.' is a valid
> character. That is why the patch only handles the v0 mangling case
> (where dot '.' isn't valid).

Thought so, that's an annoying complication - however, see later down
why that's still not a blocker to the way rustc-demangle handles it.

>> You may even be able to get Clang to generate C++ mangled symbols
>> with ".llvm." suffixes, with enough application of LTO.  This is not
>> unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's a
>> way to handle both as "outside the symbol", without hardcoding
>> ".llvm." in the implementation.
>
> We could use the scheme used by c++ where the .suffix is added as "
> [clone .suffix]", it even handles multiple dots, where something like
> _Z3fooi.part.9.165493.constprop.775.31805
> demangles to
> foo(int) [clone .part.9.165493] [clone .constprop.775.31805]
>
> I just don't think that is very useful and a little confusing.

Calling it "clone" is a bit weird, but I just checked what rustc-demangle
does for printing suffixes back out and it's not great either:
- ".llvm." (and everything after it) is completely removed
- any left-over suffixes (after demangling), if they start with ".", are
  not considered errors, but printed out verbatim after the demangling

>> I don't recall the libiberty demangling API having any provisions
>> for the demangler deciding that a mangled symbol "stops early",
>> which would maybe allow for a more general solution.
>
> No, there indeed is no interface. We might introduce a new option flag
> for treating '.' as end of symbol. But do we really need that
> flexibility?

That's not what I meant - a v0 or legacy symbol is self-terminating in
its parsing (or at the very least there are not dots allowed outside
of a length-prefixed identifier), so that when you see the start of
a valid mangled symbol, you can always find its end in the string,
even when that end is half-way through (and is followed by suffixes
or any other unrelated noise).

What I was imagining is a way to return to the caller the number of
chars from the start of the original string, that were demangled,
letting the caller do something else with the rest of that string.
(see below for how rustc-demangle already does something similar)

>> Despite all that, if it helps in practice, I would still not mind
>> this patch landing in its current form, I just wanted to share my
>> perspective on the larger issue.
>
> Thanks for that. Do you happen to know what other rust demanglers do?

rustc-demangle's internal API returns a pair of the demangler and the
"leftover" parts of the original string, after the end of the symbol.
You can see here how that suffix is further checked, and kept:
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138

As mentioned above, ".llvm." is handled differently, just above the snippet
linked - perhaps it was deemed too common to let it pollute the output.

> Cheers,
>
> Mark

Thanks,
- Eddy B.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-02 23:14     ` Eduard-Mihai Burtescu
@ 2021-12-07 19:16       ` Mark Wielaard
  2021-12-20 11:50         ` Eduard-Mihai Burtescu
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Wielaard @ 2021-12-07 19:16 UTC (permalink / raw)
  To: Eduard-Mihai Burtescu; +Cc: gcc-patches, Nick Nethercote

[-- Attachment #1: Type: text/plain, Size: 4381 bytes --]

Hi Eddy,

On Fri, 2021-12-03 at 01:14 +0200, Eduard-Mihai Burtescu wrote:
> On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote:
> > On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu
> > wrote:
> > > That also means that for consistency, suffixes like these should
> > > be
> > > handled uniformly for both v0 and legacy (as rustc-demangle
> > > does),
> > > since LLVM doesn't distinguish.
> > 
> > The problem with the legacy mangling is that dot '.' is a valid
> > character. That is why the patch only handles the v0 mangling case
> > (where dot '.' isn't valid).
> 
> Thought so, that's an annoying complication - however, see later down
> why that's still not a blocker to the way rustc-demangle handles it.
> 
> > > You may even be able to get Clang to generate C++ mangled symbols
> > > with ".llvm." suffixes, with enough application of LTO.  This is
> > > not
> > > unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's
> > > a
> > > way to handle both as "outside the symbol", without hardcoding
> > > ".llvm." in the implementation.
> > 
> > We could use the scheme used by c++ where the .suffix is added as "
> > [clone .suffix]", it even handles multiple dots, where something
> > like
> > _Z3fooi.part.9.165493.constprop.775.31805
> > demangles to
> > foo(int) [clone .part.9.165493] [clone .constprop.775.31805]
> > 
> > I just don't think that is very useful and a little confusing.
> 
> Calling it "clone" is a bit weird, but I just checked what rustc-
> demangle
> does for printing suffixes back out and it's not great either:
> - ".llvm." (and everything after it) is completely removed
> - any left-over suffixes (after demangling), if they start with ".",
> are
>   not considered errors, but printed out verbatim after the
> demangling
> 
> > > I don't recall the libiberty demangling API having any provisions
> > > for the demangler deciding that a mangled symbol "stops early",
> > > which would maybe allow for a more general solution.
> > 
> > No, there indeed is no interface. We might introduce a new option
> > flag
> > for treating '.' as end of symbol. But do we really need that
> > flexibility?
> 
> That's not what I meant - a v0 or legacy symbol is self-terminating
> in
> its parsing (or at the very least there are not dots allowed outside
> of a length-prefixed identifier), so that when you see the start of
> a valid mangled symbol, you can always find its end in the string,
> even when that end is half-way through (and is followed by suffixes
> or any other unrelated noise).
> 
> What I was imagining is a way to return to the caller the number of
> chars from the start of the original string, that were demangled,
> letting the caller do something else with the rest of that string.
> (see below for how rustc-demangle already does something similar)
>
> > > Despite all that, if it helps in practice, I would still not mind
> > > this patch landing in its current form, I just wanted to share my
> > > perspective on the larger issue.
> > 
> > Thanks for that. Do you happen to know what other rust demanglers
> > do?
> 
> rustc-demangle's internal API returns a pair of the demangler and the
> "leftover" parts of the original string, after the end of the symbol.
> You can see here how that suffix is further checked, and kept:
> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138

Yes, returning a struct that returns the style detected, the demangled
string and the left over chars makes sense. But we don't have an
interface like that at the moment, and I am not sure we (currently)
have users who want this.

> As mentioned above, ".llvm." is handled differently, just above the snippet
> linked - perhaps it was deemed too common to let it pollute the output.

But that also makes it a slightly odd interface. I would imagine that
people would be interested in the .llvm. part. Now that just gets
dropped.

Since we don't have an interface to return the suffix (and I find the
choice of dropping .llvm. but not other suffixes odd), I think we
should just simply always drop the .suffix. I understand now how to do
that for legacy symbols too, thanks for the hints.

See the attached update to the patch. What do you think?

Thanks,

Mark

[-- Attachment #2: 0001-libiberty-rust-demangle-ignore-.suffix.patch --]
[-- Type: text/x-patch, Size: 3613 bytes --]

From b9ea4a1cc5d139a08fd04bd5c59dae1d1519f82b Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Thu, 2 Dec 2021 18:00:39 +0100
Subject: [PATCH] libiberty rust-demangle, ignore .suffix

Rust symbols can have a .suffix because of compiler transformations.
These can be ignored in the demangled name. Which is what this patch
implements. By stopping at the first dot for v0 symbols and searching
backwards to the ending 'E' for legacy symbols.

An alternative implementation could be to follow what C++ does and
represent these as [clone .suffix] tagged onto the demangled name.
But this seems somewhat confusing since it results in a demangled
name that cannot be mangled again. And it would mean trying to
decode compiler internal naming.

https://bugs.kde.org/show_bug.cgi?id=445916
https://github.com/rust-lang/rust/issues/60705

libiberty/Changelog

	* rust-demangle.c (rust_demangle_callback): Ignore everything
	after '.' char in sym for v0. For legacy symbols search
	backwards to find the last 'E' before any '.'.
	* testsuite/rust-demangle-expected: Add new .suffix testcases.
---
 libiberty/rust-demangle.c                  | 15 ++++++++++++++-
 libiberty/testsuite/rust-demangle-expected |  9 +++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 449941b56dc1..5c2b77f7a58e 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -1340,6 +1340,10 @@ rust_demangle_callback (const char *mangled, int options,
   /* Rust symbols (v0) use only [_0-9a-zA-Z] characters. */
   for (p = rdm.sym; *p; p++)
     {
+      /* Rust v0 symbols can have '.' suffixes, ignore those.  */
+      if (rdm.version == 0 && *p == '.')
+        break;
+
       rdm.sym_len++;
 
       if (*p == '_' || ISALNUM (*p))
@@ -1355,7 +1359,16 @@ rust_demangle_callback (const char *mangled, int options,
   /* Legacy Rust symbols need to be handled separately. */
   if (rdm.version == -1)
     {
-      /* Legacy Rust symbols always end with E. */
+      /* Legacy Rust symbols always end with E.  But can be followed by a
+         .suffix (which we want to ignore).  */
+      int dot_suffix = 1;
+      while (rdm.sym_len > 0 &&
+             !(dot_suffix && rdm.sym[rdm.sym_len - 1] == 'E'))
+        {
+          dot_suffix = rdm.sym[rdm.sym_len - 1] == '.';
+          rdm.sym_len--;
+        }
+
       if (!(rdm.sym_len > 0 && rdm.sym[rdm.sym_len - 1] == 'E'))
         return 0;
       rdm.sym_len--;
diff --git a/libiberty/testsuite/rust-demangle-expected b/libiberty/testsuite/rust-demangle-expected
index 7dca315d0054..eb73527b3189 100644
--- a/libiberty/testsuite/rust-demangle-expected
+++ b/libiberty/testsuite/rust-demangle-expected
@@ -295,3 +295,12 @@ _RMCs4fqI2P2rA04_13const_genericINtB0_4CharKc2202_E
 --format=auto
 _RNvNvMCs4fqI2P2rA04_13const_genericINtB4_3FooKpE3foo3FOO
 <const_generic::Foo<_>>::foo::FOO
+#
+# Suffixes
+#
+--format=rust
+_RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694
+<rustc_data_structures::obligation_forest::ObligationForest<rustc_trait_selection::traits::fulfill::PendingPredicateObligation>>::register_obligation_at
+--format=rust
+_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17h27f14859c664490dE.llvm.8091179795805947855
+core::ptr::drop_in_place<std::rt::lang_start<()>::{{closure}}>
\ No newline at end of file
-- 
2.18.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-07 19:16       ` Mark Wielaard
@ 2021-12-20 11:50         ` Eduard-Mihai Burtescu
  2022-01-16  0:27           ` Mark Wielaard
  0 siblings, 1 reply; 9+ messages in thread
From: Eduard-Mihai Burtescu @ 2021-12-20 11:50 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: gcc-patches, Nick Nethercote

Apologies for the delay, the email fell through the cracks somehow.

The updated patch looks like it would work alright, only needs a couple tests, e.g.:
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L413-L422
https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/v0.rs#L1442-L1444

Thanks,
- Eddy B.

On Tue, Dec 7, 2021, at 21:16, Mark Wielaard wrote:
> Hi Eddy,
>
> On Fri, 2021-12-03 at 01:14 +0200, Eduard-Mihai Burtescu wrote:
>> On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote:
>> > On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu
>> > wrote:
>> > > That also means that for consistency, suffixes like these should
>> > > be
>> > > handled uniformly for both v0 and legacy (as rustc-demangle
>> > > does),
>> > > since LLVM doesn't distinguish.
>> > 
>> > The problem with the legacy mangling is that dot '.' is a valid
>> > character. That is why the patch only handles the v0 mangling case
>> > (where dot '.' isn't valid).
>> 
>> Thought so, that's an annoying complication - however, see later down
>> why that's still not a blocker to the way rustc-demangle handles it.
>> 
>> > > You may even be able to get Clang to generate C++ mangled symbols
>> > > with ".llvm." suffixes, with enough application of LTO.  This is
>> > > not
>> > > unlike GCC ".clone" suffixes, AFAIK.  Sadly I don't think there's
>> > > a
>> > > way to handle both as "outside the symbol", without hardcoding
>> > > ".llvm." in the implementation.
>> > 
>> > We could use the scheme used by c++ where the .suffix is added as "
>> > [clone .suffix]", it even handles multiple dots, where something
>> > like
>> > _Z3fooi.part.9.165493.constprop.775.31805
>> > demangles to
>> > foo(int) [clone .part.9.165493] [clone .constprop.775.31805]
>> > 
>> > I just don't think that is very useful and a little confusing.
>> 
>> Calling it "clone" is a bit weird, but I just checked what rustc-
>> demangle
>> does for printing suffixes back out and it's not great either:
>> - ".llvm." (and everything after it) is completely removed
>> - any left-over suffixes (after demangling), if they start with ".",
>> are
>>   not considered errors, but printed out verbatim after the
>> demangling
>> 
>> > > I don't recall the libiberty demangling API having any provisions
>> > > for the demangler deciding that a mangled symbol "stops early",
>> > > which would maybe allow for a more general solution.
>> > 
>> > No, there indeed is no interface. We might introduce a new option
>> > flag
>> > for treating '.' as end of symbol. But do we really need that
>> > flexibility?
>> 
>> That's not what I meant - a v0 or legacy symbol is self-terminating
>> in
>> its parsing (or at the very least there are not dots allowed outside
>> of a length-prefixed identifier), so that when you see the start of
>> a valid mangled symbol, you can always find its end in the string,
>> even when that end is half-way through (and is followed by suffixes
>> or any other unrelated noise).
>> 
>> What I was imagining is a way to return to the caller the number of
>> chars from the start of the original string, that were demangled,
>> letting the caller do something else with the rest of that string.
>> (see below for how rustc-demangle already does something similar)
>>
>> > > Despite all that, if it helps in practice, I would still not mind
>> > > this patch landing in its current form, I just wanted to share my
>> > > perspective on the larger issue.
>> > 
>> > Thanks for that. Do you happen to know what other rust demanglers
>> > do?
>> 
>> rustc-demangle's internal API returns a pair of the demangler and the
>> "leftover" parts of the original string, after the end of the symbol.
>> You can see here how that suffix is further checked, and kept:
>> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138
>
> Yes, returning a struct that returns the style detected, the demangled
> string and the left over chars makes sense. But we don't have an
> interface like that at the moment, and I am not sure we (currently)
> have users who want this.
>
>> As mentioned above, ".llvm." is handled differently, just above the snippet
>> linked - perhaps it was deemed too common to let it pollute the output.
>
> But that also makes it a slightly odd interface. I would imagine that
> people would be interested in the .llvm. part. Now that just gets
> dropped.
>
> Since we don't have an interface to return the suffix (and I find the
> choice of dropping .llvm. but not other suffixes odd), I think we
> should just simply always drop the .suffix. I understand now how to do
> that for legacy symbols too, thanks for the hints.
>
> See the attached update to the patch. What do you think?
>
> Thanks,
>
> Mark
>
> Attachments:
> * 0001-libiberty-rust-demangle-ignore-.suffix.patch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] libiberty rust-demangle, ignore .suffix
  2021-12-20 11:50         ` Eduard-Mihai Burtescu
@ 2022-01-16  0:27           ` Mark Wielaard
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Wielaard @ 2022-01-16  0:27 UTC (permalink / raw)
  To: Eduard-Mihai Burtescu; +Cc: gcc-patches, Nick Nethercote

Hi Eddy,

On Mon, Dec 20, 2021 at 01:50:52PM +0200, Eduard-Mihai Burtescu wrote:
> Apologies for the delay, the email fell through the cracks somehow.

And then I went on vacation... Sorry this fairly simple patch takes so
long.

> The updated patch looks like it would work alright, only needs a couple tests, e.g.:
> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L413-L422
> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/v0.rs#L1442-L1444

Thanks. I have added those testcases and noticed we need a tweak for
having an @ in the suffix of a legacy symbol. I'll sent the updated
patch. Hope it can still be pushed even though stage 3 is closing
soon.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-01-16  0:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-02 17:17 [PATCH] libiberty rust-demangle, ignore .suffix Mark Wielaard
2021-12-02 17:35 ` Eduard-Mihai Burtescu
2021-12-02 22:07   ` Mark Wielaard
2021-12-02 23:14     ` Eduard-Mihai Burtescu
2021-12-07 19:16       ` Mark Wielaard
2021-12-20 11:50         ` Eduard-Mihai Burtescu
2022-01-16  0:27           ` Mark Wielaard
2021-12-02 19:58 ` Nicholas Nethercote
2021-12-02 22:54   ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).