public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
@ 2023-03-10  9:36 Jan Beulich
  2023-03-10  9:40 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jan Beulich @ 2023-03-10  9:36 UTC (permalink / raw)
  To: Binutils; +Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

The use of a space char there collides with anything using temp_ilp(),
i.e. at present at least with the special % operator available in
alternate macro mode. Further uses of the function may appear at any
time. This likely is a result of write.h's comment not properly
spelling out all the constraints placed on the selection of the special
character used. Then again RISC-V anyway violates a constraint which has
been properly spelled out there: Such labels _do_ appear in assembler
output.
---
RFC: Of course this breaks interoperability between older gas / new
     objdump and vice versa. But I don't see a way to resolve the issue
     at hand without introducing such a discontinuity. To limit "damage"
     a little, riscv_symbol_is_valid() could of course be tought to also
     ignore old style fake label names. (Personally I view this tying of
     functionality between assembler and disassembler as problematic
     anyway.) Thoughts?

I question the use of FAKE_LABEL_NAME in make_internal_label(). The
comment in tc-riscv.h isn't correct anyway because of (at least) this
use - the symbols generated there are never used for Dwarf. And them all
being the same makes it rather hard to associate relocations resolved to
symbol names (e.g. "objdump -dr" output) with the actual instance that's
referenced. Their naming should imo rather follow the model of
{fb,dollar}_label_name().

Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
marked in lex[] just like done here). I can't point out a specific case
though where there could be a problem.

The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
of these characters in quoted symbol names.

--- a/gas/app.c
+++ b/gas/app.c
@@ -200,6 +200,11 @@ do_scrub_begin (int m68k_mri ATTRIBUTE_U
   lex['-'] = LEX_IS_SYMBOL_COMPONENT;
 #endif
 
+  /* If not otherwise used (which it shouldn't be - see write.h), mark the
+     fake label character as a possible part of symbol names.  */
+  if (!lex[(unsigned char) FAKE_LABEL_CHAR])
+    lex[(unsigned char) FAKE_LABEL_CHAR] = LEX_IS_SYMBOL_COMPONENT;
+
 #ifdef H_TICK_HEX
   if (enable_h_tick_hex)
     {
--- a/gas/config/tc-riscv.h
+++ b/gas/config/tc-riscv.h
@@ -38,8 +38,7 @@ struct expressionS;
 #define LOCAL_LABELS_FB 	1
 
 /* Symbols named FAKE_LABEL_NAME are emitted when generating DWARF, so make
-   sure FAKE_LABEL_NAME is printable.  It still must be distinct from any
-   real label name.  So, append a space, which other labels can't contain.  */
+   sure FAKE_LABEL_NAME is printable.  See write.h for constraints.  */
 #define FAKE_LABEL_NAME RISCV_FAKE_LABEL_NAME
 /* Changing the special character in FAKE_LABEL_NAME requires changing
    FAKE_LABEL_CHAR too.  */
--- a/gas/testsuite/gas/all/altmacro.d
+++ b/gas/testsuite/gas/all/altmacro.d
@@ -8,4 +8,4 @@
 
 Contents of section .data:
  0000 01020912 61626331 32332121 3c3e2721 .*
- 0010 3c3e27.*
+ 0010 3c3e27a2 11.*
--- a/gas/testsuite/gas/all/altmacro.s
+++ b/gas/testsuite/gas/all/altmacro.s
@@ -33,3 +33,9 @@ m4	"!!<>'"
 	.altmacro
 
 m3	"!!<>'"
+
+	.macro	m2b v1, v2
+	m1 %v2-v1 17
+	.endm
+
+	m2b 81 243
--- a/gas/write.h
+++ b/gas/write.h
@@ -30,7 +30,8 @@
 /* This is a special character that is used to indicate a fake label.
    It must be present in FAKE_LABEL_NAME, although it does not have to
    be the first character.  It must not be a character that would be
-   found in a valid symbol name.
+   found in a valid symbol name, nor one that starts an operator, nor
+   one that's a separator of some kind.
 
    Also be aware that the function _bfd_elf_is_local_label_name in
    bfd/elf.c has an implicit assumption that FAKE_LABEL_CHAR is '\001'.
--- a/include/opcode/riscv.h
+++ b/include/opcode/riscv.h
@@ -323,9 +323,10 @@ static inline unsigned int riscv_insn_le
 
 /* These fake label defines are use by both the assembler, and
    libopcodes.  The assembler uses this when it needs to generate a fake
-   label, and libopcodes uses it to hide the fake labels in its output.  */
-#define RISCV_FAKE_LABEL_NAME ".L0 "
-#define RISCV_FAKE_LABEL_CHAR ' '
+   label, and libopcodes uses it to hide the fake labels in its output.
+   See gas/write.h for constraints.  */
+#define RISCV_FAKE_LABEL_NAME ".L0?"
+#define RISCV_FAKE_LABEL_CHAR '?'
 
 /* Replace bits MASK << SHIFT of STRUCT with the equivalent bits in
    VALUE << SHIFT.  VALUE is evaluated exactly once.  */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-10  9:36 [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME Jan Beulich
@ 2023-03-10  9:40 ` Jan Beulich
  2023-03-13 12:06   ` Nick Clifton
  2023-03-15 16:05 ` Palmer Dabbelt
  2023-03-30 12:01 ` Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-03-10  9:40 UTC (permalink / raw)
  To: Nick Clifton, Alan Modra
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu, Binutils

Nick, Alan,

On 10.03.2023 10:36, Jan Beulich via Binutils wrote:
> Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
> and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
> DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
> marked in lex[] just like done here). I can't point out a specific case
> though where there could be a problem.
> 
> The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
> of these characters in quoted symbol names.

while the rest of the open questions is RISC-V specific, these two items
are not, and since from the title I suppose you may not notice the generic
aspects here, I thought I'd point them out explicitly. Just in case you
have any thoughts there (if you don't, I'm not sure how to progress).

Thanks, Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-10  9:40 ` Jan Beulich
@ 2023-03-13 12:06   ` Nick Clifton
  2023-03-13 12:52     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Clifton @ 2023-03-13 12:06 UTC (permalink / raw)
  To: Jan Beulich, Alan Modra
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu, Binutils

Hi Jan,

> On 10.03.2023 10:36, Jan Beulich via Binutils wrote:
>> Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
>> and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
>> DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
>> marked in lex[] just like done here). I can't point out a specific case
>> though where there could be a problem.
>>
>> The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
>> of these characters in quoted symbol names.
> 
> while the rest of the open questions is RISC-V specific, these two items
> are not, and since from the title I suppose you may not notice the generic
> aspects here, I thought I'd point them out explicitly. Just in case you
> have any thoughts there (if you don't, I'm not sure how to progress).

Thanks for bringing this up, and you are right, I had not noticed these
generic questions.  My thoughts on this topic are as follows:

I think that the intention of FAKE_LABEL_CHAR is to allow backends to
generate labels contains characters that are not normally accepted as
part of a label's name, but which nevertheless should be treated as if
they were real user-provided names.  The LOCAL_LABEL_CHAR and
DOLLAR_LABEL_CHAR characters on the other hand are intended solely for
internal assembler use and should never appear in user-generated, or
backend-generated labels.  Hence it makes sense to check for
FAKE_LABEL_CHAR to be checked in functions that read user labels
(read.c:read_symbol_name(), expr.c:get_symbol_name()) and it also makes
sense to not check for - and by implication, reject - LOCAL_LABEL_CHAR
and DOLLAR_LABEL_CHAR.

There is a special case however.  If a backend does not define
FAKE_LABEL_NAME (and FAKE_LABEL_CHAR) then write.h will define them
instead and it will use the same default character as DOLLAR_LABEL_CHAR.
I am not sure why this was done, possibly it was a mistake - it might
have been better to use a different control character, eg '\0003' instead.
Anyway, the header is written that way at the moment, and this is why
there is a (not actually needed) check in S_IS_LOCAL() for DOLLAR_LABEL_
CHAR and FAKE_LABEL_CHAR being the same.

So in conclusion, I think that unless a backend wants to use a control
character as a fake label name character, there should be no problems.

Cheers
   Nick



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-13 12:06   ` Nick Clifton
@ 2023-03-13 12:52     ` Jan Beulich
  2023-03-13 14:25       ` Nick Clifton
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-03-13 12:52 UTC (permalink / raw)
  To: Nick Clifton
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu,
	Binutils, Alan Modra

Nick,

On 13.03.2023 13:06, Nick Clifton wrote:
>> On 10.03.2023 10:36, Jan Beulich via Binutils wrote:
>>> Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
>>> and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
>>> DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
>>> marked in lex[] just like done here). I can't point out a specific case
>>> though where there could be a problem.
>>>
>>> The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
>>> of these characters in quoted symbol names.
>>
>> while the rest of the open questions is RISC-V specific, these two items
>> are not, and since from the title I suppose you may not notice the generic
>> aspects here, I thought I'd point them out explicitly. Just in case you
>> have any thoughts there (if you don't, I'm not sure how to progress).
> 
> Thanks for bringing this up, and you are right, I had not noticed these
> generic questions.  My thoughts on this topic are as follows:
> 
> I think that the intention of FAKE_LABEL_CHAR is to allow backends to
> generate labels contains characters that are not normally accepted as
> part of a label's name, but which nevertheless should be treated as if
> they were real user-provided names.  The LOCAL_LABEL_CHAR and
> DOLLAR_LABEL_CHAR characters on the other hand are intended solely for
> internal assembler use and should never appear in user-generated, or
> backend-generated labels.  Hence it makes sense to check for
> FAKE_LABEL_CHAR to be checked in functions that read user labels
> (read.c:read_symbol_name(), expr.c:get_symbol_name()) and it also makes
> sense to not check for - and by implication, reject - LOCAL_LABEL_CHAR
> and DOLLAR_LABEL_CHAR.
> 
> There is a special case however.  If a backend does not define
> FAKE_LABEL_NAME (and FAKE_LABEL_CHAR) then write.h will define them
> instead and it will use the same default character as DOLLAR_LABEL_CHAR.
> I am not sure why this was done, possibly it was a mistake - it might
> have been better to use a different control character, eg '\0003' instead.
> Anyway, the header is written that way at the moment, and this is why
> there is a (not actually needed) check in S_IS_LOCAL() for DOLLAR_LABEL_
> CHAR and FAKE_LABEL_CHAR being the same.
> 
> So in conclusion, I think that unless a backend wants to use a control
> character as a fake label name character, there should be no problems.

okay, this addresses the first of the two points raised and indicates
that while things look somewhat odd, they really are okay in this regard.
I have to admit though that I don't follow why you specially mention
control character use by a backend as fake label name, not the least
since it's the default that a control character is used.

What I don't think you addressed is my 2nd observation wrt quoted symbol
names. There all characters can appear in symbol names, and hence
deriving "local" (or not) based on symbol name looks wrong to me.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-13 12:52     ` Jan Beulich
@ 2023-03-13 14:25       ` Nick Clifton
  2023-03-13 15:57         ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Clifton @ 2023-03-13 14:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu,
	Binutils, Alan Modra

Hi Jan,

> What I don't think you addressed is my 2nd observation wrt quoted symbol
> names. There all characters can appear in symbol names, and hence
> deriving "local" (or not) based on symbol name looks wrong to me.

Ok, that is fair.  Let's see.  So the issue is, what should S_IS_LOCAL do
if a user provides a global, quoted symbol name that includes characters
that match DOLLAR_LABEL_CHAR, LOCAL_LABEL_CHAR or FAKE_LABEL_CHAR, right ?

I would imagine that this is an extremely unusual case, which is why it has
not come up before.

First off, I wonder why S_IS_LOCAL does not check to see if BSF_GLOBAL is
set in flags and return false if that is the case.  If it did, I think that
that would solve the problem.  I cannot think of any situation where you
could have a global symbol that needs to be treated as local for the purposes
of S_IS_LOCAL.

Alternatively, maybe it would be simpler to just document that quoted
symbol names that contain special characters might be treated as if they
were local symbols, and leave it up to the user to decide whether they
want to risk such behaviour.  Not very user friendly, but it would match
the current behaviour.  We could also add a warning to read_symbol_name()
and get_symbol_name() to alert users to the problem.

What do you think ?  Any preferences ?

Cheers
   Nick




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-13 14:25       ` Nick Clifton
@ 2023-03-13 15:57         ` Jan Beulich
  2023-03-15 11:08           ` Nick Clifton
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-03-13 15:57 UTC (permalink / raw)
  To: Nick Clifton
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu,
	Binutils, Alan Modra

Nick,

On 13.03.2023 15:25, Nick Clifton wrote:
>> What I don't think you addressed is my 2nd observation wrt quoted symbol
>> names. There all characters can appear in symbol names, and hence
>> deriving "local" (or not) based on symbol name looks wrong to me.
> 
> Ok, that is fair.  Let's see.  So the issue is, what should S_IS_LOCAL do
> if a user provides a global, quoted symbol name that includes characters
> that match DOLLAR_LABEL_CHAR, LOCAL_LABEL_CHAR or FAKE_LABEL_CHAR, right ?
> 
> I would imagine that this is an extremely unusual case, which is why it has
> not come up before.
> 
> First off, I wonder why S_IS_LOCAL does not check to see if BSF_GLOBAL is
> set in flags and return false if that is the case.  If it did, I think that
> that would solve the problem.  I cannot think of any situation where you
> could have a global symbol that needs to be treated as local for the purposes
> of S_IS_LOCAL.

Well - fundamentally I expect there to be a reason for this name checking.
It may of course simply be a remnant of non-BFD-gas times. It could also
be that this works around certain supposedly-locals being marked global by
mistake. It's also not clear whether e.g. the comment in write_object_file()
regarding the relation of S_IS_LOCAL() and S_IS_EXTERNAL() is merely writing
down the observation of S_IS_LOCAL() does, or whether it means to point out
that this has to be that way for a reason (which then sadly isn't mentioned).

I wonder therefore whether S_IS_LOCAL() and S_IS_EXTERNAL() can't be made
true opposites of one another (with perhaps one of the two simply expanding
to a call to the other, inverting the result).

> Alternatively, maybe it would be simpler to just document that quoted
> symbol names that contain special characters might be treated as if they
> were local symbols, and leave it up to the user to decide whether they
> want to risk such behaviour.  Not very user friendly, but it would match
> the current behaviour.  We could also add a warning to read_symbol_name()
> and get_symbol_name() to alert users to the problem.

That's not good imo, as it might end in surprises / perceived regressions:
We couldn't easily change the special characters we use, or add another one
if needed. So I guess ...

> What do you think ?  Any preferences ?

... my preference is clear - see about making S_IS_LOCAL() and
S_IS_EXTERNAL() as close opposites of one another as possible, if no other
reason for their differences is known. I can give that a try, and see what
running the testsuite says. Yet even if that didn't raise any problems,
there'd be uncertainty left due to insufficient testsuite coverage on many
targets.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-13 15:57         ` Jan Beulich
@ 2023-03-15 11:08           ` Nick Clifton
  2023-03-15 15:45             ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Clifton @ 2023-03-15 11:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu,
	Binutils, Alan Modra

Hi Jan,

> I wonder therefore whether S_IS_LOCAL() and S_IS_EXTERNAL() can't be made
> true opposites of one another (with perhaps one of the two simply expanding
> to a call to the other, inverting the result).

I think that they are actually testing different things.  The way I see it,
S_IS_LOCAL() should return true for any symbol that is not intended to be
seen by anyone disassembling the code.  (Unless of course a command line
has been used to change this behaviour).  So for example labels like .L1
or ^B1.   This does not however mean the same thing as ELF's STB_LOCAL
binding.  Local ELF symbols can be found in disassemblies, should be present
in symbol tables, and represent something intelligible to the user, rather
than something invented by the assembler.

S_IS_EXTERNAL() on the other hand is meant to return true for symbols which
are not only exposed to the disassembler, but which also have visibility
outside of the file which is currently being assembled.  ie in ELF terms
symbols which have a binding other than STB_LOCAL.

That is why, IMHO, the two functions are different.

Cheers
   Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-15 11:08           ` Nick Clifton
@ 2023-03-15 15:45             ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2023-03-15 15:45 UTC (permalink / raw)
  To: Nick Clifton
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu,
	Binutils, Alan Modra

On 15.03.2023 12:08, Nick Clifton wrote:
>> I wonder therefore whether S_IS_LOCAL() and S_IS_EXTERNAL() can't be made
>> true opposites of one another (with perhaps one of the two simply expanding
>> to a call to the other, inverting the result).
> 
> I think that they are actually testing different things.  The way I see it,
> S_IS_LOCAL() should return true for any symbol that is not intended to be
> seen by anyone disassembling the code.  (Unless of course a command line
> has been used to change this behaviour).  So for example labels like .L1
> or ^B1.   This does not however mean the same thing as ELF's STB_LOCAL
> binding.  Local ELF symbols can be found in disassemblies, should be present
> in symbol tables, and represent something intelligible to the user, rather
> than something invented by the assembler.
> 
> S_IS_EXTERNAL() on the other hand is meant to return true for symbols which
> are not only exposed to the disassembler, but which also have visibility
> outside of the file which is currently being assembled.  ie in ELF terms
> symbols which have a binding other than STB_LOCAL.

Oh, I was mislead by the name of the function (would perhaps better be
something like S_IS_INTERNAL()). And I should have looked at the doc of
course ...

So one thing to do would be to, as you did suggest earlier, actually
honor BSF_GLOBAL. Another difference is the treatment of reg_section
symbols - should S_IS_EXTERNAL() perhaps return "false" for those,
ignoring BSF_GLOBAL? It would look to me as if there shouldn't be any
symbol for which both functions can return true (which could be
achieved by having S_IS_EXTERNAL() first call S_IS_LOCAL() and return
"false" if that one returned "true").

Honoring BSF_GLOBAL won't be enough to deal with the quoted symbols
aspect, though. I guess we will need a new flag then which identifies
symbols the user created (and certain equivalents); this might even
be limited to symbols whose names were quoted at the point of creation
(at which point "certain equivalents" would not be relevant as long as
internally created "real" symbols don't contain any special characters,
which I think is the case), as all we need is something to bypass the
looking at their names.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-10  9:36 [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME Jan Beulich
  2023-03-10  9:40 ` Jan Beulich
@ 2023-03-15 16:05 ` Palmer Dabbelt
  2023-03-16 10:35   ` Jan Beulich
  2023-03-30 12:01 ` Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Palmer Dabbelt @ 2023-03-15 16:05 UTC (permalink / raw)
  To: jbeulich; +Cc: binutils, Andrew Waterman, Jim Wilson, nelson

On Fri, 10 Mar 2023 01:36:31 PST (-0800), jbeulich@suse.com wrote:
> The use of a space char there collides with anything using temp_ilp(),
> i.e. at present at least with the special % operator available in
> alternate macro mode. Further uses of the function may appear at any
> time. This likely is a result of write.h's comment not properly
> spelling out all the constraints placed on the selection of the special
> character used. Then again RISC-V anyway violates a constraint which has
> been properly spelled out there: Such labels _do_ appear in assembler
> output.
> ---
> RFC: Of course this breaks interoperability between older gas / new
>      objdump and vice versa. But I don't see a way to resolve the issue
>      at hand without introducing such a discontinuity. To limit "damage"
>      a little, riscv_symbol_is_valid() could of course be tought to also
>      ignore old style fake label names. (Personally I view this tying of
>      functionality between assembler and disassembler as problematic
>      anyway.) Thoughts?

This wouldn't be the first time we've made a change around here in 
RISC-V land, see 2469b3c5844 ("Riscv ld-elf/stab failure and fake label 
cleanup.").  Unless I'm missing something we maintained compatibility 
with the old binaries there, I'd prefer if we can do that here too.  
That said...

> I question the use of FAKE_LABEL_NAME in make_internal_label(). The
> comment in tc-riscv.h isn't correct anyway because of (at least) this
> use - the symbols generated there are never used for Dwarf. And them all
> being the same makes it rather hard to associate relocations resolved to
> symbol names (e.g. "objdump -dr" output) with the actual instance that's
> referenced. Their naming should imo rather follow the model of
> {fb,dollar}_label_name().
>
> Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
> and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
> DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
> marked in lex[] just like done here). I can't point out a specific case
> though where there could be a problem.
>
> The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
> of these characters in quoted symbol names.

... if this is all broken anyway then I guess compatibility doesn't 
matter that much?  I've definately seen some oddness around here, but I 
think it generally works.

> --- a/gas/app.c
> +++ b/gas/app.c
> @@ -200,6 +200,11 @@ do_scrub_begin (int m68k_mri ATTRIBUTE_U
>    lex['-'] = LEX_IS_SYMBOL_COMPONENT;
>  #endif
>
> +  /* If not otherwise used (which it shouldn't be - see write.h), mark the
> +     fake label character as a possible part of symbol names.  */
> +  if (!lex[(unsigned char) FAKE_LABEL_CHAR])
> +    lex[(unsigned char) FAKE_LABEL_CHAR] = LEX_IS_SYMBOL_COMPONENT;
> +
>  #ifdef H_TICK_HEX
>    if (enable_h_tick_hex)
>      {
> --- a/gas/config/tc-riscv.h
> +++ b/gas/config/tc-riscv.h
> @@ -38,8 +38,7 @@ struct expressionS;
>  #define LOCAL_LABELS_FB 	1
>
>  /* Symbols named FAKE_LABEL_NAME are emitted when generating DWARF, so make
> -   sure FAKE_LABEL_NAME is printable.  It still must be distinct from any
> -   real label name.  So, append a space, which other labels can't contain.  */
> +   sure FAKE_LABEL_NAME is printable.  See write.h for constraints.  */
>  #define FAKE_LABEL_NAME RISCV_FAKE_LABEL_NAME
>  /* Changing the special character in FAKE_LABEL_NAME requires changing
>     FAKE_LABEL_CHAR too.  */
> --- a/gas/testsuite/gas/all/altmacro.d
> +++ b/gas/testsuite/gas/all/altmacro.d
> @@ -8,4 +8,4 @@
>
>  Contents of section .data:
>   0000 01020912 61626331 32332121 3c3e2721 .*
> - 0010 3c3e27.*
> + 0010 3c3e27a2 11.*
> --- a/gas/testsuite/gas/all/altmacro.s
> +++ b/gas/testsuite/gas/all/altmacro.s
> @@ -33,3 +33,9 @@ m4	"!!<>'"
>  	.altmacro
>
>  m3	"!!<>'"
> +
> +	.macro	m2b v1, v2
> +	m1 %v2-v1 17
> +	.endm
> +
> +	m2b 81 243
> --- a/gas/write.h
> +++ b/gas/write.h
> @@ -30,7 +30,8 @@
>  /* This is a special character that is used to indicate a fake label.
>     It must be present in FAKE_LABEL_NAME, although it does not have to
>     be the first character.  It must not be a character that would be
> -   found in a valid symbol name.
> +   found in a valid symbol name, nor one that starts an operator, nor
> +   one that's a separator of some kind.
>
>     Also be aware that the function _bfd_elf_is_local_label_name in
>     bfd/elf.c has an implicit assumption that FAKE_LABEL_CHAR is '\001'.
> --- a/include/opcode/riscv.h
> +++ b/include/opcode/riscv.h
> @@ -323,9 +323,10 @@ static inline unsigned int riscv_insn_le
>
>  /* These fake label defines are use by both the assembler, and
>     libopcodes.  The assembler uses this when it needs to generate a fake
> -   label, and libopcodes uses it to hide the fake labels in its output.  */
> -#define RISCV_FAKE_LABEL_NAME ".L0 "
> -#define RISCV_FAKE_LABEL_CHAR ' '
> +   label, and libopcodes uses it to hide the fake labels in its output.
> +   See gas/write.h for constraints.  */
> +#define RISCV_FAKE_LABEL_NAME ".L0?"
> +#define RISCV_FAKE_LABEL_CHAR '?'
>
>  /* Replace bits MASK << SHIFT of STRUCT with the equivalent bits in
>     VALUE << SHIFT.  VALUE is evaluated exactly once.  */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-15 16:05 ` Palmer Dabbelt
@ 2023-03-16 10:35   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2023-03-16 10:35 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: binutils, Andrew Waterman, Jim Wilson, nelson

On 15.03.2023 17:05, Palmer Dabbelt wrote:
> On Fri, 10 Mar 2023 01:36:31 PST (-0800), jbeulich@suse.com wrote:
>> The use of a space char there collides with anything using temp_ilp(),
>> i.e. at present at least with the special % operator available in
>> alternate macro mode. Further uses of the function may appear at any
>> time. This likely is a result of write.h's comment not properly
>> spelling out all the constraints placed on the selection of the special
>> character used. Then again RISC-V anyway violates a constraint which has
>> been properly spelled out there: Such labels _do_ appear in assembler
>> output.
>> ---
>> RFC: Of course this breaks interoperability between older gas / new
>>      objdump and vice versa. But I don't see a way to resolve the issue
>>      at hand without introducing such a discontinuity. To limit "damage"
>>      a little, riscv_symbol_is_valid() could of course be tought to also
>>      ignore old style fake label names. (Personally I view this tying of
>>      functionality between assembler and disassembler as problematic
>>      anyway.) Thoughts?
> 
> This wouldn't be the first time we've made a change around here in 
> RISC-V land, see 2469b3c5844 ("Riscv ld-elf/stab failure and fake label 
> cleanup.").  Unless I'm missing something we maintained compatibility 
> with the old binaries there, I'd prefer if we can do that here too.  

That commit didn't touch the binutils/ subtree at all, so "existing"
binaries would no longer have had their "helper" symbols suppressed. And
indeed current code is merely

bool
riscv_symbol_is_valid (asymbol * sym,
                       struct disassemble_info * info ATTRIBUTE_UNUSED)
{
  const char * name;

  if (sym == NULL)
    return false;

  name = bfd_asymbol_name (sym);

  return (strcmp (name, RISCV_FAKE_LABEL_NAME) != 0
	  && !riscv_elf_is_mapping_symbols (name));
}

i.e. no check for the earlier used special name. Yet the RFC question
is specifically to figure out whether riscv_symbol_is_valid() should
also be adjusted (at which point I could of course go and make it
recognize the earlier name as well as the one that is in use prior to
the patch here. (The RFC question also extends in the other direction,
as we can't really do anything about old objdump then no longer
recognizing the new label name. But that again would be no different
from the earlier discontinuity, afaict.)

> That said...
> 
>> I question the use of FAKE_LABEL_NAME in make_internal_label(). The
>> comment in tc-riscv.h isn't correct anyway because of (at least) this
>> use - the symbols generated there are never used for Dwarf. And them all
>> being the same makes it rather hard to associate relocations resolved to
>> symbol names (e.g. "objdump -dr" output) with the actual instance that's
>> referenced. Their naming should imo rather follow the model of
>> {fb,dollar}_label_name().
>>
>> Considering the special casing of FAKE_LABEL_CHAR in read_symbol_name()
>> and get_symbol_name() I wonder in how far LOCAL_LABEL_CHAR and/or
>> DOLLAR_LABEL_CHAR don't also need recognizing there (and then also be
>> marked in lex[] just like done here). I can't point out a specific case
>> though where there could be a problem.
>>
>> The checking for *_LABEL_CHAR in S_IS_LOCAL() looks to collide with uses
>> of these characters in quoted symbol names.
> 
> ... if this is all broken anyway then I guess compatibility doesn't 
> matter that much?  I've definately seen some oddness around here, but I 
> think it generally works.

The latter two of these remarks were general observations while doing the
work, the first of which Nick meanwhile has reassured me is not an issue.
Dealing with the first remark above would (aiui) introduce another
discontinuity, though, so dealing with both aspects at (about) the same
time (i.e. at least within a single release cycle) might be best. Provided
of course it is acceptable in the first place to change
make_internal_label().

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-10  9:36 [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME Jan Beulich
  2023-03-10  9:40 ` Jan Beulich
  2023-03-15 16:05 ` Palmer Dabbelt
@ 2023-03-30 12:01 ` Jan Beulich
  2023-08-04 12:04   ` Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-03-30 12:01 UTC (permalink / raw)
  To: Binutils; +Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

On 10.03.2023 10:36, Jan Beulich via Binutils wrote:
> The use of a space char there collides with anything using temp_ilp(),
> i.e. at present at least with the special % operator available in
> alternate macro mode. Further uses of the function may appear at any
> time. This likely is a result of write.h's comment not properly
> spelling out all the constraints placed on the selection of the special
> character used. Then again RISC-V anyway violates a constraint which has
> been properly spelled out there: Such labels _do_ appear in assembler
> output.
> ---
> RFC: Of course this breaks interoperability between older gas / new
>      objdump and vice versa. But I don't see a way to resolve the issue
>      at hand without introducing such a discontinuity. To limit "damage"
>      a little, riscv_symbol_is_valid() could of course be tought to also
>      ignore old style fake label names. (Personally I view this tying of
>      functionality between assembler and disassembler as problematic
>      anyway.) Thoughts?
> 
> I question the use of FAKE_LABEL_NAME in make_internal_label(). The
> comment in tc-riscv.h isn't correct anyway because of (at least) this
> use - the symbols generated there are never used for Dwarf. And them all
> being the same makes it rather hard to associate relocations resolved to
> symbol names (e.g. "objdump -dr" output) with the actual instance that's
> referenced. Their naming should imo rather follow the model of
> {fb,dollar}_label_name().

I wanted to further mention the following: Using '?' or really any
printable character has the downside of (pretty much) closing the road
of making such characters usable in normal (unquoted) symbols. '?' in
particular is, at least on x86, used e.g. in Microsoft VC's C++ name
mangling scheme. Yet as per the comment in tc-riscv.h it is specifically
a goal to use a printable character here.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME
  2023-03-30 12:01 ` Jan Beulich
@ 2023-08-04 12:04   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2023-08-04 12:04 UTC (permalink / raw)
  To: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu; +Cc: Binutils

On 30.03.2023 14:01, Jan Beulich wrote:
> On 10.03.2023 10:36, Jan Beulich via Binutils wrote:
>> The use of a space char there collides with anything using temp_ilp(),
>> i.e. at present at least with the special % operator available in
>> alternate macro mode. Further uses of the function may appear at any
>> time. This likely is a result of write.h's comment not properly
>> spelling out all the constraints placed on the selection of the special
>> character used. Then again RISC-V anyway violates a constraint which has
>> been properly spelled out there: Such labels _do_ appear in assembler
>> output.
>> ---
>> RFC: Of course this breaks interoperability between older gas / new
>>      objdump and vice versa. But I don't see a way to resolve the issue
>>      at hand without introducing such a discontinuity. To limit "damage"
>>      a little, riscv_symbol_is_valid() could of course be tought to also
>>      ignore old style fake label names. (Personally I view this tying of
>>      functionality between assembler and disassembler as problematic
>>      anyway.) Thoughts?
>>
>> I question the use of FAKE_LABEL_NAME in make_internal_label(). The
>> comment in tc-riscv.h isn't correct anyway because of (at least) this
>> use - the symbols generated there are never used for Dwarf. And them all
>> being the same makes it rather hard to associate relocations resolved to
>> symbol names (e.g. "objdump -dr" output) with the actual instance that's
>> referenced. Their naming should imo rather follow the model of
>> {fb,dollar}_label_name().
> 
> I wanted to further mention the following: Using '?' or really any
> printable character has the downside of (pretty much) closing the road
> of making such characters usable in normal (unquoted) symbols. '?' in
> particular is, at least on x86, used e.g. in Microsoft VC's C++ name
> mangling scheme. Yet as per the comment in tc-riscv.h it is specifically
> a goal to use a printable character here.

All,

with 2.41 out of the way, may I ask that we make some progress here? All
I got back so far was agreement that the situation isn't nice. But that
doesn't help determining how to improve things (a) without too much
breakage and (b) with the goal of not needing to redo this yet again
later.

Thanks, Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-08-04 12:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-10  9:36 [PATCH RFC] RISC-V: alter the special character used in FAKE_LABEL_NAME Jan Beulich
2023-03-10  9:40 ` Jan Beulich
2023-03-13 12:06   ` Nick Clifton
2023-03-13 12:52     ` Jan Beulich
2023-03-13 14:25       ` Nick Clifton
2023-03-13 15:57         ` Jan Beulich
2023-03-15 11:08           ` Nick Clifton
2023-03-15 15:45             ` Jan Beulich
2023-03-15 16:05 ` Palmer Dabbelt
2023-03-16 10:35   ` Jan Beulich
2023-03-30 12:01 ` Jan Beulich
2023-08-04 12:04   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).