public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
@ 2023-01-06 12:34 Jan Beulich
  2023-01-07 23:29 ` Aurelien Jarno
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2023-01-06 12:34 UTC (permalink / raw)
  To: Binutils
  Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu, Aurelien Jarno

PR gas/29940

With the single-operand JAL entry now sitting ahead of the two-operand
one, the parsing of a two-operand insn would first try to parse an 'a'-
style operand, resulting in the insertion of bogus (and otherwise
unused) undefined symbols in the symbol table, having register names.
Since 'a' is used as 1st operand only with J and JAL, and since JAL is
the only insn _also_ allowing for a register as 1st operand (and then
there being a 2nd one), special case this parsing aspect right there.
---
This, of course, is fragile, but I guess such workarounds are
unavoidable with the chosen approach of (recurring) parsing, and with
register names being special only in certain contexts.

A more generic approach, then possibly also helping performance, might
be to count the number of operands first, and do full parsing only when
the count matches that in the operand specifier string (at least when
there are multiple insn forms).

The similar workaround in my_getSmallExpression() actually looks
suspicious to me: I expect that it would get in the way of using equates
"shadowing" names of GPRs.

--- a/gas/config/tc-riscv.c
+++ b/gas/config/tc-riscv.c
@@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
 	      continue;
 
 	    case 'a': /* 20-bit PC-relative offset.  */
+	      /* Like in my_getSmallExpression() we need to avoid emitting
+		 a stray undefined symbol if the 1st JAL entry doesn't match,
+		 but the 2nd (with 2 operands) might.  */
+	      if (oparg == insn->args)
+		{
+		  asargStart = asarg;
+		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
+		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
+		    break;
+		  asarg = asargStart;
+		}
 	    jump:
 	      my_getExpression (imm_expr, asarg);
 	      asarg = expr_end;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-06 12:34 [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering Jan Beulich
@ 2023-01-07 23:29 ` Aurelien Jarno
  2023-01-09 17:07   ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2023-01-07 23:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

On 2023-01-06 13:34, Jan Beulich via Binutils wrote:
> PR gas/29940
> 
> With the single-operand JAL entry now sitting ahead of the two-operand
> one, the parsing of a two-operand insn would first try to parse an 'a'-
> style operand, resulting in the insertion of bogus (and otherwise
> unused) undefined symbols in the symbol table, having register names.
> Since 'a' is used as 1st operand only with J and JAL, and since JAL is
> the only insn _also_ allowing for a register as 1st operand (and then
> there being a 2nd one), special case this parsing aspect right there.
> ---
> This, of course, is fragile, but I guess such workarounds are
> unavoidable with the chosen approach of (recurring) parsing, and with
> register names being special only in certain contexts.
> 
> A more generic approach, then possibly also helping performance, might
> be to count the number of operands first, and do full parsing only when
> the count matches that in the operand specifier string (at least when
> there are multiple insn forms).
> 
> The similar workaround in my_getSmallExpression() actually looks
> suspicious to me: I expect that it would get in the way of using equates
> "shadowing" names of GPRs.
> 
> --- a/gas/config/tc-riscv.c
> +++ b/gas/config/tc-riscv.c
> @@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>  	      continue;
>  
>  	    case 'a': /* 20-bit PC-relative offset.  */
> +	      /* Like in my_getSmallExpression() we need to avoid emitting
> +		 a stray undefined symbol if the 1st JAL entry doesn't match,
> +		 but the 2nd (with 2 operands) might.  */
> +	      if (oparg == insn->args)
> +		{
> +		  asargStart = asarg;
> +		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
> +		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
> +		    break;
> +		  asarg = asargStart;
> +		}
>  	    jump:
>  	      my_getExpression (imm_expr, asarg);
>  	      asarg = expr_end;

Thanks for the patch. I have tested it and confirmed it fix the problem
I reported.

Regards
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-07 23:29 ` Aurelien Jarno
@ 2023-01-09 17:07   ` Jan Beulich
  2023-01-09 19:07     ` Palmer Dabbelt
  2023-01-10 12:31     ` Nick Clifton
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2023-01-09 17:07 UTC (permalink / raw)
  To: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu, Nick Clifton
  Cc: Binutils

On 08.01.2023 00:29, Aurelien Jarno wrote:
> On 2023-01-06 13:34, Jan Beulich via Binutils wrote:
>> PR gas/29940
>>
>> With the single-operand JAL entry now sitting ahead of the two-operand
>> one, the parsing of a two-operand insn would first try to parse an 'a'-
>> style operand, resulting in the insertion of bogus (and otherwise
>> unused) undefined symbols in the symbol table, having register names.
>> Since 'a' is used as 1st operand only with J and JAL, and since JAL is
>> the only insn _also_ allowing for a register as 1st operand (and then
>> there being a 2nd one), special case this parsing aspect right there.
>> ---
>> This, of course, is fragile, but I guess such workarounds are
>> unavoidable with the chosen approach of (recurring) parsing, and with
>> register names being special only in certain contexts.
>>
>> A more generic approach, then possibly also helping performance, might
>> be to count the number of operands first, and do full parsing only when
>> the count matches that in the operand specifier string (at least when
>> there are multiple insn forms).
>>
>> The similar workaround in my_getSmallExpression() actually looks
>> suspicious to me: I expect that it would get in the way of using equates
>> "shadowing" names of GPRs.
>>
>> --- a/gas/config/tc-riscv.c
>> +++ b/gas/config/tc-riscv.c
>> @@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>>  	      continue;
>>  
>>  	    case 'a': /* 20-bit PC-relative offset.  */
>> +	      /* Like in my_getSmallExpression() we need to avoid emitting
>> +		 a stray undefined symbol if the 1st JAL entry doesn't match,
>> +		 but the 2nd (with 2 operands) might.  */
>> +	      if (oparg == insn->args)
>> +		{
>> +		  asargStart = asarg;
>> +		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
>> +		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
>> +		    break;
>> +		  asarg = asargStart;
>> +		}
>>  	    jump:
>>  	      my_getExpression (imm_expr, asarg);
>>  	      asarg = expr_end;
> 
> Thanks for the patch. I have tested it and confirmed it fix the problem
> I reported.

With 2.40 scheduled to be cut in less than a week, may I ask for an arch
maintainer's view here?

Nick, to save a round trip, could you confirm (or otherwise) that this
fix is then also fine to go on the branch (after having gone into master)?

Thanks, Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-09 17:07   ` Jan Beulich
@ 2023-01-09 19:07     ` Palmer Dabbelt
  2023-01-09 21:59       ` Maciej W. Rozycki
  2023-01-10 12:31     ` Nick Clifton
  1 sibling, 1 reply; 13+ messages in thread
From: Palmer Dabbelt @ 2023-01-09 19:07 UTC (permalink / raw)
  To: jbeulich; +Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils

On Mon, 09 Jan 2023 09:07:06 PST (-0800), jbeulich@suse.com wrote:
> On 08.01.2023 00:29, Aurelien Jarno wrote:
>> On 2023-01-06 13:34, Jan Beulich via Binutils wrote:
>>> PR gas/29940
>>>
>>> With the single-operand JAL entry now sitting ahead of the two-operand
>>> one, the parsing of a two-operand insn would first try to parse an 'a'-
>>> style operand, resulting in the insertion of bogus (and otherwise
>>> unused) undefined symbols in the symbol table, having register names.
>>> Since 'a' is used as 1st operand only with J and JAL, and since JAL is
>>> the only insn _also_ allowing for a register as 1st operand (and then
>>> there being a 2nd one), special case this parsing aspect right there.
>>> ---
>>> This, of course, is fragile, but I guess such workarounds are
>>> unavoidable with the chosen approach of (recurring) parsing, and with
>>> register names being special only in certain contexts.
>>>
>>> A more generic approach, then possibly also helping performance, might
>>> be to count the number of operands first, and do full parsing only when
>>> the count matches that in the operand specifier string (at least when
>>> there are multiple insn forms).
>>>
>>> The similar workaround in my_getSmallExpression() actually looks
>>> suspicious to me: I expect that it would get in the way of using equates
>>> "shadowing" names of GPRs.
>>>
>>> --- a/gas/config/tc-riscv.c
>>> +++ b/gas/config/tc-riscv.c
>>> @@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>>>  	      continue;
>>>
>>>  	    case 'a': /* 20-bit PC-relative offset.  */
>>> +	      /* Like in my_getSmallExpression() we need to avoid emitting
>>> +		 a stray undefined symbol if the 1st JAL entry doesn't match,
>>> +		 but the 2nd (with 2 operands) might.  */
>>> +	      if (oparg == insn->args)
>>> +		{
>>> +		  asargStart = asarg;
>>> +		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
>>> +		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
>>> +		    break;
>>> +		  asarg = asargStart;
>>> +		}
>>>  	    jump:
>>>  	      my_getExpression (imm_expr, asarg);
>>>  	      asarg = expr_end;
>>
>> Thanks for the patch. I have tested it and confirmed it fix the problem
>> I reported.
>
> With 2.40 scheduled to be cut in less than a week, may I ask for an arch
> maintainer's view here?

Thanks for fixing this.  I don't have any issues with what's there, but 
looks like I'm also getting some failures (glibc/multilib errno related 
stuff).  I'm trying to bisect those so I can't really get a proper test 
up now, I'll try to do so ASAP as it's really late to have stuff broken.

> Nick, to save a round trip, could you confirm (or otherwise) that this
> fix is then also fine to go on the branch (after having gone into master)?
>
> Thanks, Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-09 19:07     ` Palmer Dabbelt
@ 2023-01-09 21:59       ` Maciej W. Rozycki
  2023-01-10  9:25         ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2023-01-09 21:59 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: jbeulich, Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils

On Mon, 9 Jan 2023, Palmer Dabbelt wrote:

> > > > The similar workaround in my_getSmallExpression() actually looks
> > > > suspicious to me: I expect that it would get in the way of using equates
> > > > "shadowing" names of GPRs.
> > > > 
> > > > --- a/gas/config/tc-riscv.c
> > > > +++ b/gas/config/tc-riscv.c
> > > > @@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
> > > >  	      continue;
> > > > 
> > > >  	    case 'a': /* 20-bit PC-relative offset.  */
> > > > +	      /* Like in my_getSmallExpression() we need to avoid emitting
> > > > +		 a stray undefined symbol if the 1st JAL entry doesn't match,
> > > > +		 but the 2nd (with 2 operands) might.  */
> > > > +	      if (oparg == insn->args)
> > > > +		{
> > > > +		  asargStart = asarg;
> > > > +		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
> > > > +		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] ==
> > > > ',')))
> > > > +		    break;
> > > > +		  asarg = asargStart;
> > > > +		}
> > > >  	    jump:
> > > >  	      my_getExpression (imm_expr, asarg);
> > > >  	      asarg = expr_end;
> > > 
> > > Thanks for the patch. I have tested it and confirmed it fix the problem
> > > I reported.
> > 
> > With 2.40 scheduled to be cut in less than a week, may I ask for an arch
> > maintainer's view here?
> 
> Thanks for fixing this.  I don't have any issues with what's there, but looks
> like I'm also getting some failures (glibc/multilib errno related stuff).  I'm
> trying to bisect those so I can't really get a proper test up now, I'll try to
> do so ASAP as it's really late to have stuff broken.

 I wonder why the RISC-V port needs such a hack while the MIPS one 
doesn't.

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-09 21:59       ` Maciej W. Rozycki
@ 2023-01-10  9:25         ` Jan Beulich
  2023-01-10 22:58           ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2023-01-10  9:25 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On 09.01.2023 22:59, Maciej W. Rozycki wrote:
> On Mon, 9 Jan 2023, Palmer Dabbelt wrote:
> 
>>>>> The similar workaround in my_getSmallExpression() actually looks
>>>>> suspicious to me: I expect that it would get in the way of using equates
>>>>> "shadowing" names of GPRs.
>>>>>
>>>>> --- a/gas/config/tc-riscv.c
>>>>> +++ b/gas/config/tc-riscv.c
>>>>> @@ -3266,6 +3266,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>>>>>  	      continue;
>>>>>
>>>>>  	    case 'a': /* 20-bit PC-relative offset.  */
>>>>> +	      /* Like in my_getSmallExpression() we need to avoid emitting
>>>>> +		 a stray undefined symbol if the 1st JAL entry doesn't match,
>>>>> +		 but the 2nd (with 2 operands) might.  */
>>>>> +	      if (oparg == insn->args)
>>>>> +		{
>>>>> +		  asargStart = asarg;
>>>>> +		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
>>>>> +		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] ==
>>>>> ',')))
>>>>> +		    break;
>>>>> +		  asarg = asargStart;
>>>>> +		}
>>>>>  	    jump:
>>>>>  	      my_getExpression (imm_expr, asarg);
>>>>>  	      asarg = expr_end;
>>>>
>>>> Thanks for the patch. I have tested it and confirmed it fix the problem
>>>> I reported.
>>>
>>> With 2.40 scheduled to be cut in less than a week, may I ask for an arch
>>> maintainer's view here?
>>
>> Thanks for fixing this.  I don't have any issues with what's there, but looks
>> like I'm also getting some failures (glibc/multilib errno related stuff).  I'm
>> trying to bisect those so I can't really get a proper test up now, I'll try to
>> do so ASAP as it's really late to have stuff broken.
> 
>  I wonder why the RISC-V port needs such a hack while the MIPS one 
> doesn't.

Perhaps I misunderstood your earlier reply then. There you said you deal with
this by carefully ordering the opcode table. Here restoring the original order
would only be a temporary workaround, as it would re-introduce the
disassembler issue that was fixed by altering the order: Alias entries need to
come ahead of "real" ones, or else respective alias mnemonics would never be
emitted by the disassembler. Hence a solution is needed here which retains the
order, and which also respects that insn operands may be parsed
- more than once,
- with identical names having different meaning depending on context.
I realize what I'm doing is a hack, but I cannot think of anything better.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-09 17:07   ` Jan Beulich
  2023-01-09 19:07     ` Palmer Dabbelt
@ 2023-01-10 12:31     ` Nick Clifton
  2023-01-10 20:14       ` Palmer Dabbelt
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Clifton @ 2023-01-10 12:31 UTC (permalink / raw)
  To: Jan Beulich, Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu
  Cc: Binutils

Hi Jan,

>>> PR gas/29940

> With 2.40 scheduled to be cut in less than a week, may I ask for an arch
> maintainer's view here?
> 
> Nick, to save a round trip, could you confirm (or otherwise) that this
> fix is then also fine to go on the branch (after having gone into master)?

If the arch maintainer approves, then yes, backporting this patch to the 2.40
branch is also approved.

Cheers
   Nick



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-10 12:31     ` Nick Clifton
@ 2023-01-10 20:14       ` Palmer Dabbelt
  0 siblings, 0 replies; 13+ messages in thread
From: Palmer Dabbelt @ 2023-01-10 20:14 UTC (permalink / raw)
  To: Nick Clifton; +Cc: jbeulich, Andrew Waterman, Jim Wilson, nelson, binutils

On Tue, 10 Jan 2023 04:31:01 PST (-0800), Nick Clifton wrote:
> Hi Jan,
>
>>>> PR gas/29940
>
>> With 2.40 scheduled to be cut in less than a week, may I ask for an arch
>> maintainer's view here?
>>
>> Nick, to save a round trip, could you confirm (or otherwise) that this
>> fix is then also fine to go on the branch (after having gone into master)?
>
> If the arch maintainer approves, then yes, backporting this patch to the 2.40
> branch is also approved.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

I'm still hitting some glibc/multili issues, but this doesn't appear to 
make anything worse so IMO we're better off just taking this now.  I'll 
try and sort out what's up on my end.

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-10  9:25         ` Jan Beulich
@ 2023-01-10 22:58           ` Maciej W. Rozycki
  2023-01-11  9:28             ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2023-01-10 22:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On Tue, 10 Jan 2023, Jan Beulich wrote:

> >> Thanks for fixing this.  I don't have any issues with what's there, but looks
> >> like I'm also getting some failures (glibc/multilib errno related stuff).  I'm
> >> trying to bisect those so I can't really get a proper test up now, I'll try to
> >> do so ASAP as it's really late to have stuff broken.
> > 
> >  I wonder why the RISC-V port needs such a hack while the MIPS one 
> > doesn't.
> 
> Perhaps I misunderstood your earlier reply then. There you said you deal with
> this by carefully ordering the opcode table. Here restoring the original order
> would only be a temporary workaround, as it would re-introduce the
> disassembler issue that was fixed by altering the order: Alias entries need to
> come ahead of "real" ones, or else respective alias mnemonics would never be
> emitted by the disassembler.

 Correct.  But why does it require such an intervention on the GAS side?

 AFAIK similarly to the MIPS port and like the disassembler GAS tries to 
match instructions in the order they appear in the opcode table, except 
that the disassembler matches by the opcode/mask, but GAS matches by the 
mnemonic/operands.  It is expected that a single-operand alias appears 
first so that the disassembler matches it first (unless `-M no-aliases'), 
but it is also expected not to match a two-operand assembly instruction, 
so that GAS proceeds to the next candidate opcode table entry.

 And it does appear to happen, because correct machine code is produced 
regardless of your hack, except for the spurious symbol produced.  So is 
it not the case that simply the state (interal relocations recorded) is 
not correctly reset on an unsuccessful operand match?  Why does it have to 
be special-cased just for the `a' operand type?

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-10 22:58           ` Maciej W. Rozycki
@ 2023-01-11  9:28             ` Jan Beulich
  2023-01-12  1:28               ` Maciej W. Rozycki
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2023-01-11  9:28 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On 10.01.2023 23:58, Maciej W. Rozycki wrote:
> On Tue, 10 Jan 2023, Jan Beulich wrote:
> 
>>>> Thanks for fixing this.  I don't have any issues with what's there, but looks
>>>> like I'm also getting some failures (glibc/multilib errno related stuff).  I'm
>>>> trying to bisect those so I can't really get a proper test up now, I'll try to
>>>> do so ASAP as it's really late to have stuff broken.
>>>
>>>  I wonder why the RISC-V port needs such a hack while the MIPS one 
>>> doesn't.
>>
>> Perhaps I misunderstood your earlier reply then. There you said you deal with
>> this by carefully ordering the opcode table. Here restoring the original order
>> would only be a temporary workaround, as it would re-introduce the
>> disassembler issue that was fixed by altering the order: Alias entries need to
>> come ahead of "real" ones, or else respective alias mnemonics would never be
>> emitted by the disassembler.
> 
>  Correct.  But why does it require such an intervention on the GAS side?
> 
>  AFAIK similarly to the MIPS port and like the disassembler GAS tries to 
> match instructions in the order they appear in the opcode table, except 
> that the disassembler matches by the opcode/mask, but GAS matches by the 
> mnemonic/operands.  It is expected that a single-operand alias appears 
> first so that the disassembler matches it first (unless `-M no-aliases'), 
> but it is also expected not to match a two-operand assembly instruction, 
> so that GAS proceeds to the next candidate opcode table entry.
> 
>  And it does appear to happen, because correct machine code is produced 
> regardless of your hack, except for the spurious symbol produced.  So is 
> it not the case that simply the state (interal relocations recorded) is 
> not correctly reset on an unsuccessful operand match?  Why does it have to 
> be special-cased just for the `a' operand type?

The parsing of an 'a' type operand involves expression(), a side effect of
which is to insert a symbol table entry for symbols not otherwise
recognized (and note how my_getSmallExpression() addresses the same issue
by filtering out GPR names first [1]). Yes, in a way this is an
"insufficient undoing" issue, just that undoing of that symbol table
insertion would be quite hard and/or fragile (from all I can tell). And
this is where the dual meaning of symbol names comes into play: This looks
to be intentional, and hence we can't make use of md_parse_name() to
suppress the symbol table insertion in the first place for symbols which
(in other contexts) identify registers.

As to "just" - the issue could in principle happen elsewhere as well
(beyond what my_getSmallExpression() accounts for at present), but luckily
(so far) it doesn't. Hence I thought it would be best to keep things
isolated to 'a'.

As suggested in a post-commit-message remark, another workaround might be
to count actual and expected operands first, and skip right to the next
table entry when the two numbers don't match. I didn't go this route
because I have no insight into possible future plans as to operands which
might themselves contain commas (see e.g. Arm or x86 AT&T syntax for
examples of such), at which point counting actuals would become more
complicated than just counting commas.

Jan

[1] As stated elsewhere, I think this is wrong, as I expect it breaks
certain use cases of, in particular, equates. But correcting it would
require finding another solution to the symbol table insertion issue.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-11  9:28             ` Jan Beulich
@ 2023-01-12  1:28               ` Maciej W. Rozycki
  2023-01-12  8:26                 ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2023-01-12  1:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On Wed, 11 Jan 2023, Jan Beulich wrote:

> >  And it does appear to happen, because correct machine code is produced 
> > regardless of your hack, except for the spurious symbol produced.  So is 
> > it not the case that simply the state (interal relocations recorded) is 
> > not correctly reset on an unsuccessful operand match?  Why does it have to 
> > be special-cased just for the `a' operand type?
> 
> The parsing of an 'a' type operand involves expression(), a side effect of
> which is to insert a symbol table entry for symbols not otherwise
> recognized (and note how my_getSmallExpression() addresses the same issue
> by filtering out GPR names first [1]). Yes, in a way this is an
> "insufficient undoing" issue, just that undoing of that symbol table
> insertion would be quite hard and/or fragile (from all I can tell). And
> this is where the dual meaning of symbol names comes into play: This looks
> to be intentional, and hence we can't make use of md_parse_name() to
> suppress the symbol table insertion in the first place for symbols which
> (in other contexts) identify registers.

 Thank you for looking into it.  Indeed it looks to me like a problem with 
`expression' (or `expr' really) and the way the RISC-V assembly dialect 
defines register references (unlike the MIPS one which uses a `$' prefix).  

 At a glance it seems to me that the correct approach would be to define a 
"dry run" mode for `expr' and use it in the RISC-V backend to validate an 
operand in the first invocation without causing any side effects, and then 
only once all the operands have been processed and an opcode table entry 
accepted `expr' would be called to finalise the expression.

 I realise it's something you may not be willing to commit to, as it's 
likely a larger task than a random tweak to the RISC-V backend, but I 
think it's the way we ought to do it rather than piling up workarounds.

 FWIW,

  Maciej

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-12  1:28               ` Maciej W. Rozycki
@ 2023-01-12  8:26                 ` Jan Beulich
  2023-01-12  8:40                   ` Andrew Waterman
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2023-01-12  8:26 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Waterman, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On 12.01.2023 02:28, Maciej W. Rozycki wrote:
> On Wed, 11 Jan 2023, Jan Beulich wrote:
> 
>>>  And it does appear to happen, because correct machine code is produced 
>>> regardless of your hack, except for the spurious symbol produced.  So is 
>>> it not the case that simply the state (interal relocations recorded) is 
>>> not correctly reset on an unsuccessful operand match?  Why does it have to 
>>> be special-cased just for the `a' operand type?
>>
>> The parsing of an 'a' type operand involves expression(), a side effect of
>> which is to insert a symbol table entry for symbols not otherwise
>> recognized (and note how my_getSmallExpression() addresses the same issue
>> by filtering out GPR names first [1]). Yes, in a way this is an
>> "insufficient undoing" issue, just that undoing of that symbol table
>> insertion would be quite hard and/or fragile (from all I can tell). And
>> this is where the dual meaning of symbol names comes into play: This looks
>> to be intentional, and hence we can't make use of md_parse_name() to
>> suppress the symbol table insertion in the first place for symbols which
>> (in other contexts) identify registers.
> 
>  Thank you for looking into it.  Indeed it looks to me like a problem with 
> `expression' (or `expr' really) and the way the RISC-V assembly dialect 
> defines register references (unlike the MIPS one which uses a `$' prefix).  
> 
>  At a glance it seems to me that the correct approach would be to define a 
> "dry run" mode for `expr' and use it in the RISC-V backend to validate an 
> operand in the first invocation without causing any side effects, and then 
> only once all the operands have been processed and an opcode table entry 
> accepted `expr' would be called to finalise the expression.
> 
>  I realise it's something you may not be willing to commit to, as it's 
> likely a larger task than a random tweak to the RISC-V backend, but I 
> think it's the way we ought to do it rather than piling up workarounds.

I might actually try to do something along those lines, but only once it was
clarified (by the arch maintainers) that the present behavior of identifiers
meaning different things depending on context is actually intentional.
There not being a prefix to indicate registers isn't unprecedented, after
all - at least x86 (Intel syntax, or more generally "noprefix" mode), ia64,
and Arm permit the same. The former two take the identifier as a register
regardless of which insn this is an operand of (creating another problem
when you really mean a symbol of that name, with varying approaches to
dealing with). Arm instead makes sure that different mnemonics are used
(b vs bx for Arm32, b vs br for Arm64) and hence ambiguities cannot arise.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering
  2023-01-12  8:26                 ` Jan Beulich
@ 2023-01-12  8:40                   ` Andrew Waterman
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Waterman @ 2023-01-12  8:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Maciej W. Rozycki, Jim Wilson, nelson, Nick Clifton, binutils,
	Palmer Dabbelt

On Thu, Jan 12, 2023 at 12:26 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 12.01.2023 02:28, Maciej W. Rozycki wrote:
> > On Wed, 11 Jan 2023, Jan Beulich wrote:
> >
> >>>  And it does appear to happen, because correct machine code is produced
> >>> regardless of your hack, except for the spurious symbol produced.  So is
> >>> it not the case that simply the state (interal relocations recorded) is
> >>> not correctly reset on an unsuccessful operand match?  Why does it have to
> >>> be special-cased just for the `a' operand type?
> >>
> >> The parsing of an 'a' type operand involves expression(), a side effect of
> >> which is to insert a symbol table entry for symbols not otherwise
> >> recognized (and note how my_getSmallExpression() addresses the same issue
> >> by filtering out GPR names first [1]). Yes, in a way this is an
> >> "insufficient undoing" issue, just that undoing of that symbol table
> >> insertion would be quite hard and/or fragile (from all I can tell). And
> >> this is where the dual meaning of symbol names comes into play: This looks
> >> to be intentional, and hence we can't make use of md_parse_name() to
> >> suppress the symbol table insertion in the first place for symbols which
> >> (in other contexts) identify registers.
> >
> >  Thank you for looking into it.  Indeed it looks to me like a problem with
> > `expression' (or `expr' really) and the way the RISC-V assembly dialect
> > defines register references (unlike the MIPS one which uses a `$' prefix).
> >
> >  At a glance it seems to me that the correct approach would be to define a
> > "dry run" mode for `expr' and use it in the RISC-V backend to validate an
> > operand in the first invocation without causing any side effects, and then
> > only once all the operands have been processed and an opcode table entry
> > accepted `expr' would be called to finalise the expression.
> >
> >  I realise it's something you may not be willing to commit to, as it's
> > likely a larger task than a random tweak to the RISC-V backend, but I
> > think it's the way we ought to do it rather than piling up workarounds.
>
> I might actually try to do something along those lines, but only once it was
> clarified (by the arch maintainers) that the present behavior of identifiers
> meaning different things depending on context is actually intentional.

I haven't been following this discussion until now, but if I
understand the question correctly, then yes, it is intentional.  Were
we to travel back in time, we would have defined a different assembly
syntax that sidestepped this complexity.  But it is now part of an API
that is in widespread use, so we are stuck with it.

> There not being a prefix to indicate registers isn't unprecedented, after
> all - at least x86 (Intel syntax, or more generally "noprefix" mode), ia64,
> and Arm permit the same. The former two take the identifier as a register
> regardless of which insn this is an operand of (creating another problem
> when you really mean a symbol of that name, with varying approaches to
> dealing with). Arm instead makes sure that different mnemonics are used
> (b vs bx for Arm32, b vs br for Arm64) and hence ambiguities cannot arise.
>
> Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-01-12  8:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-06 12:34 [PATCH] gas/RISC-V: adjust assembler for opcode table re-ordering Jan Beulich
2023-01-07 23:29 ` Aurelien Jarno
2023-01-09 17:07   ` Jan Beulich
2023-01-09 19:07     ` Palmer Dabbelt
2023-01-09 21:59       ` Maciej W. Rozycki
2023-01-10  9:25         ` Jan Beulich
2023-01-10 22:58           ` Maciej W. Rozycki
2023-01-11  9:28             ` Jan Beulich
2023-01-12  1:28               ` Maciej W. Rozycki
2023-01-12  8:26                 ` Jan Beulich
2023-01-12  8:40                   ` Andrew Waterman
2023-01-10 12:31     ` Nick Clifton
2023-01-10 20:14       ` Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).