public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/2] RISC-V/gas: re-work register named symbols avoidance logic
@ 2023-02-13  8:01 Jan Beulich
  2023-02-13  8:02 ` [PATCH 1/2] RISC-V: test for expected / no unexpected symbols Jan Beulich
  2023-02-13  8:02 ` [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2023-02-13  8:01 UTC (permalink / raw)
  To: Binutils; +Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

1: test for expected / no unexpected symbols
2: adjust logic to avoid register name symbols

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] RISC-V: test for expected / no unexpected symbols
  2023-02-13  8:01 [PATCH 0/2] RISC-V/gas: re-work register named symbols avoidance logic Jan Beulich
@ 2023-02-13  8:02 ` Jan Beulich
  2023-02-13  8:02 ` [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols Jan Beulich
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-02-13  8:02 UTC (permalink / raw)
  To: Binutils; +Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

Both the temporary workaround for PR/gas 29940 and the existing special
casing of GPRs in my_getSmallExpression() aren't really tested anywhere
(i.e. with the workarounds remove testing would still succeed). Nor is
there any test for uses of symbols with names matching GPRs, where such
is permitted. Before altering how this is to be dealt with, install two
testcases covering the expected behavior. (For now this includes only
known affected insns; re-ordering of entries in riscv_opcodes[] could,
however, yield more of them.)

--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms.d
@@ -0,0 +1,6 @@
+#as: -march=rv32i
+#nm: --
+
+0+ t start
+ +U x2
+ +U x4
--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms.s
@@ -0,0 +1,8 @@
+	.text
+start:
+	and	x8, x8, x1
+	j	x2
+	jal	x3, x4
+	sll	x1, x1, x5
+	sra	x8, x8, x6
+	srl	x8, x8, x7
--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms-C.d
@@ -0,0 +1,4 @@
+#as: -march=rv32ic
+#source: reg-syms.s
+#nm: --
+#dump: reg-syms.d


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-13  8:01 [PATCH 0/2] RISC-V/gas: re-work register named symbols avoidance logic Jan Beulich
  2023-02-13  8:02 ` [PATCH 1/2] RISC-V: test for expected / no unexpected symbols Jan Beulich
@ 2023-02-13  8:02 ` Jan Beulich
  2023-02-15  3:09   ` Nelson Chu
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-02-13  8:02 UTC (permalink / raw)
  To: Binutils; +Cc: Palmer Dabbelt, Andrew Waterman, Jim Wilson, Nelson Chu

Special casing GPR names in my_getSmallExpression() leads to a number of
inconsistencies. Generalize this by utilizing the md_parse_name() hook,
limited to when instruction operands are being parsed (really: probed).
Then both the GPR lookup there and the yet more ad hoc workaround for
PR/gas 29940 can be removed (including its extension needed for making
the compressed form JAL work again).
---
When a floating point extension (but not [hfdq]inx) is enabled, floating
point registers would perhaps better be recognized by riscv_parse_name()
as well. Same for vector registers and likely also CSRs. Otherwise
inconsistent behavior may (continue to) result when equates come into
play, afaict.

Considering equates, the new behavior isn't the only sensible one: We
could also omit the symbol_find() call, thus never honoring equates.
Then, however, even with the "i" insn suffix (or sometimes infix)
explicitly specified, equates with names matching a GPR name (or, see
above, potentially also other register names) wouldn't be accepted
anymore. I guess the primary question here is what the intended (and
consistent) behavior is to be.

Overall the uses of my_getSmallExpression() vs my_getExpression() look
pretty random to me: Is there any underlying principle when which of the
two is expected to be used?

--- a/gas/config/tc-riscv.c
+++ b/gas/config/tc-riscv.c
@@ -171,6 +171,8 @@ static enum float_abi float_abi = FLOAT_
 
 static unsigned elf_flags = 0;
 
+static unsigned probing_insn_operands;
+
 /* Set the default_isa_spec.  Return 0 if the spec isn't supported.
    Otherwise, return 1.  */
 
@@ -2228,21 +2230,9 @@ my_getSmallExpression (expressionS *ep,
 		       char *str, const struct percent_op_match *percent_op)
 {
   size_t reloc_index;
-  unsigned crux_depth, str_depth, regno;
+  unsigned crux_depth, str_depth;
   char *crux;
 
-  /* First, check for integer registers.  No callers can accept a reg, but
-     we need to avoid accidentally creating a useless undefined symbol below,
-     if this is an instruction pattern that can't match.  A glibc build fails
-     if this is removed.  */
-  if (reg_lookup (&str, RCLASS_GPR, &regno))
-    {
-      ep->X_op = O_register;
-      ep->X_add_number = regno;
-      expr_parse_end = str;
-      return 0;
-    }
-
   /* Search for the start of the main expression.
 
      End the loop with CRUX pointing to the start of the main expression and
@@ -2265,9 +2255,15 @@ my_getSmallExpression (expressionS *ep,
 	 && reloc_index < 1
 	 && parse_relocation (&str, reloc, percent_op));
 
+  if (probing_insn_operands)
+    ++probing_insn_operands;
+
   my_getExpression (ep, crux);
   str = expr_parse_end;
 
+  if (probing_insn_operands)
+    --probing_insn_operands;
+
   /* Match every open bracket.  */
   while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
     if (*str++ == ')')
@@ -2453,6 +2449,13 @@ riscv_is_priv_insn (insn_t insn)
 	  || ((insn ^ MATCH_SFENCE_VM) & MASK_SFENCE_VM) == 0);
 }
 
+static symbolS *deferred_sym_rootP;
+static symbolS *deferred_sym_lastP;
+/* Since symbols can't easily be freed, try to recycle ones which weren't
+   committed.  */
+static symbolS *orphan_sym_rootP;
+static symbolS *orphan_sym_lastP;
+
 /* This routine assembles an instruction into its binary format.  As a
    side effect, it sets the global variable imm_reloc to the type of
    relocation to do if one of the operands is an address expression.  */
@@ -2488,6 +2491,8 @@ riscv_ip (char *str, struct riscv_cl_ins
 
   insn = (struct riscv_opcode *) str_hash_find (hash, str);
 
+  probing_insn_operands = 1;
+
   asargStart = asarg;
   for ( ; insn && insn->name && strcmp (insn->name, str) == 0; insn++)
     {
@@ -2504,6 +2509,17 @@ riscv_ip (char *str, struct riscv_cl_ins
       /* Reset error message of the previous round.  */
       error.msg = _("illegal operands");
       error.missing_ext = NULL;
+
+      /* Purge deferred symbols from the previous round, if any.  */
+      while (deferred_sym_rootP)
+	{
+	  symbolS *sym = deferred_sym_rootP;
+
+	  symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
+	  symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
+			 &orphan_sym_lastP);
+	}
+
       create_insn (ip, insn);
 
       imm_expr->X_op = O_absent;
@@ -2558,9 +2574,22 @@ riscv_ip (char *str, struct riscv_cl_ins
 		}
 	      if (*asarg != '\0')
 		break;
+
 	      /* Successful assembly.  */
 	      error.msg = NULL;
 	      insn_with_csr = false;
+
+	      /* Commit deferred symbols, if any.  */
+	      while (deferred_sym_rootP)
+		{
+		  symbolS *sym = deferred_sym_rootP;
+
+		  symbol_remove (sym, &deferred_sym_rootP,
+				 &deferred_sym_lastP);
+		  symbol_append (sym, symbol_lastP, &symbol_rootP,
+				 &symbol_lastP);
+		  symbol_table_insert (sym);
+		}
 	      goto out;
 
 	    case 'C': /* RVC */
@@ -2764,8 +2793,6 @@ riscv_ip (char *str, struct riscv_cl_ins
 		case 'p':
 		  goto branch;
 		case 'a':
-		  if (oparg == insn->args + 1)
-		    goto jump_check_gpr;
 		  goto jump;
 		case 'S': /* Floating-point RS1 x8-x15.  */
 		  if (!reg_lookup (&asarg, RCLASS_FPR, &regno)
@@ -3270,18 +3297,6 @@ riscv_ip (char *str, struct riscv_cl_ins
 	      continue;
 
 	    case 'a': /* 20-bit PC-relative offset.  */
-	      /* Like in my_getSmallExpression() we need to avoid emitting
-		 a stray undefined symbol if the 1st JAL entry doesn't match,
-		 but the 2nd (with 2 operands) might.  */
-	      if (oparg == insn->args)
-		{
-	    jump_check_gpr:
-		  asargStart = asarg;
-		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
-		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
-		    break;
-		  asarg = asargStart;
-		}
 	    jump:
 	      my_getExpression (imm_expr, asarg);
 	      asarg = expr_parse_end;
@@ -3504,6 +3519,8 @@ riscv_ip (char *str, struct riscv_cl_ins
   if (save_c)
     *(asargStart  - 1) = save_c;
 
+  probing_insn_operands = 0;
+
   return error;
 }
 
@@ -3800,6 +3817,62 @@ riscv_after_parse_args (void)
     flag_dwarf_cie_version = 3;
 }
 
+bool riscv_parse_name (const char *name, struct expressionS *ep,
+		       enum expr_mode mode)
+{
+  unsigned int regno;
+
+  if (!probing_insn_operands)
+    return false;
+
+  gas_assert (mode == expr_normal);
+
+  regno = reg_lookup_internal (name, RCLASS_GPR);
+  if (regno == (unsigned int)-1)
+    return false;
+
+  if (symbol_find (name) != NULL)
+    return false;
+
+  if (probing_insn_operands > 1)
+    {
+      ep->X_op = O_register;
+      ep->X_add_number = regno;
+    }
+  else
+    {
+      /* Create a symbol without adding it to the symbol table yet.
+	 Insertion will happen only once we commit to using the insn
+	 we're probing operands for.  */
+      symbolS *sym;
+
+      for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
+	if (strcmp (name, S_GET_NAME (sym)) == 0)
+	  break;
+      if (!sym)
+	{
+	  for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
+	    if (strcmp (name, S_GET_NAME (sym)) == 0)
+	      {
+		symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
+		break;
+	      }
+	  if (!sym)
+	    sym = symbol_create (name, undefined_section,
+				 &zero_address_frag, 0);
+
+	  symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
+			 &deferred_sym_lastP);
+	}
+
+      ep->X_op = O_symbol;
+      ep->X_add_symbol = sym;
+      ep->X_add_number = 0;
+    }
+
+  return true;
+}
+
 long
 md_pcrel_from (fixS *fixP)
 {
--- a/gas/config/tc-riscv.h
+++ b/gas/config/tc-riscv.h
@@ -123,6 +123,10 @@ extern void riscv_elf_final_processing (
 /* Adjust debug_line after relaxation.  */
 #define DWARF2_USE_FIXED_ADVANCE_PC 1
 
+#define md_parse_name(name, exp, mode, c) \
+  riscv_parse_name (name, exp, mode)
+bool riscv_parse_name (const char *, struct expressionS *, enum expr_mode);
+
 #define md_finish riscv_md_finish
 #define CONVERT_SYMBOLIC_ATTRIBUTE riscv_convert_symbolic_attribute
 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-13  8:02 ` [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols Jan Beulich
@ 2023-02-15  3:09   ` Nelson Chu
  2023-02-15  8:03     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Nelson Chu @ 2023-02-15  3:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

Though the logic becomes kind of complicated to me, you are more
familiar with other targets than me, so maybe this is the right way to
do it.  Basically, the logic looks correct and fine to me, but I still
need to run regressions in case of an accident.  Just make sure that -
1. The deferred_sym_rootP are the symbols that have the same names as
GPR (or FPR, VPR in the future maybe), but we still need to add them
into the symbol table?
2. Even if the expression has the same name as GPR, only
my_getSmallExpression is possible to set exp to O_register, but
my_getExpression won't have, so the probing_insn_operands is used to
distinguish between them?

Thanks
Nelson

On Mon, Feb 13, 2023 at 4:02 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> Special casing GPR names in my_getSmallExpression() leads to a number of
> inconsistencies. Generalize this by utilizing the md_parse_name() hook,
> limited to when instruction operands are being parsed (really: probed).
> Then both the GPR lookup there and the yet more ad hoc workaround for
> PR/gas 29940 can be removed (including its extension needed for making
> the compressed form JAL work again).
> ---
> When a floating point extension (but not [hfdq]inx) is enabled, floating
> point registers would perhaps better be recognized by riscv_parse_name()
> as well. Same for vector registers and likely also CSRs. Otherwise
> inconsistent behavior may (continue to) result when equates come into
> play, afaict.
>
> Considering equates, the new behavior isn't the only sensible one: We
> could also omit the symbol_find() call, thus never honoring equates.
> Then, however, even with the "i" insn suffix (or sometimes infix)
> explicitly specified, equates with names matching a GPR name (or, see
> above, potentially also other register names) wouldn't be accepted
> anymore. I guess the primary question here is what the intended (and
> consistent) behavior is to be.
>
> Overall the uses of my_getSmallExpression() vs my_getExpression() look
> pretty random to me: Is there any underlying principle when which of the
> two is expected to be used?

The my_getSmallExpression is used when we expect the immediate operand
with % prefix, and those % prefixes are used to insert different
relocations.  If there are other exceptions, then we need to fix them.

> --- a/gas/config/tc-riscv.c
> +++ b/gas/config/tc-riscv.c
> @@ -171,6 +171,8 @@ static enum float_abi float_abi = FLOAT_
>
>  static unsigned elf_flags = 0;
>
> +static unsigned probing_insn_operands;

Maybe add some comments to mention that "the probing_insn_operands is
used to distinguish between my_getSmallExpression and
my_getExpression", or other comments, is good for understanding.

> +
>  /* Set the default_isa_spec.  Return 0 if the spec isn't supported.
>     Otherwise, return 1.  */
>
> @@ -2228,21 +2230,9 @@ my_getSmallExpression (expressionS *ep,
>                        char *str, const struct percent_op_match *percent_op)
>  {
>    size_t reloc_index;
> -  unsigned crux_depth, str_depth, regno;
> +  unsigned crux_depth, str_depth;
>    char *crux;
>
> -  /* First, check for integer registers.  No callers can accept a reg, but
> -     we need to avoid accidentally creating a useless undefined symbol below,
> -     if this is an instruction pattern that can't match.  A glibc build fails
> -     if this is removed.  */
> -  if (reg_lookup (&str, RCLASS_GPR, &regno))
> -    {
> -      ep->X_op = O_register;
> -      ep->X_add_number = regno;
> -      expr_parse_end = str;
> -      return 0;
> -    }
> -
>    /* Search for the start of the main expression.
>
>       End the loop with CRUX pointing to the start of the main expression and
> @@ -2265,9 +2255,15 @@ my_getSmallExpression (expressionS *ep,
>          && reloc_index < 1
>          && parse_relocation (&str, reloc, percent_op));
>
> +  if (probing_insn_operands)
> +    ++probing_insn_operands;
> +
>    my_getExpression (ep, crux);
>    str = expr_parse_end;
>
> +  if (probing_insn_operands)
> +    --probing_insn_operands;
> +
>    /* Match every open bracket.  */
>    while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
>      if (*str++ == ')')
> @@ -2453,6 +2449,13 @@ riscv_is_priv_insn (insn_t insn)
>           || ((insn ^ MATCH_SFENCE_VM) & MASK_SFENCE_VM) == 0);
>  }
>
> +static symbolS *deferred_sym_rootP;
> +static symbolS *deferred_sym_lastP;
> +/* Since symbols can't easily be freed, try to recycle ones which weren't
> +   committed.  */
> +static symbolS *orphan_sym_rootP;
> +static symbolS *orphan_sym_lastP;
> +
>  /* This routine assembles an instruction into its binary format.  As a
>     side effect, it sets the global variable imm_reloc to the type of
>     relocation to do if one of the operands is an address expression.  */
> @@ -2488,6 +2491,8 @@ riscv_ip (char *str, struct riscv_cl_ins
>
>    insn = (struct riscv_opcode *) str_hash_find (hash, str);
>
> +  probing_insn_operands = 1;
> +
>    asargStart = asarg;
>    for ( ; insn && insn->name && strcmp (insn->name, str) == 0; insn++)
>      {
> @@ -2504,6 +2509,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>        /* Reset error message of the previous round.  */
>        error.msg = _("illegal operands");
>        error.missing_ext = NULL;
> +
> +      /* Purge deferred symbols from the previous round, if any.  */
> +      while (deferred_sym_rootP)
> +       {
> +         symbolS *sym = deferred_sym_rootP;
> +
> +         symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
> +         symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
> +                        &orphan_sym_lastP);
> +       }
> +
>        create_insn (ip, insn);
>
>        imm_expr->X_op = O_absent;
> @@ -2558,9 +2574,22 @@ riscv_ip (char *str, struct riscv_cl_ins
>                 }
>               if (*asarg != '\0')
>                 break;
> +
>               /* Successful assembly.  */
>               error.msg = NULL;
>               insn_with_csr = false;
> +
> +             /* Commit deferred symbols, if any.  */
> +             while (deferred_sym_rootP)
> +               {
> +                 symbolS *sym = deferred_sym_rootP;
> +
> +                 symbol_remove (sym, &deferred_sym_rootP,
> +                                &deferred_sym_lastP);
> +                 symbol_append (sym, symbol_lastP, &symbol_rootP,
> +                                &symbol_lastP);
> +                 symbol_table_insert (sym);
> +               }
>               goto out;
>
>             case 'C': /* RVC */
> @@ -2764,8 +2793,6 @@ riscv_ip (char *str, struct riscv_cl_ins
>                 case 'p':
>                   goto branch;
>                 case 'a':
> -                 if (oparg == insn->args + 1)
> -                   goto jump_check_gpr;
>                   goto jump;
>                 case 'S': /* Floating-point RS1 x8-x15.  */
>                   if (!reg_lookup (&asarg, RCLASS_FPR, &regno)
> @@ -3270,18 +3297,6 @@ riscv_ip (char *str, struct riscv_cl_ins
>               continue;
>
>             case 'a': /* 20-bit PC-relative offset.  */
> -             /* Like in my_getSmallExpression() we need to avoid emitting
> -                a stray undefined symbol if the 1st JAL entry doesn't match,
> -                but the 2nd (with 2 operands) might.  */
> -             if (oparg == insn->args)
> -               {
> -           jump_check_gpr:
> -                 asargStart = asarg;
> -                 if (reg_lookup (&asarg, RCLASS_GPR, NULL)
> -                     && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
> -                   break;
> -                 asarg = asargStart;
> -               }
>             jump:
>               my_getExpression (imm_expr, asarg);
>               asarg = expr_parse_end;
> @@ -3504,6 +3519,8 @@ riscv_ip (char *str, struct riscv_cl_ins
>    if (save_c)
>      *(asargStart  - 1) = save_c;
>
> +  probing_insn_operands = 0;
> +
>    return error;
>  }
>
> @@ -3800,6 +3817,62 @@ riscv_after_parse_args (void)
>      flag_dwarf_cie_version = 3;
>  }

The comment of ppc said "This function is called for each symbol seen
in an expression.  It handles the special parsing ...", looks good and
more easy to understand the logic, so could we also have a similar
description detailed?

> +bool riscv_parse_name (const char *name, struct expressionS *ep,
> +                      enum expr_mode mode)
> +{
> +  unsigned int regno;
> +
> +  if (!probing_insn_operands)
> +    return false;
> +
> +  gas_assert (mode == expr_normal);
> +
> +  regno = reg_lookup_internal (name, RCLASS_GPR);
> +  if (regno == (unsigned int)-1)
> +    return false;
> +
> +  if (symbol_find (name) != NULL)
> +    return false;
> +
> +  if (probing_insn_operands > 1)
> +    {
> +      ep->X_op = O_register;
> +      ep->X_add_number = regno;
> +    }
> +  else
> +    {
> +      /* Create a symbol without adding it to the symbol table yet.
> +        Insertion will happen only once we commit to using the insn
> +        we're probing operands for.  */
> +      symbolS *sym;
> +
> +      for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
> +       if (strcmp (name, S_GET_NAME (sym)) == 0)
> +         break;
> +      if (!sym)
> +       {
> +         for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
> +           if (strcmp (name, S_GET_NAME (sym)) == 0)
> +             {
> +               symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
> +               break;
> +             }
> +         if (!sym)
> +           sym = symbol_create (name, undefined_section,
> +                                &zero_address_frag, 0);
> +
> +         symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
> +                        &deferred_sym_lastP);
> +       }
> +
> +      ep->X_op = O_symbol;
> +      ep->X_add_symbol = sym;
> +      ep->X_add_number = 0;
> +    }
> +
> +  return true;
> +}
> +
>  long
>  md_pcrel_from (fixS *fixP)
>  {
> --- a/gas/config/tc-riscv.h
> +++ b/gas/config/tc-riscv.h
> @@ -123,6 +123,10 @@ extern void riscv_elf_final_processing (
>  /* Adjust debug_line after relaxation.  */
>  #define DWARF2_USE_FIXED_ADVANCE_PC 1
>
> +#define md_parse_name(name, exp, mode, c) \
> +  riscv_parse_name (name, exp, mode)
> +bool riscv_parse_name (const char *, struct expressionS *, enum expr_mode);
> +
>  #define md_finish riscv_md_finish
>  #define CONVERT_SYMBOLIC_ATTRIBUTE riscv_convert_symbolic_attribute
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-15  3:09   ` Nelson Chu
@ 2023-02-15  8:03     ` Jan Beulich
  2023-02-15  8:48       ` Nelson Chu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-02-15  8:03 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

On 15.02.2023 04:09, Nelson Chu wrote:
> Though the logic becomes kind of complicated to me, you are more
> familiar with other targets than me, so maybe this is the right way to
> do it.  Basically, the logic looks correct and fine to me, but I still
> need to run regressions in case of an accident.  Just make sure that -
> 1. The deferred_sym_rootP are the symbols that have the same names as
> GPR (or FPR, VPR in the future maybe), but we still need to add them
> into the symbol table?

Yes, with "may" inserted ahead of "need". We only need to if the probing
actually succeeds.

> 2. Even if the expression has the same name as GPR, only
> my_getSmallExpression is possible to set exp to O_register, but
> my_getExpression won't have, so the probing_insn_operands is used to
> distinguish between them?

Yes, albeit I find this distinction between the two functions suspicious
(and as said in a post-commit-message remark I'm also having trouble
spotting a pattern of when which of the two functions is [to be] used).
But I didn't want to affect existing behavior too much; iirc there was
at least one testcase which would break otherwise. As said elsewhere -
as a first step towards further improvements it would need settling on
what the intended behavior in various cases actually ought to be.

As to the role of probing_insn_operands: Its two possible non-zero
values distinguish these two cases; zero vs non-zero distinguish whether
we deal with insn operands vs directive (or alike) ones.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-15  8:03     ` Jan Beulich
@ 2023-02-15  8:48       ` Nelson Chu
  2023-02-15  9:21         ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Nelson Chu @ 2023-02-15  8:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

Thanks for the detailed clarify :-)

A quick update, I just use riscv-gnu-toolchain to build a linux toolchain with,
gcc (releases/gcc-12.2.0) + binutils (master + these two patches) +
glibc (release/2.37/master),
and then I get some unexpected errors from gcc testsuite, which won't
occurred without these patches.

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
     rv64gc/  lp64d/ medlow |  347 /    62 |   29 /     9 |    0 /     0 |

I haven't tested the elf regressions, and the whole linux regression,
but it seems that there are some unexpected situations that we have
not considered in this patch.

Reproduce commands,
1. git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git --recursive
2. need to install some tools before build,
https://github.com/riscv-collab/riscv-gnu-toolchain#prerequisites
3. git checkout branches of gcc, binutils and glibc to the branches
mentioned above
4. cd <patch-to-build-folder>
5. <patch-to-riscv-gnu-toolchain>configure
--prefix=<patch-to-build-folder>build-install --with-arch=rv64gc
6. make report-gcc-linux -j32

Thanks
Nelson

On Wed, Feb 15, 2023 at 4:03 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.02.2023 04:09, Nelson Chu wrote:
> > Though the logic becomes kind of complicated to me, you are more
> > familiar with other targets than me, so maybe this is the right way to
> > do it.  Basically, the logic looks correct and fine to me, but I still
> > need to run regressions in case of an accident.  Just make sure that -
> > 1. The deferred_sym_rootP are the symbols that have the same names as
> > GPR (or FPR, VPR in the future maybe), but we still need to add them
> > into the symbol table?
>
> Yes, with "may" inserted ahead of "need". We only need to if the probing
> actually succeeds.
>
> > 2. Even if the expression has the same name as GPR, only
> > my_getSmallExpression is possible to set exp to O_register, but
> > my_getExpression won't have, so the probing_insn_operands is used to
> > distinguish between them?
>
> Yes, albeit I find this distinction between the two functions suspicious
> (and as said in a post-commit-message remark I'm also having trouble
> spotting a pattern of when which of the two functions is [to be] used).
> But I didn't want to affect existing behavior too much; iirc there was
> at least one testcase which would break otherwise. As said elsewhere -
> as a first step towards further improvements it would need settling on
> what the intended behavior in various cases actually ought to be.
>
> As to the role of probing_insn_operands: Its two possible non-zero
> values distinguish these two cases; zero vs non-zero distinguish whether
> we deal with insn operands vs directive (or alike) ones.
>
> Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-15  8:48       ` Nelson Chu
@ 2023-02-15  9:21         ` Jan Beulich
  2023-02-16  9:36           ` Nelson Chu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-02-15  9:21 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

On 15.02.2023 09:48, Nelson Chu wrote:
> Thanks for the detailed clarify :-)
> 
> A quick update, I just use riscv-gnu-toolchain to build a linux toolchain with,
> gcc (releases/gcc-12.2.0) + binutils (master + these two patches) +
> glibc (release/2.37/master),
> and then I get some unexpected errors from gcc testsuite, which won't
> occurred without these patches.
> 
>                ========= Summary of gcc testsuite =========
>                             | # of unexpected case / # of unique unexpected case
>                             |          gcc |          g++ |     gfortran |
>      rv64gc/  lp64d/ medlow |  347 /    62 |   29 /     9 |    0 /     0 |
> 
> I haven't tested the elf regressions, and the whole linux regression,
> but it seems that there are some unexpected situations that we have
> not considered in this patch.

Hmm, this suggests to me that testcases are missing in the gas testsuite.

> Reproduce commands,
> 1. git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git --recursive
> 2. need to install some tools before build,
> https://github.com/riscv-collab/riscv-gnu-toolchain#prerequisites
> 3. git checkout branches of gcc, binutils and glibc to the branches
> mentioned above
> 4. cd <patch-to-build-folder>
> 5. <patch-to-riscv-gnu-toolchain>configure
> --prefix=<patch-to-build-folder>build-install --with-arch=rv64gc
> 6. make report-gcc-linux -j32

Instead of me trying to go through this (at least part of which looks like
it won't really fit my normal environment; among other things I would also
prefer to work from upstream gcc, not any RISC-V clone thereof), could you
provide me with the detailed gcc.log and g++.log from the run (I know
they're huge, so you may want to upload them somewhere, or you might simply
extract relevant fragments)? Of course if it would be the complete logs,
then I would also need to know which, if any, FAILs are pre-existing.

From what you write it also doesn't look as if glibc was necessary to
include.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-15  9:21         ` Jan Beulich
@ 2023-02-16  9:36           ` Nelson Chu
  2023-02-16 16:49             ` Jan Beulich
  2023-03-03 13:52             ` Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: Nelson Chu @ 2023-02-16  9:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

Yeah the logs are too big, so not sure where can be uploaded these
kind of huge files.  However, we should get the same result if we just
run the gcc upstream testsuites.  Seems many errors but a quick
review, most of them are,

*Assembler messages:
*Error: register value used as expression
...

Here are some details,

Executing on host:
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/
-march=rv64gc -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output
   -O1  -w -c   -o pr79780.o
/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.c-torture/compile/pr79780.c
   (timeout = 600)
spawn -ignore SIGHUP
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/
-march=rv64gc -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output
-O1 -w -c -o pr79780.o
/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.c-torture/compile/pr79780.c
/tmp/cc1LYHhd.s: Assembler messages:
/tmp/cc1LYHhd.s:17: Error: register value used as expression
/tmp/cc1LYHhd.s:18: Error: register value used as expression
/tmp/cc1LYHhd.s:28: Error: register value used as expression
/tmp/cc1LYHhd.s:29: Error: register value used as expression
/tmp/cc1LYHhd.s:54: Error: register value used as expression
/tmp/cc1LYHhd.s:55: Error: register value used as expression
/tmp/cc1LYHhd.s:61: Error: register value used as expression
/tmp/cc1LYHhd.s:62: Error: register value used as expression
/tmp/cc1LYHhd.s:65: Error: register value used as expression
compiler exited with status 1
FAIL: gcc.c-torture/compile/pr79780.c   -O1  (test for excess errors)
...
Executing on host:
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/
-march=rv64gc -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output
   -O0  -w -c   -o funcptr-1.o
/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.c-torture/compile/funcptr-1.c
   (timeout = 600)
spawn -ignore SIGHUP
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/
-march=rv64gc -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output
-O0 -w -c -o funcptr-1.o
/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.c-torture/compile/funcptr-1.c
/tmp/cc1RjV1g.s: Assembler messages:
/tmp/cc1RjV1g.s:34: Error: register value used as expression
/tmp/cc1RjV1g.s:35: Error: register value used as expression
compiler exited with status 1
FAIL: gcc.c-torture/compile/funcptr-1.c   -O0  (test for excess errors)
...
Executing on host:
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/../../xg++
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/../../
 -march=rv64gc -mabi=lp64d -mcmodel=medlow
-fdiagnostics-plain-output  -nostdinc++
-I/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/libstdc++-v3/include/riscv64-unknown-linux-gnu
-I/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/libstdc++-v3/include
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/libsupc++
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/include/backward
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=c++17 -DCXX14_VS_CXX17 -w
-I/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/g++.dg/compat
-Wno-abi   -c   -o cp_compat_y_tst.o
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/g++.dg-struct-layout-1//t032_y.C
   (timeout = 600)
spawn -ignore SIGHUP
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/../../xg++
-B/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/../../
-march=rv64gc -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output
-nostdinc++ -I/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/libstdc++-v3/include/riscv64-unknown-linux-gnu
-I/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/riscv64-unknown-linux-gnu/libstdc++-v3/include
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/libsupc++
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/include/backward
-I/scratch/riscv-gnu-toolchain/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=c++17 -DCXX14_VS_CXX17 -w
-I/scratch/riscv-gnu-toolchain/gcc/gcc/testsuite/g++.dg/compat
-Wno-abi -c -o cp_compat_y_tst.o
/scratch/build-toolchains/rv64gc-linux/build-gcc-linux-stage2/gcc/testsuite/g++/g++.dg-struct-layout-1//t032_y.C
/tmp/cci191w8.s: Assembler messages:
/tmp/cci191w8.s:50: Error: register value used as expression
/tmp/cci191w8.s:51: Error: register value used as expression
... too many...
compiler exited with status 1
FAIL: tmpdir-g++.dg-struct-layout-1/t032 cp_compat_y_tst.o compile
...

Sorry that I only extracted some small snippets, since there are too
many.  It would be better to just run the gcc testsuite again, with
any environments.

Hope these information are helpful, thanks
Nelson

On Wed, Feb 15, 2023 at 5:21 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.02.2023 09:48, Nelson Chu wrote:
> > Thanks for the detailed clarify :-)
> >
> > A quick update, I just use riscv-gnu-toolchain to build a linux toolchain with,
> > gcc (releases/gcc-12.2.0) + binutils (master + these two patches) +
> > glibc (release/2.37/master),
> > and then I get some unexpected errors from gcc testsuite, which won't
> > occurred without these patches.
> >
> >                ========= Summary of gcc testsuite =========
> >                             | # of unexpected case / # of unique unexpected case
> >                             |          gcc |          g++ |     gfortran |
> >      rv64gc/  lp64d/ medlow |  347 /    62 |   29 /     9 |    0 /     0 |
> >
> > I haven't tested the elf regressions, and the whole linux regression,
> > but it seems that there are some unexpected situations that we have
> > not considered in this patch.
>
> Hmm, this suggests to me that testcases are missing in the gas testsuite.
>
> > Reproduce commands,
> > 1. git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git --recursive
> > 2. need to install some tools before build,
> > https://github.com/riscv-collab/riscv-gnu-toolchain#prerequisites
> > 3. git checkout branches of gcc, binutils and glibc to the branches
> > mentioned above
> > 4. cd <patch-to-build-folder>
> > 5. <patch-to-riscv-gnu-toolchain>configure
> > --prefix=<patch-to-build-folder>build-install --with-arch=rv64gc
> > 6. make report-gcc-linux -j32
>
> Instead of me trying to go through this (at least part of which looks like
> it won't really fit my normal environment; among other things I would also
> prefer to work from upstream gcc, not any RISC-V clone thereof), could you
> provide me with the detailed gcc.log and g++.log from the run (I know
> they're huge, so you may want to upload them somewhere, or you might simply
> extract relevant fragments)? Of course if it would be the complete logs,
> then I would also need to know which, if any, FAILs are pre-existing.
>
> From what you write it also doesn't look as if glibc was necessary to
> include.
>
> Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-16  9:36           ` Nelson Chu
@ 2023-02-16 16:49             ` Jan Beulich
  2023-02-16 16:54               ` Palmer Dabbelt
  2023-03-03 13:52             ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-02-16 16:49 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

On 16.02.2023 10:36, Nelson Chu wrote:
> Yeah the logs are too big, so not sure where can be uploaded these
> kind of huge files.  However, we should get the same result if we just
> run the gcc upstream testsuites.  Seems many errors but a quick
> review, most of them are,
> 
> *Assembler messages:
> *Error: register value used as expression
> ...
> 
> Here are some details,

Thanks - knowing examples of failures and compiler options used is
sufficient indeed. And yes, taking one of the examples I can repro and
will hence be able to investigate. (As hinted at earlier, first thing
I'll do is add yet another testcase to the gas testsuite, as imo the
case should be covered there).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-16 16:49             ` Jan Beulich
@ 2023-02-16 16:54               ` Palmer Dabbelt
  0 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2023-02-16 16:54 UTC (permalink / raw)
  To: jbeulich; +Cc: nelson, binutils, Andrew Waterman, Jim Wilson

On Thu, 16 Feb 2023 08:49:36 PST (-0800), jbeulich@suse.com wrote:
> On 16.02.2023 10:36, Nelson Chu wrote:
>> Yeah the logs are too big, so not sure where can be uploaded these
>> kind of huge files.  However, we should get the same result if we just
>> run the gcc upstream testsuites.  Seems many errors but a quick
>> review, most of them are,
>>
>> *Assembler messages:
>> *Error: register value used as expression
>> ...
>>
>> Here are some details,
>
> Thanks - knowing examples of failures and compiler options used is
> sufficient indeed. And yes, taking one of the examples I can repro and
> will hence be able to investigate. (As hinted at earlier, first thing
> I'll do is add yet another testcase to the gas testsuite, as imo the
> case should be covered there).

Thanks for that, we didn't really add target-specific tests when we did 
the original binutils port and it bites us pretty frequently for stuff 
like this.  So the more we can get in there the better.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-02-16  9:36           ` Nelson Chu
  2023-02-16 16:49             ` Jan Beulich
@ 2023-03-03 13:52             ` Jan Beulich
  2023-03-15 10:07               ` Nelson Chu
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-03-03 13:52 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 8771 bytes --]

On 16.02.2023 10:36, Nelson Chu wrote:
> Yeah the logs are too big, so not sure where can be uploaded these
> kind of huge files.  However, we should get the same result if we just
> run the gcc upstream testsuites.  Seems many errors but a quick
> review, most of them are,
> 
> *Assembler messages:
> *Error: register value used as expression
> ...

Below the tentative updated patch. I haven't got around to seeing
whether I can sensibly run gcc's testsuite yet, though. My past
experience with doing so for cross builds was quite poor (but that
has been quite a while back) ...

Attached also the corresponding extended testcase (subject of patch 1
of the series), which I expect is good to have in any event.

Jan

RISC-V: adjust logic to avoid register name symbols

Special casing GPR names in my_getSmallExpression() leads to a number of
inconsistencies. Generalize this by utilizing the md_parse_name() hook,
limited to when instruction operands are being parsed (really: probed).
Then both the GPR lookup there and the yet more ad hoc workaround for
PR/gas 29940 can be removed (including its extension needed for making
the compressed form JAL work again).
---
When a floating point extension (but not [hfdq]inx) is enabled, floating
point registers would perhaps better be recognized by riscv_parse_name()
as well. Same for vector registers and likely also CSRs. Otherwise
inconsistent behavior may (continue to) result when equates come into
play, afaict.

Considering equates, the new behavior isn't the only sensible one: We
could also omit the symbol_find() call, thus never honoring equates.
Then, however, even with the "i" insn suffix (or sometimes infix)
explicitly specified, equates with names matching a GPR name (or, see
above, potentially also other register names) wouldn't be accepted
anymore. I guess the primary question here is what the intended (and
consistent) behavior is to be.

Overall the uses of my_getSmallExpression() vs my_getExpression() look
pretty random to me: Is there any underlying principle when which of the
two is expected to be used?
---
v2: Re-work to not break things like %hi(<register-name>).

--- a/gas/config/tc-riscv.c
+++ b/gas/config/tc-riscv.c
@@ -171,6 +171,8 @@ static enum float_abi float_abi = FLOAT_
 
 static unsigned elf_flags = 0;
 
+static bool probing_insn_operands;
+
 /* Set the default_isa_spec.  Return 0 if the spec isn't supported.
    Otherwise, return 1.  */
 
@@ -2228,21 +2230,10 @@ my_getSmallExpression (expressionS *ep,
 		       char *str, const struct percent_op_match *percent_op)
 {
   size_t reloc_index;
-  unsigned crux_depth, str_depth, regno;
+  unsigned crux_depth, str_depth;
+  bool orig_probing = probing_insn_operands;
   char *crux;
 
-  /* First, check for integer registers.  No callers can accept a reg, but
-     we need to avoid accidentally creating a useless undefined symbol below,
-     if this is an instruction pattern that can't match.  A glibc build fails
-     if this is removed.  */
-  if (reg_lookup (&str, RCLASS_GPR, &regno))
-    {
-      ep->X_op = O_register;
-      ep->X_add_number = regno;
-      expr_parse_end = str;
-      return 0;
-    }
-
   /* Search for the start of the main expression.
 
      End the loop with CRUX pointing to the start of the main expression and
@@ -2265,9 +2256,16 @@ my_getSmallExpression (expressionS *ep,
 	 && reloc_index < 1
 	 && parse_relocation (&str, reloc, percent_op));
 
+  /* Anything inside parentheses cannot be a register and hence can be
+     treated the same as operands to directives (other than .insn).  */
+  if (str_depth)
+    probing_insn_operands = false;
+
   my_getExpression (ep, crux);
   str = expr_parse_end;
 
+  probing_insn_operands = orig_probing;
+
   /* Match every open bracket.  */
   while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
     if (*str++ == ')')
@@ -2453,6 +2451,13 @@ riscv_is_priv_insn (insn_t insn)
 	  || ((insn ^ MATCH_SFENCE_VM) & MASK_SFENCE_VM) == 0);
 }
 
+static symbolS *deferred_sym_rootP;
+static symbolS *deferred_sym_lastP;
+/* Since symbols can't easily be freed, try to recycle ones which weren't
+   committed.  */
+static symbolS *orphan_sym_rootP;
+static symbolS *orphan_sym_lastP;
+
 /* This routine assembles an instruction into its binary format.  As a
    side effect, it sets the global variable imm_reloc to the type of
    relocation to do if one of the operands is an address expression.  */
@@ -2488,6 +2493,8 @@ riscv_ip (char *str, struct riscv_cl_ins
 
   insn = (struct riscv_opcode *) str_hash_find (hash, str);
 
+  probing_insn_operands = true;
+
   asargStart = asarg;
   for ( ; insn && insn->name && strcmp (insn->name, str) == 0; insn++)
     {
@@ -2504,6 +2511,17 @@ riscv_ip (char *str, struct riscv_cl_ins
       /* Reset error message of the previous round.  */
       error.msg = _("illegal operands");
       error.missing_ext = NULL;
+
+      /* Purge deferred symbols from the previous round, if any.  */
+      while (deferred_sym_rootP)
+	{
+	  symbolS *sym = deferred_sym_rootP;
+
+	  symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
+	  symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
+			 &orphan_sym_lastP);
+	}
+
       create_insn (ip, insn);
 
       imm_expr->X_op = O_absent;
@@ -2558,9 +2576,22 @@ riscv_ip (char *str, struct riscv_cl_ins
 		}
 	      if (*asarg != '\0')
 		break;
+
 	      /* Successful assembly.  */
 	      error.msg = NULL;
 	      insn_with_csr = false;
+
+	      /* Commit deferred symbols, if any.  */
+	      while (deferred_sym_rootP)
+		{
+		  symbolS *sym = deferred_sym_rootP;
+
+		  symbol_remove (sym, &deferred_sym_rootP,
+				 &deferred_sym_lastP);
+		  symbol_append (sym, symbol_lastP, &symbol_rootP,
+				 &symbol_lastP);
+		  symbol_table_insert (sym);
+		}
 	      goto out;
 
 	    case 'C': /* RVC */
@@ -2764,8 +2795,6 @@ riscv_ip (char *str, struct riscv_cl_ins
 		case 'p':
 		  goto branch;
 		case 'a':
-		  if (oparg == insn->args + 1)
-		    goto jump_check_gpr;
 		  goto jump;
 		case 'S': /* Floating-point RS1 x8-x15.  */
 		  if (!reg_lookup (&asarg, RCLASS_FPR, &regno)
@@ -3270,18 +3299,6 @@ riscv_ip (char *str, struct riscv_cl_ins
 	      continue;
 
 	    case 'a': /* 20-bit PC-relative offset.  */
-	      /* Like in my_getSmallExpression() we need to avoid emitting
-		 a stray undefined symbol if the 1st JAL entry doesn't match,
-		 but the 2nd (with 2 operands) might.  */
-	      if (oparg == insn->args)
-		{
-	    jump_check_gpr:
-		  asargStart = asarg;
-		  if (reg_lookup (&asarg, RCLASS_GPR, NULL)
-		      && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
-		    break;
-		  asarg = asargStart;
-		}
 	    jump:
 	      my_getExpression (imm_expr, asarg);
 	      asarg = expr_parse_end;
@@ -3504,6 +3521,8 @@ riscv_ip (char *str, struct riscv_cl_ins
   if (save_c)
     *(asargStart  - 1) = save_c;
 
+  probing_insn_operands = false;
+
   return error;
 }
 
@@ -3800,6 +3819,53 @@ riscv_after_parse_args (void)
     flag_dwarf_cie_version = 3;
 }
 
+bool riscv_parse_name (const char *name, struct expressionS *ep,
+		       enum expr_mode mode)
+{
+  unsigned int regno;
+  symbolS *sym;
+
+  if (!probing_insn_operands)
+    return false;
+
+  gas_assert (mode == expr_normal);
+
+  regno = reg_lookup_internal (name, RCLASS_GPR);
+  if (regno == (unsigned int)-1)
+    return false;
+
+  if (symbol_find (name) != NULL)
+    return false;
+
+  /* Create a symbol without adding it to the symbol table yet.
+     Insertion will happen only once we commit to using the insn
+     we're probing operands for.  */
+  for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
+    if (strcmp (name, S_GET_NAME (sym)) == 0)
+      break;
+  if (!sym)
+    {
+      for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
+	if (strcmp (name, S_GET_NAME (sym)) == 0)
+	  {
+	    symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
+	    break;
+	  }
+      if (!sym)
+	sym = symbol_create (name, undefined_section,
+			     &zero_address_frag, 0);
+
+      symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
+		     &deferred_sym_lastP);
+    }
+
+  ep->X_op = O_symbol;
+  ep->X_add_symbol = sym;
+  ep->X_add_number = 0;
+
+  return true;
+}
+
 long
 md_pcrel_from (fixS *fixP)
 {
--- a/gas/config/tc-riscv.h
+++ b/gas/config/tc-riscv.h
@@ -123,6 +123,10 @@ extern void riscv_elf_final_processing (
 /* Adjust debug_line after relaxation.  */
 #define DWARF2_USE_FIXED_ADVANCE_PC 1
 
+#define md_parse_name(name, exp, mode, c) \
+  riscv_parse_name (name, exp, mode)
+bool riscv_parse_name (const char *, struct expressionS *, enum expr_mode);
+
 #define md_finish riscv_md_finish
 #define CONVERT_SYMBOLIC_ATTRIBUTE riscv_convert_symbolic_attribute
 


[-- Attachment #2: binutils-master-RISCV-test-no-reg-syms.patch --]
[-- Type: text/plain, Size: 1120 bytes --]

RISC-V: test for expected / no unexpected symbols

Both the temporary workaround for PR/gas 29940 and the existing special
casing of GPRs in my_getSmallExpression() aren't really tested anywhere
(i.e. with the workarounds remove testing would still succeed). Nor is
there any test for uses of symbols with names matching GPRs, where such
is permitted. Before altering how this is to be dealt with, install two
testcases covering the expected behavior. (For now this includes only
known affected insns; re-ordering of entries in riscv_opcodes[] could,
however, yield more of them.)
---
v2: Also include LUI and LW.

--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms.d
@@ -0,0 +1,8 @@
+#as: -march=rv32i
+#nm: --
+
+0+ t start
+ +U x2
+ +U x4
+ +U x6
+ +U x8
--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms.s
@@ -0,0 +1,10 @@
+	.text
+start:
+	and	x8, x8, x1
+	j	x2
+	jal	x3, x4
+	lui	x5, %hi(x6)
+	lw	x7, %lo(x8)(x9)
+	sll	x1, x1, x10
+	sra	x8, x8, x11
+	srl	x8, x8, x12
--- /dev/null
+++ b/gas/testsuite/gas/riscv/reg-syms-C.d
@@ -0,0 +1,4 @@
+#as: -march=rv32ic
+#source: reg-syms.s
+#nm: --
+#dump: reg-syms.d

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-03-03 13:52             ` Jan Beulich
@ 2023-03-15 10:07               ` Nelson Chu
  2023-03-15 11:49                 ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Nelson Chu @ 2023-03-15 10:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

My regression schedule was a bit full lately, so I just finished
running for this patch now.  Everything looks good when running the
gcc/binutils regression with this patch in riscv-gnu-toolchain, the
errors mentioned last time should be all resolved perfectly, so LGTM.
Thanks for continuing to fix this issue.

Nelson

On Fri, Mar 3, 2023 at 9:53 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.02.2023 10:36, Nelson Chu wrote:
> > Yeah the logs are too big, so not sure where can be uploaded these
> > kind of huge files.  However, we should get the same result if we just
> > run the gcc upstream testsuites.  Seems many errors but a quick
> > review, most of them are,
> >
> > *Assembler messages:
> > *Error: register value used as expression
> > ...
>
> Below the tentative updated patch. I haven't got around to seeing
> whether I can sensibly run gcc's testsuite yet, though. My past
> experience with doing so for cross builds was quite poor (but that
> has been quite a while back) ...
>
> Attached also the corresponding extended testcase (subject of patch 1
> of the series), which I expect is good to have in any event.
>
> Jan
>
> RISC-V: adjust logic to avoid register name symbols
>
> Special casing GPR names in my_getSmallExpression() leads to a number of
> inconsistencies. Generalize this by utilizing the md_parse_name() hook,
> limited to when instruction operands are being parsed (really: probed).
> Then both the GPR lookup there and the yet more ad hoc workaround for
> PR/gas 29940 can be removed (including its extension needed for making
> the compressed form JAL work again).
> ---
> When a floating point extension (but not [hfdq]inx) is enabled, floating
> point registers would perhaps better be recognized by riscv_parse_name()
> as well. Same for vector registers and likely also CSRs. Otherwise
> inconsistent behavior may (continue to) result when equates come into
> play, afaict.
>
> Considering equates, the new behavior isn't the only sensible one: We
> could also omit the symbol_find() call, thus never honoring equates.
> Then, however, even with the "i" insn suffix (or sometimes infix)
> explicitly specified, equates with names matching a GPR name (or, see
> above, potentially also other register names) wouldn't be accepted
> anymore. I guess the primary question here is what the intended (and
> consistent) behavior is to be.
>
> Overall the uses of my_getSmallExpression() vs my_getExpression() look
> pretty random to me: Is there any underlying principle when which of the
> two is expected to be used?
> ---
> v2: Re-work to not break things like %hi(<register-name>).
>
> --- a/gas/config/tc-riscv.c
> +++ b/gas/config/tc-riscv.c
> @@ -171,6 +171,8 @@ static enum float_abi float_abi = FLOAT_
>
>  static unsigned elf_flags = 0;
>
> +static bool probing_insn_operands;
> +
>  /* Set the default_isa_spec.  Return 0 if the spec isn't supported.
>     Otherwise, return 1.  */
>
> @@ -2228,21 +2230,10 @@ my_getSmallExpression (expressionS *ep,
>                        char *str, const struct percent_op_match *percent_op)
>  {
>    size_t reloc_index;
> -  unsigned crux_depth, str_depth, regno;
> +  unsigned crux_depth, str_depth;
> +  bool orig_probing = probing_insn_operands;
>    char *crux;
>
> -  /* First, check for integer registers.  No callers can accept a reg, but
> -     we need to avoid accidentally creating a useless undefined symbol below,
> -     if this is an instruction pattern that can't match.  A glibc build fails
> -     if this is removed.  */
> -  if (reg_lookup (&str, RCLASS_GPR, &regno))
> -    {
> -      ep->X_op = O_register;
> -      ep->X_add_number = regno;
> -      expr_parse_end = str;
> -      return 0;
> -    }
> -
>    /* Search for the start of the main expression.
>
>       End the loop with CRUX pointing to the start of the main expression and
> @@ -2265,9 +2256,16 @@ my_getSmallExpression (expressionS *ep,
>          && reloc_index < 1
>          && parse_relocation (&str, reloc, percent_op));
>
> +  /* Anything inside parentheses cannot be a register and hence can be
> +     treated the same as operands to directives (other than .insn).  */
> +  if (str_depth)
> +    probing_insn_operands = false;
> +
>    my_getExpression (ep, crux);
>    str = expr_parse_end;
>
> +  probing_insn_operands = orig_probing;
> +
>    /* Match every open bracket.  */
>    while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
>      if (*str++ == ')')
> @@ -2453,6 +2451,13 @@ riscv_is_priv_insn (insn_t insn)
>           || ((insn ^ MATCH_SFENCE_VM) & MASK_SFENCE_VM) == 0);
>  }
>
> +static symbolS *deferred_sym_rootP;
> +static symbolS *deferred_sym_lastP;
> +/* Since symbols can't easily be freed, try to recycle ones which weren't
> +   committed.  */
> +static symbolS *orphan_sym_rootP;
> +static symbolS *orphan_sym_lastP;
> +
>  /* This routine assembles an instruction into its binary format.  As a
>     side effect, it sets the global variable imm_reloc to the type of
>     relocation to do if one of the operands is an address expression.  */
> @@ -2488,6 +2493,8 @@ riscv_ip (char *str, struct riscv_cl_ins
>
>    insn = (struct riscv_opcode *) str_hash_find (hash, str);
>
> +  probing_insn_operands = true;
> +
>    asargStart = asarg;
>    for ( ; insn && insn->name && strcmp (insn->name, str) == 0; insn++)
>      {
> @@ -2504,6 +2511,17 @@ riscv_ip (char *str, struct riscv_cl_ins
>        /* Reset error message of the previous round.  */
>        error.msg = _("illegal operands");
>        error.missing_ext = NULL;
> +
> +      /* Purge deferred symbols from the previous round, if any.  */
> +      while (deferred_sym_rootP)
> +       {
> +         symbolS *sym = deferred_sym_rootP;
> +
> +         symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
> +         symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
> +                        &orphan_sym_lastP);
> +       }
> +
>        create_insn (ip, insn);
>
>        imm_expr->X_op = O_absent;
> @@ -2558,9 +2576,22 @@ riscv_ip (char *str, struct riscv_cl_ins
>                 }
>               if (*asarg != '\0')
>                 break;
> +
>               /* Successful assembly.  */
>               error.msg = NULL;
>               insn_with_csr = false;
> +
> +             /* Commit deferred symbols, if any.  */
> +             while (deferred_sym_rootP)
> +               {
> +                 symbolS *sym = deferred_sym_rootP;
> +
> +                 symbol_remove (sym, &deferred_sym_rootP,
> +                                &deferred_sym_lastP);
> +                 symbol_append (sym, symbol_lastP, &symbol_rootP,
> +                                &symbol_lastP);
> +                 symbol_table_insert (sym);
> +               }
>               goto out;
>
>             case 'C': /* RVC */
> @@ -2764,8 +2795,6 @@ riscv_ip (char *str, struct riscv_cl_ins
>                 case 'p':
>                   goto branch;
>                 case 'a':
> -                 if (oparg == insn->args + 1)
> -                   goto jump_check_gpr;
>                   goto jump;
>                 case 'S': /* Floating-point RS1 x8-x15.  */
>                   if (!reg_lookup (&asarg, RCLASS_FPR, &regno)
> @@ -3270,18 +3299,6 @@ riscv_ip (char *str, struct riscv_cl_ins
>               continue;
>
>             case 'a': /* 20-bit PC-relative offset.  */
> -             /* Like in my_getSmallExpression() we need to avoid emitting
> -                a stray undefined symbol if the 1st JAL entry doesn't match,
> -                but the 2nd (with 2 operands) might.  */
> -             if (oparg == insn->args)
> -               {
> -           jump_check_gpr:
> -                 asargStart = asarg;
> -                 if (reg_lookup (&asarg, RCLASS_GPR, NULL)
> -                     && (*asarg == ',' || (ISSPACE (*asarg) && asarg[1] == ',')))
> -                   break;
> -                 asarg = asargStart;
> -               }
>             jump:
>               my_getExpression (imm_expr, asarg);
>               asarg = expr_parse_end;
> @@ -3504,6 +3521,8 @@ riscv_ip (char *str, struct riscv_cl_ins
>    if (save_c)
>      *(asargStart  - 1) = save_c;
>
> +  probing_insn_operands = false;
> +
>    return error;
>  }
>
> @@ -3800,6 +3819,53 @@ riscv_after_parse_args (void)
>      flag_dwarf_cie_version = 3;
>  }
>
> +bool riscv_parse_name (const char *name, struct expressionS *ep,
> +                      enum expr_mode mode)
> +{
> +  unsigned int regno;
> +  symbolS *sym;
> +
> +  if (!probing_insn_operands)
> +    return false;
> +
> +  gas_assert (mode == expr_normal);
> +
> +  regno = reg_lookup_internal (name, RCLASS_GPR);
> +  if (regno == (unsigned int)-1)
> +    return false;
> +
> +  if (symbol_find (name) != NULL)
> +    return false;
> +
> +  /* Create a symbol without adding it to the symbol table yet.
> +     Insertion will happen only once we commit to using the insn
> +     we're probing operands for.  */
> +  for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
> +    if (strcmp (name, S_GET_NAME (sym)) == 0)
> +      break;
> +  if (!sym)
> +    {
> +      for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
> +       if (strcmp (name, S_GET_NAME (sym)) == 0)
> +         {
> +           symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
> +           break;
> +         }
> +      if (!sym)
> +       sym = symbol_create (name, undefined_section,
> +                            &zero_address_frag, 0);
> +
> +      symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
> +                    &deferred_sym_lastP);
> +    }
> +
> +  ep->X_op = O_symbol;
> +  ep->X_add_symbol = sym;
> +  ep->X_add_number = 0;
> +
> +  return true;
> +}
> +
>  long
>  md_pcrel_from (fixS *fixP)
>  {
> --- a/gas/config/tc-riscv.h
> +++ b/gas/config/tc-riscv.h
> @@ -123,6 +123,10 @@ extern void riscv_elf_final_processing (
>  /* Adjust debug_line after relaxation.  */
>  #define DWARF2_USE_FIXED_ADVANCE_PC 1
>
> +#define md_parse_name(name, exp, mode, c) \
> +  riscv_parse_name (name, exp, mode)
> +bool riscv_parse_name (const char *, struct expressionS *, enum expr_mode);
> +
>  #define md_finish riscv_md_finish
>  #define CONVERT_SYMBOLIC_ATTRIBUTE riscv_convert_symbolic_attribute
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-03-15 10:07               ` Nelson Chu
@ 2023-03-15 11:49                 ` Jan Beulich
  2023-03-16  0:30                   ` Nelson Chu
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-03-15 11:49 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

On 15.03.2023 11:07, Nelson Chu wrote:
> My regression schedule was a bit full lately, so I just finished
> running for this patch now.  Everything looks good when running the
> gcc/binutils regression with this patch in riscv-gnu-toolchain, the
> errors mentioned last time should be all resolved perfectly, so LGTM.

Thanks for double checking. The two patches meanwhile have become part
of a larger series, though (submitted on the 10th). I could pull this
pair ahead and commit them, but I'm not sure that's going to be best.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-03-15 11:49                 ` Jan Beulich
@ 2023-03-16  0:30                   ` Nelson Chu
  2023-04-19  9:37                     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Nelson Chu @ 2023-03-16  0:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

Oh, sorry I see that these 2 will be part of the new 7 series patches,
which I haven't seen in detail yet.  I will check the new series ASAP.

Thanks for the reminder
Nelson

On Wed, Mar 15, 2023 at 7:49 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.03.2023 11:07, Nelson Chu wrote:
> > My regression schedule was a bit full lately, so I just finished
> > running for this patch now.  Everything looks good when running the
> > gcc/binutils regression with this patch in riscv-gnu-toolchain, the
> > errors mentioned last time should be all resolved perfectly, so LGTM.
>
> Thanks for double checking. The two patches meanwhile have become part
> of a larger series, though (submitted on the 10th). I could pull this
> pair ahead and commit them, but I'm not sure that's going to be best.
>
> Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols
  2023-03-16  0:30                   ` Nelson Chu
@ 2023-04-19  9:37                     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-04-19  9:37 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils, Palmer Dabbelt, Andrew Waterman, Jim Wilson

On 16.03.2023 01:30, Nelson Chu wrote:
> Oh, sorry I see that these 2 will be part of the new 7 series patches,
> which I haven't seen in detail yet.  I will check the new series ASAP.

Mind me asking about that series, now that a month has passed?

Jan

> On Wed, Mar 15, 2023 at 7:49 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 15.03.2023 11:07, Nelson Chu wrote:
>>> My regression schedule was a bit full lately, so I just finished
>>> running for this patch now.  Everything looks good when running the
>>> gcc/binutils regression with this patch in riscv-gnu-toolchain, the
>>> errors mentioned last time should be all resolved perfectly, so LGTM.
>>
>> Thanks for double checking. The two patches meanwhile have become part
>> of a larger series, though (submitted on the 10th). I could pull this
>> pair ahead and commit them, but I'm not sure that's going to be best.
>>
>> Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-04-19  9:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-13  8:01 [PATCH 0/2] RISC-V/gas: re-work register named symbols avoidance logic Jan Beulich
2023-02-13  8:02 ` [PATCH 1/2] RISC-V: test for expected / no unexpected symbols Jan Beulich
2023-02-13  8:02 ` [PATCH 2/2] RISC-V: adjust logic to avoid register name symbols Jan Beulich
2023-02-15  3:09   ` Nelson Chu
2023-02-15  8:03     ` Jan Beulich
2023-02-15  8:48       ` Nelson Chu
2023-02-15  9:21         ` Jan Beulich
2023-02-16  9:36           ` Nelson Chu
2023-02-16 16:49             ` Jan Beulich
2023-02-16 16:54               ` Palmer Dabbelt
2023-03-03 13:52             ` Jan Beulich
2023-03-15 10:07               ` Nelson Chu
2023-03-15 11:49                 ` Jan Beulich
2023-03-16  0:30                   ` Nelson Chu
2023-04-19  9:37                     ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).