public inbox for binutils@sourceware.org
* [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing
@ 2023-11-14 17:58 David Faust
  2023-11-14 17:58 ` [PATCH 1/2] gas: add symbol_table_remove David Faust
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: David Faust @ 2023-11-14 17:58 UTC
  To: binutils; +Cc: jose.marchesi

To support the "pseudo-C" asm dialect in BPF, the BPF parser must
often attempt multiple different templates for a single instruction.
In some cases, this can lead to a call to expression () which creates a
symbol, and only later the template is determined not to match and the
expression is discarded.

However, symbols created during this process are added to the symbol
table and are not removed if the expression is discarded.

This is a problem for BPF: generally the assembled object will be
loaded directly by the Linux kernel, without being linked.  The kernel
BPF loader requires that BTF information is available for every symbol
in a loaded BPF program, which will not be available for these symbols
erroneously created by the parser.

Patch 1 adds a symbol_table_remove () function to symbols.c and
exposes it in the header, since symbol_remove () is not sufficient to
prevent such symbols being written in the symbol table of the resulting
ELF object.

Patch 2 detects cases in the BPF parser where a call to expression ()
created any new symbol(s), but the parsing of the instruction as a
whole against the current template failed.  In that case the created
symbols are deleted.
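
The rollback idea in patch 2 can be illustrated outside of gas with a toy
append-only list. This is only a sketch under stated assumptions: every name
below (toy_sym, toy_append, toy_rollback, toy_try_template) is hypothetical
and not a gas API, and it assumes, as the patch does, that new symbols are
always appended at the tail, so remembering the old tail is enough to find
everything created since.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the assembler's symbol list.  Hypothetical names,
   not gas APIs.  */
struct toy_sym
{
  const char *name;
  struct toy_sym *prev;
};

static struct toy_sym *toy_last = NULL;	/* Tail of the append-only list.  */

/* Append a new symbol at the tail (the assumed behaviour of parsing).  */
static void
toy_append (const char *name)
{
  struct toy_sym *s = malloc (sizeof *s);
  assert (s != NULL);
  s->name = name;
  s->prev = toy_last;
  toy_last = s;
}

/* Drop every symbol appended since SNAPSHOT was the tail.  SNAPSHOT must
   be NULL or still on the list.  */
static void
toy_rollback (struct toy_sym *snapshot)
{
  while (toy_last != snapshot)
    {
      struct toy_sym *dead = toy_last;
      assert (dead != NULL);
      toy_last = dead->prev;
      free (dead);
    }
}

/* Count the symbols currently on the list.  */
static int
toy_count (void)
{
  int n = 0;
  for (struct toy_sym *s = toy_last; s != NULL; s = s->prev)
    n++;
  return n;
}

/* Try one template: create N_NEW symbols, then roll back unless MATCHED.
   Returns the list length afterwards.  */
int
toy_try_template (int n_new, int matched)
{
  struct toy_sym *snapshot = toy_last;
  for (int i = 0; i < n_new; i++)
    toy_append ("tmp");
  if (!matched)
    toy_rollback (snapshot);
  return toy_count ();
}
```

A failed attempt leaves the list exactly as long as it was before the
attempt, while a matching attempt keeps its symbols.  The sketch depends on
the append-only property in the same way the patch does: a symbol inserted
anywhere but the tail would be missed by the rollback.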

David Faust (2):
  gas: add symbol_table_remove
  bpf: remove symbols created during failed parse

 gas/config/tc-bpf.c                     | 30 +++++++++++++++++++++++++
 gas/symbols.c                           | 10 +++++++++
 gas/symbols.h                           |  1 +
 gas/testsuite/gas/bpf/asm-extra-sym-1.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-1.s |  1 +
 gas/testsuite/gas/bpf/asm-extra-sym-2.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-2.s |  8 +++++++
 gas/testsuite/gas/bpf/bpf.exp           |  4 ++++
 8 files changed, 68 insertions(+)
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.s
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.s

-- 
2.42.0



* [PATCH 1/2] gas: add symbol_table_remove
  2023-11-14 17:58 [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
@ 2023-11-14 17:58 ` David Faust
  2023-11-14 17:58 ` [PATCH 2/2] bpf: remove symbols created during failed parse David Faust
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: David Faust @ 2023-11-14 17:58 UTC
  To: binutils; +Cc: jose.marchesi

This patch adds a symbol_table_remove () function to symbols.c which
mirrors symbol_table_insert (), and exposes it in the header.

gas/

	* symbols.c (symbol_table_remove): New function.
	* symbols.h (symbol_table_remove): Prototype.
---
 gas/symbols.c | 10 ++++++++++
 gas/symbols.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/gas/symbols.c b/gas/symbols.c
index 45e46ed39b7..e94fa61b56e 100644
--- a/gas/symbols.c
+++ b/gas/symbols.c
@@ -722,6 +722,16 @@ symbol_table_insert (symbolS *symbolP)
 
   htab_insert (sy_hash, symbolP, 1);
 }
+
+/* Remove a symbol from the symbol table.  */
+
+void
+symbol_table_remove (symbolS *symbolP)
+{
+  know (symbolP);
+
+  htab_remove_elt (sy_hash, symbolP);
+}
 \f
 /* If a symbol name does not exist, create it as undefined, and insert
    it into the symbol table.  Return a pointer to it.  */
diff --git a/gas/symbols.h b/gas/symbols.h
index 46425c97d79..90cefaa0402 100644
--- a/gas/symbols.h
+++ b/gas/symbols.h
@@ -71,6 +71,7 @@ void symbol_end (void);
 void dot_symbol_init (void);
 void symbol_print_statistics (FILE *);
 void symbol_table_insert (symbolS * symbolP);
+void symbol_table_remove (symbolS * symbolP);
 valueT resolve_symbol_value (symbolS *);
 void resolve_local_symbol_values (void);
 int snapshot_symbol (symbolS **, valueT *, segT *, fragS **);
-- 
2.42.0



* [PATCH 2/2] bpf: remove symbols created during failed parse
  2023-11-14 17:58 [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
  2023-11-14 17:58 ` [PATCH 1/2] gas: add symbol_table_remove David Faust
@ 2023-11-14 17:58 ` David Faust
  2023-11-14 22:13 ` [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
  2023-11-15  9:49 ` Jan Beulich
  3 siblings, 0 replies; 6+ messages in thread
From: David Faust @ 2023-11-14 17:58 UTC
  To: binutils; +Cc: jose.marchesi

Parsing the BPF pseudo-C asm syntax requires attempting to parse an
instruction using a template that may later be determined to not match.
During this parsing, a call to expression () may end up creating one or
more symbols.  If the parsed instruction is later determined to not
match the template, then any symbols created during this process should
be discarded.

If such unused symbols are not discarded, they impede the loading of the
resulting BPF object by the Linux kernel.

gas/

	* config/tc-bpf.c (last_parsed_expr, old_symbol_lastP): New.
	(parse_expression): Track last_parsed_expr and old_symbol_lastP.
	(parse_error): Cleanup symbols created during a failed parse.
	* testsuite/gas/bpf/asm-extra-sym-1.d: New.
	* testsuite/gas/bpf/asm-extra-sym-1.s: New.
	* testsuite/gas/bpf/asm-extra-sym-2.d: New.
	* testsuite/gas/bpf/asm-extra-sym-2.s: New.
	* testsuite/gas/bpf/bpf.exp: Run new tests.
---
 gas/config/tc-bpf.c                     | 30 +++++++++++++++++++++++++
 gas/testsuite/gas/bpf/asm-extra-sym-1.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-1.s |  1 +
 gas/testsuite/gas/bpf/asm-extra-sym-2.d |  7 ++++++
 gas/testsuite/gas/bpf/asm-extra-sym-2.s |  8 +++++++
 gas/testsuite/gas/bpf/bpf.exp           |  4 ++++
 6 files changed, 57 insertions(+)
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.s
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.d
 create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-2.s

diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
index fd4144a354b..d64576415e1 100644
--- a/gas/config/tc-bpf.c
+++ b/gas/config/tc-bpf.c
@@ -1223,6 +1223,8 @@ add_relaxed_insn (struct bpf_insn *insn, expressionS *exp)
    See md_operand below to see how exp_parse_failed is used.  */
 
 static int exp_parse_failed = 0;
+static expressionS *last_parsed_expr = NULL;
+static symbolS *old_symbol_lastP = NULL;
 
 static char *
 parse_expression (char *s, expressionS *exp)
@@ -1232,10 +1234,13 @@ parse_expression (char *s, expressionS *exp)
 
   exp_parse_failed = 0;
   input_line_pointer = s;
+  old_symbol_lastP = symbol_lastP;
   expression (exp);
   s = input_line_pointer;
   input_line_pointer = saved_input_line_pointer;
 
+  last_parsed_expr = exp;
+
   if (exp->X_op == O_absent || exp_parse_failed)
     return NULL;
 
@@ -1317,6 +1322,25 @@ parse_error (int length, const char *fmt, ...)
       va_end (args);
       partial_match_length = length;
     }
+
+  /* Cleanup any symbols created during the failed parsing.  */
+  if (last_parsed_expr
+      && (last_parsed_expr->X_add_symbol || last_parsed_expr->X_op_symbol))
+    {
+      /* NOTE: this logic exploits the implementation detail that a symbol
+	 created by expression () during parsing is appended to the list
+	 rather than potentially being inserted somewhere in the middle.  */
+      symbolS *sym = symbol_lastP;
+      while (sym != old_symbol_lastP)
+	{
+	  /* Must have created at least one symbol.  */
+	  symbol_remove (sym, &symbol_rootP, &symbol_lastP);
+	  symbol_table_remove (sym);
+	  sym = symbol_lastP;
+	}
+
+      old_symbol_lastP = symbol_lastP;
+    }
 }
 
 /* Assemble a machine instruction in STR and emit the frags/bytes it
@@ -1368,6 +1392,12 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
       if (opcode->version > isa_spec)
         continue;
 
+      /* Track expression parsed while trying this opcode.  If this turns
+	 out to be the wrong opcode, we need to undo side effects of the
+	 expression parsing, such as creating a new undefined symbol.
+	 Set by parse_expression () and used by parse_error ().  */
+      last_parsed_expr = NULL;
+
       memset (&insn, 0, sizeof (struct bpf_insn));
       insn.size = 8;
       for (s = str, p = template; *p != '\0';)
diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.d b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
new file mode 100644
index 00000000000..56bdb7082f5
--- /dev/null
+++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
@@ -0,0 +1,7 @@
+#as: -EL -mdialect=pseudoc
+#nm: --numeric-sort
+#source: asm-extra-sym-1.s
+#name: BPF pseudoc no extra symbols 1
+
+# Note: there should be no output from nm.
+# Previously a bug created an UND '*' symbol.
diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.s b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
new file mode 100644
index 00000000000..2cfa605a259
--- /dev/null
+++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
@@ -0,0 +1 @@
+    r2 = *(u32*)(r1 + 8)
diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-2.d b/gas/testsuite/gas/bpf/asm-extra-sym-2.d
new file mode 100644
index 00000000000..e17ae0f2422
--- /dev/null
+++ b/gas/testsuite/gas/bpf/asm-extra-sym-2.d
@@ -0,0 +1,7 @@
+#as: -EL -mdialect=pseudoc
+#nm: --numeric-sort
+#source: asm-extra-sym-2.s
+#name: BPF pseudoc no extra symbols 2
+
+[0-9a-f]+ t main
+[0-9a-f]+ t foo
diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-2.s b/gas/testsuite/gas/bpf/asm-extra-sym-2.s
new file mode 100644
index 00000000000..ccbf43065d9
--- /dev/null
+++ b/gas/testsuite/gas/bpf/asm-extra-sym-2.s
@@ -0,0 +1,8 @@
+
+    .text
+main:
+    call foo
+    call foo
+foo:
+    r1 = 1
+    exit
diff --git a/gas/testsuite/gas/bpf/bpf.exp b/gas/testsuite/gas/bpf/bpf.exp
index 80f5a1dbc2d..680b8dbdb10 100644
--- a/gas/testsuite/gas/bpf/bpf.exp
+++ b/gas/testsuite/gas/bpf/bpf.exp
@@ -72,4 +72,8 @@ if {[istarget bpf*-*-*]} {
     run_dump_test disp16-overflow-relax
     run_dump_test disp32-overflow
     run_dump_test imm32-overflow
+
+    # Test that parser does not create undefined symbols
+    run_dump_test asm-extra-sym-1
+    run_dump_test asm-extra-sym-2
 }
-- 
2.42.0



* Re: [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing
  2023-11-14 17:58 [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
  2023-11-14 17:58 ` [PATCH 1/2] gas: add symbol_table_remove David Faust
  2023-11-14 17:58 ` [PATCH 2/2] bpf: remove symbols created during failed parse David Faust
@ 2023-11-14 22:13 ` David Faust
  2023-11-15  9:49 ` Jan Beulich
  3 siblings, 0 replies; 6+ messages in thread
From: David Faust @ 2023-11-14 22:13 UTC
  To: binutils



On 11/14/23 09:58, David Faust wrote:
...
> 
> Patch 1 adds a symbol_table_remove () function to symbols.c and
> exposes it in the header, since symbol_remove () is not sufficient to
> prevent such symbols being written in the symbol table of the resulting
> ELF object.

Hm, I misspoke here and inverted the reasoning.  To correct myself:

symbol_remove () is sufficient to prevent the symbol from being emitted.

But it does not remove the symbol from the hash table used by the
symbol_find () family of routines.  That means later calls to e.g.
symbol_find_or_make () will return a reference to the old (removed)
symbol.  The symbol will not be re-added to the symbol list, so it will
not be emitted, even in cases where it is real and needed.

Adding and using symbol_table_remove () allows a removed symbol to be
potentially re-added later.
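
The stale-lookup effect described above can be reproduced in a small
stand-alone model. This is a sketch under stated assumptions, not gas code:
mini_sym, mini_find_or_make, mini_list_remove and mini_table_remove are
hypothetical stand-ins for the symbol list, symbol_find_or_make (),
symbol_remove () and symbol_table_remove (), and the hash table is reduced
to a linear scan over a small array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINI_MAX 8

/* Hypothetical toy symbol; ON_LIST models membership of the list that
   is walked when the object file's symbol table is written.  */
struct mini_sym
{
  const char *name;
  int on_list;
};

static struct mini_sym mini_pool[MINI_MAX];
static int mini_pool_used;
static struct mini_sym *mini_table[MINI_MAX];	/* Lookup table.  */

/* Look NAME up in the table only; the list is never consulted.  */
static struct mini_sym *
mini_find (const char *name)
{
  for (int i = 0; i < MINI_MAX; i++)
    if (mini_table[i] != NULL && strcmp (mini_table[i]->name, name) == 0)
      return mini_table[i];
  return NULL;
}

/* Return the existing table entry, or create one on list and table.  */
static struct mini_sym *
mini_find_or_make (const char *name)
{
  struct mini_sym *s = mini_find (name);
  if (s != NULL)
    return s;		/* Returned as-is, even if it is off the list.  */
  assert (mini_pool_used < MINI_MAX);
  s = &mini_pool[mini_pool_used++];
  s->name = name;
  s->on_list = 1;
  for (int i = 0; i < MINI_MAX; i++)
    if (mini_table[i] == NULL)
      {
	mini_table[i] = s;
	break;
      }
  return s;
}

/* Take S off the emitted list but leave the table entry alone.  */
static void
mini_list_remove (struct mini_sym *s)
{
  s->on_list = 0;
}

/* Drop S from the lookup table.  */
static void
mini_table_remove (struct mini_sym *s)
{
  for (int i = 0; i < MINI_MAX; i++)
    if (mini_table[i] == s)
      mini_table[i] = NULL;
}

/* A failed parse creates and removes "foo"; a later parse needs it.
   Returns 1 if "foo" would be emitted at the end.  */
int
mini_foo_emitted (int also_remove_from_table)
{
  memset (mini_table, 0, sizeof mini_table);
  mini_pool_used = 0;

  struct mini_sym *s = mini_find_or_make ("foo");	/* Failed template.  */
  mini_list_remove (s);
  if (also_remove_from_table)
    mini_table_remove (s);

  s = mini_find_or_make ("foo");			/* Matching template.  */
  return s->on_list;
}
```

With only the list removal, the second lookup returns the stale entry and
"foo" is never emitted; with the table removal as well, the lookup misses
and a fresh symbol is created.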


* Re: [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing
  2023-11-14 17:58 [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
                   ` (2 preceding siblings ...)
  2023-11-14 22:13 ` [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing David Faust
@ 2023-11-15  9:49 ` Jan Beulich
  2023-11-15 21:57   ` David Faust
  3 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2023-11-15  9:49 UTC
  To: David Faust; +Cc: jose.marchesi, binutils

On 14.11.2023 18:58, David Faust wrote:
> To support the "pseudo-C" asm dialect in BPF, the BPF parser must
> often attempt multiple different templates for a single instruction.
> In some cases, this can lead to a call to expression () which creates a
> symbol, and only later the template is determined not to match and the
> expression is discarded.
> 
> However, symbols created during this process are added to the symbol
> table and are not removed if the expression is discarded.
> 
> This is a problem for BPF: generally the assembled object will be
> loaded directly by the Linux kernel, without being linked.  The kernel
> BPF loader requires that BTF information is available for every symbol
> in a loaded BPF program, which will not be available for these symbols
> erroneously created by the parser.
> 
> Patch 1 adds a symbol_table_remove () function to symbols.c and
> exposes it in the header, since symbol_remove () is not sufficient to
> prevent such symbols being written in the symbol table of the resulting
> ELF object.
> 
> Patch 2 detects cases in the BPF parser where a call to expression ()
> created any new symbol(s), but the parsing of the instruction as a
> whole against the current template failed.  In that case the created
> symbols are deleted.

While this may be a workaround (and perhaps even a viable one), I think
it would be better to suppress symbol table insertion in the first place.
I had to solve a somewhat related issue for RISC-V (and I expect MIPS
would want to also be switched to a similar approach), see 7a29ee290307
("RISC-V: adjust logic to avoid register name symbols"). I've looked at
the testcases added by patch 2. While the first one's purpose is clear
(thanks to the comment there), I can't really figure what the 2nd aims to
test. Which may mean that I'm not properly understanding what (set of)
condition(s) is/are involved here, and hence whether using one or more of
the existing target hooks can indeed help here (without needing to
transiently insert any symbols, and then having target-specific code
depend on how exactly symbols are inserted).

Jan


* Re: [PATCH 0/2] gas,bpf: cleanup bad symbols created while parsing
  2023-11-15  9:49 ` Jan Beulich
@ 2023-11-15 21:57   ` David Faust
  0 siblings, 0 replies; 6+ messages in thread
From: David Faust @ 2023-11-15 21:57 UTC
  To: Jan Beulich; +Cc: jose.marchesi, binutils


On 11/15/23 01:49, Jan Beulich wrote:
> On 14.11.2023 18:58, David Faust wrote:
>> To support the "pseudo-C" asm dialect in BPF, the BPF parser must
>> often attempt multiple different templates for a single instruction.
>> In some cases, this can lead to a call to expression () which creates a
>> symbol, and only later the template is determined not to match and the
>> expression is discarded.
>>
>> However, symbols created during this process are added to the symbol
>> table and are not removed if the expression is discarded.
>>
>> This is a problem for BPF: generally the assembled object will be
>> loaded directly by the Linux kernel, without being linked.  The kernel
>> BPF loader requires that BTF information is available for every symbol
>> in a loaded BPF program, which will not be available for these symbols
>> erroneously created by the parser.
>>
>> Patch 1 adds a symbol_table_remove () function to symbols.c and
>> exposes it in the header, since symbol_remove () is not sufficient to
>> prevent such symbols being written in the symbol table of the resulting
>> ELF object.
>>
>> Patch 2 detects cases in the BPF parser where a call to expression ()
>> created any new symbol(s), but the parsing of the instruction as a
>> whole against the current template failed.  In that case the created
>> symbols are deleted.
> 
> While this may be a workaround (and perhaps even a viable one), I think
> it would be better to suppress symbol table insertion in the first place.
> I had to solve a somewhat related issue for RISC-V (and I expect MIPS
> would want to also be switched to a similar approach), see 7a29ee290307
> ("RISC-V: adjust logic to avoid register name symbols"). I've looked at
> the testcases added by patch 2. While the first one's purpose is clear
> (thanks to the comment there), I can't really figure what the 2nd aims to
> test. Which may mean that I'm not properly understanding what (set of)
> condition(s) is/are involved here, and hence whether using one or more of
> the existing target hooks can indeed help here (without needing to
> transiently insert any symbols, and then having target-specific code
> depend on how exactly symbols are inserted).

Hi Jan,
Thank you for the review. This is very helpful.

I agree that it would be better to suppress symbol table insertion. To be
honest, I did not think of a good way to do it before this workaround.
So, thank you very much for the pointer to 7a29ee290307 because that is
certainly a nicer approach :). I think something similar to that will
work for BPF too, and I plan to rewrite the patch to use that approach.

As for the 2nd test, before adding and using symbol_table_remove ()
it would fail because the symbol 'foo' was not emitted in the object.
This happened because a failed instruction parse would now remove
any symbol created while parsing the instruction, including the real and
used 'foo'. The symbol was never "un-removed", because it remained in
the symbol hash table. So all later calls to symbol_find ("foo")
returned a reference to the removed symbol, rather than failing and
allowing 'foo' to be remade by a later instruction parse. With an
approach that suppresses insertion of wrong symbols rather than trying
to remove them later, this will not be an issue.
(The existing gas/bpf/indcall-1-pseudoc test was also failing for the
same reason.)

Thanks
David
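
For contrast with the remove-afterwards approach, the general idea of not
inserting symbols until a template has matched can be sketched as follows.
This is a generic illustration only: it is not what commit 7a29ee290307
does, it uses no gas target hooks, and every name in it (pending, ref_name,
commit, try_operand) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PEND_MAX 4
#define TAB_MAX 8

static const char *sym_tab[TAB_MAX];	/* Committed symbol names.  */
static int sym_count;

/* Names referenced while a template is still being tried.  */
struct pending
{
  const char *names[PEND_MAX];
  int n;
};

/* Return 1 if NAME has already been committed.  */
static int
tab_has (const char *name)
{
  for (int i = 0; i < sym_count; i++)
    if (strcmp (sym_tab[i], name) == 0)
      return 1;
  return 0;
}

/* Tentative reference: look NAME up, and queue it only if unknown.
   Nothing is inserted into the table here.  */
static void
ref_name (struct pending *p, const char *name)
{
  if (tab_has (name))
    return;
  assert (p->n < PEND_MAX);
  p->names[p->n++] = name;
}

/* Called only once a template has fully matched.  */
static void
commit (const struct pending *p)
{
  for (int i = 0; i < p->n; i++)
    if (!tab_has (p->names[i]))
      {
	assert (sym_count < TAB_MAX);
	sym_tab[sym_count++] = p->names[i];
      }
}

/* Parse one operand NAME against a template.  Returns the number of
   committed symbols afterwards.  */
int
try_operand (const char *name, int matched)
{
  struct pending p = { { NULL }, 0 };
  ref_name (&p, name);
  if (matched)
    commit (&p);
  return sym_count;
}
```

Because a failed attempt never touches the table, there is nothing to undo
and no stale entry for a later lookup to find, which is why the second test
case would no longer be a concern with this kind of design.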

