From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
To: David Faust <david.faust@oracle.com>
Cc: binutils@sourceware.org, jbeulich@suse.com
Subject: Re: [PATCH v2] bpf: avoid creating wrong symbols while parsing
Date: Fri, 17 Nov 2023 20:29:44 +0100 [thread overview]
Message-ID: <87edgo6uvr.fsf@oracle.com> (raw)
In-Reply-To: <20231117185428.10823-1-david.faust@oracle.com> (David Faust's message of "Fri, 17 Nov 2023 10:54:28 -0800")
Hi David.
> [ WAS: gas,bpf: cleanup bad symbols created while parsing
> v1: https://sourceware.org/pipermail/binutils/2023-November/130556.html
>
> Changes from v1:
> - Rewrite patch to avoid inserting wrong symbols in the first place,
> rather than insert, remove, re-insert dance.
> Suggested by Jan Beulich. ]
>
> To support the "pseudo-C" asm dialect in BPF, the BPF parser must often
> attempt multiple different templates for a single instruction. In some
> cases this can cause the parser to incorrectly parse part of the
> instruction opcode as an expression, which leads to the creation of a
> new undefined symbol.
>
> Once the parser recognizes the error, the expression is discarded and it
> tries again with a new instruction template. However, symbols created
> during the process are added to the symbol table and are not removed
> even if the expression is discarded.
>
> This is a problem for BPF: generally the assembled object will be loaded
> directly to the Linux kernel, without being linked. These erroneous
> parser-created symbols are rejected by the kernel BPF loader, and the
> entire object is refused.
>
> This patch remedies the issue by tentatively creating symbols while
> parsing instruction operands, and storing them in a temporary list
> rather than immediately inserting them into the symbol table. Later,
> after the parser is sure that it has correctly parsed the instruction,
> those symbols are committed to the real symbol table.
>
> This approach is modeled directly after Jan Beulich's patch for RISC-V:
>
> commit 7a29ee290307087e1749ce610207e93a15d0b78d
> RISC-V: adjust logic to avoid register name symbols
>
> Many thanks to Jan for recognizing the problem as similar, and pointing
> me to that patch.
>
> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
>
> gas/
>
> * config/tc-bpf.c (parsing_insn_operands): New.
> (parse_expression): Set it here.
> (deferred_sym_rootP, deferred_sym_lastP): New.
> (orphan_sym_rootP, orphan_sym_lastP): New.
> (bpf_parse_name): New.
> (parse_error): Clear deferred symbol list on error.
> (md_assemble): Clear parsing_insn_operands. Commit deferred
> symbols to symbol table on successful parse.
> * config/tc-bpf.h (md_parse_name): Define to...
> (bpf_parse_name): ...this. New prototype.
> * testsuite/gas/bpf/asm-extra-sym-1.s: New test source.
> * testsuite/gas/bpf/asm-extra-sym-1.d: New test.
> * testsuite/gas/bpf/bpf.exp: Run new test.
> ---
> gas/config/tc-bpf.c | 92 +++++++++++++++++++++++++
> gas/config/tc-bpf.h | 4 ++
> gas/testsuite/gas/bpf/asm-extra-sym-1.d | 7 ++
> gas/testsuite/gas/bpf/asm-extra-sym-1.s | 1 +
> gas/testsuite/gas/bpf/bpf.exp | 3 +
> 5 files changed, 107 insertions(+)
> create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.d
> create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.s
>
> diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
> index fd4144a354b..3122f80804a 100644
> --- a/gas/config/tc-bpf.c
> +++ b/gas/config/tc-bpf.c
> @@ -1223,6 +1223,7 @@ add_relaxed_insn (struct bpf_insn *insn, expressionS *exp)
> See md_operand below to see how exp_parse_failed is used. */
>
> static int exp_parse_failed = 0;
> +static bool parsing_insn_operands = false;
>
> static char *
> parse_expression (char *s, expressionS *exp)
> @@ -1230,6 +1231,9 @@ parse_expression (char *s, expressionS *exp)
> char *saved_input_line_pointer = input_line_pointer;
> char *saved_s = s;
>
> + /* Wake up bpf_parse_name before the call to expression (). */
> + parsing_insn_operands = true;
> +
> exp_parse_failed = 0;
> input_line_pointer = s;
> expression (exp);
> @@ -1251,6 +1255,71 @@ parse_expression (char *s, expressionS *exp)
> return s;
> }
>
> +/* Symbols created by this parse, but not yet committed to the real
> + symbol table. */
> +static symbolS *deferred_sym_rootP;
> +static symbolS *deferred_sym_lastP;
> +
> +/* Symbols discarded by a previous parse. Symbols cannot easily be freed
> + after creation, so try to recycle. */
> +static symbolS *orphan_sym_rootP;
> +static symbolS *orphan_sym_lastP;
> +
> +/* Implement md_parse_name hook. Handles any symbol found in an expression.
> + This allows us to tentatively create symbols, before we know for sure
> + whether the parser is using the correct template for an instruction.
> + If we end up keeping the instruction, the deferred symbols are committed
> + to the real symbol table. This approach is modeled after the riscv port. */
> +
> +bool
> +bpf_parse_name (const char *name, expressionS *exp, enum expr_mode mode)
> +{
> + symbolS *sym;
> +
> + /* If we aren't currently parsing an instruction, don't do anything.
> + This prevents tampering with operands to directives. */
> + if (!parsing_insn_operands)
> + return false;
> +
> + gas_assert (mode == expr_normal);
> +
> + if (symbol_find (name) != NULL)
> + return false;
Is this check correct?
I see that expr.c:operand calls symbol_find_or_make right after calling
md_parse_name (if the later is defined) and actually installs the
resulting symbol in expressionP even if it was already defined. That is
not done if md_parse_name returns `false'.
> +
> + for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
> + if (strcmp (name, S_GET_NAME (sym)) == 0)
> + break;
> +
> + /* Tentatively create a symbol. */
> + if (!sym)
> + {
> + /* See if we can reuse a symbol discarded by a previous parse.
> + This may be quite common, for example when trying multiple templates
> + for an instruction with the first reference to a valid symbol. */
> + for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
> + if (strcmp (name, S_GET_NAME (sym)) == 0)
> + {
> + symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
> + break;
> + }
> +
> + if (!sym)
> + sym = symbol_create (name, undefined_section, &zero_address_frag, 0);
> +
> + /* Add symbol to the deferred list. If we commit to the isntruction,
> + then the symbol will be inserted into to the real symbol table at
> + that point (in md_assemble). */
> + symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
> + &deferred_sym_lastP);
> + }
> +
> + exp->X_op = O_symbol;
> + exp->X_add_symbol = sym;
> + exp->X_add_number = 0;
> +
> + return true;
> +}
> +
> /* Parse a BPF register name and return the corresponding register
> number. Return NULL in case of parse error, or a pointer to the
> first character in S that is not part of the register name. */
> @@ -1317,6 +1386,16 @@ parse_error (int length, const char *fmt, ...)
> va_end (args);
> partial_match_length = length;
> }
> +
> + /* Discard deferred symbols from the failed parse. They may potentially
> + be reused in the future from the orphan list. */
> + while (deferred_sym_rootP)
> + {
> + symbolS *sym = deferred_sym_rootP;
> + symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
> + symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
> + &orphan_sym_lastP);
> + }
> }
>
> /* Assemble a machine instruction in STR and emit the frags/bytes it
> @@ -1606,6 +1685,10 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
> }
> }
>
> + /* Mark that we are no longer parsing an instruction, bpf_parse_name does
> + not interfere with symbols in e.g. assembler directives. */
> + parsing_insn_operands = false;
> +
> if (opcode == NULL)
> {
> as_bad (_("unrecognized instruction `%s'"), str);
> @@ -1622,6 +1705,15 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
>
> #undef PARSE_ERROR
>
> + /* Commit any symbols created while parsing the instruction. */
> + while (deferred_sym_rootP)
> + {
> + symbolS *sym = deferred_sym_rootP;
> + symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
> + symbol_append (sym, symbol_lastP, &symbol_rootP, &symbol_lastP);
> + symbol_table_insert (sym);
> + }
> +
> /* Generate the frags and fixups for the parsed instruction. */
> if (do_relax && isa_spec >= BPF_V4 && insn.is_relaxable)
> {
> diff --git a/gas/config/tc-bpf.h b/gas/config/tc-bpf.h
> index 9fb71eddd14..06096ef5926 100644
> --- a/gas/config/tc-bpf.h
> +++ b/gas/config/tc-bpf.h
> @@ -51,6 +51,10 @@
> a jump to offset 0 means jump to the next instruction. */
> #define md_single_noop_insn "ja 0"
>
> +#define md_parse_name(name, exp, mode, c) \
> + bpf_parse_name (name, exp, mode)
> +bool bpf_parse_name (const char *, struct expressionS *, enum expr_mode);
> +
> #define TC_EQUAL_IN_INSN(c, s) bpf_tc_equal_in_insn ((c), (s))
> extern bool bpf_tc_equal_in_insn (int, char *);
>
> diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.d b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
> new file mode 100644
> index 00000000000..113750dd3fd
> --- /dev/null
> +++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
> @@ -0,0 +1,7 @@
> +#as: -EL -mdialect=pseudoc
> +#nm: --numeric-sort
> +#source: asm-extra-sym-1.s
> +#name: BPF pseudoc no extra symbols 1
> +
> +# Note: there should be no output from nm.
> +# Previously a bug in the BPF parser created an UND '*' symbol.
> diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.s b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
> new file mode 100644
> index 00000000000..2cfa605a259
> --- /dev/null
> +++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
> @@ -0,0 +1 @@
> + r2 = *(u32*)(r1 + 8)
> diff --git a/gas/testsuite/gas/bpf/bpf.exp b/gas/testsuite/gas/bpf/bpf.exp
> index 80f5a1dbc2d..fcbeccd8ecd 100644
> --- a/gas/testsuite/gas/bpf/bpf.exp
> +++ b/gas/testsuite/gas/bpf/bpf.exp
> @@ -72,4 +72,7 @@ if {[istarget bpf*-*-*]} {
> run_dump_test disp16-overflow-relax
> run_dump_test disp32-overflow
> run_dump_test imm32-overflow
> +
> + # Test that parser does not create undefined symbols
> + run_dump_test asm-extra-sym-1
> }
next prev parent reply other threads:[~2023-11-17 19:29 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-17 18:54 David Faust
2023-11-17 19:29 ` Jose E. Marchesi [this message]
2023-11-17 20:09 ` David Faust
2023-11-17 20:27 ` Jose E. Marchesi
2023-11-17 21:41 ` David Faust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87edgo6uvr.fsf@oracle.com \
--to=jose.marchesi@oracle.com \
--cc=binutils@sourceware.org \
--cc=david.faust@oracle.com \
--cc=jbeulich@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).