public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: David Faust <david.faust@oracle.com>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: binutils@sourceware.org, jbeulich@suse.com
Subject: Re: [PATCH v2] bpf: avoid creating wrong symbols while parsing
Date: Fri, 17 Nov 2023 13:41:36 -0800	[thread overview]
Message-ID: <495d481c-4ad3-48f7-a0e9-1ab7d0697708@oracle.com> (raw)
In-Reply-To: <877cmg6s7h.fsf@oracle.com>



On 11/17/23 12:27, Jose E. Marchesi wrote:
> 
>> On 11/17/23 11:29, Jose E. Marchesi wrote:
>>>
>>> Hi David.
>>>
>>>> [ WAS: gas,bpf: cleanup bad symbols created while parsing
>>>>   v1: https://sourceware.org/pipermail/binutils/2023-November/130556.html
>>>>
>>>>   Changes from v1:
>>>>   - Rewrite patch to avoid inserting wrong symbols in the first place,
>>>>     rather than insert, remove, re-insert dance.
>>>>     Suggested by Jan Beulich. ]
>>>>
>>>> To support the "pseudo-C" asm dialect in BPF, the BPF parser must often
>>>> attempt multiple different templates for a single instruction. In some
>>>> cases this can cause the parser to incorrectly parse part of the
>>>> instruction opcode as an expression, which leads to the creation of a
>>>> new undefined symbol.
>>>>
>>>> Once the parser recognizes the error, the expression is discarded and it
>>>> tries again with a new instruction template. However, symbols created
>>>> during the process are added to the symbol table and are not removed
>>>> even if the expression is discarded.
>>>>
>>>> This is a problem for BPF: generally the assembled object will be loaded
>>>> directly to the Linux kernel, without being linked. These erroneous
>>>> parser-created symbols are rejected by the kernel BPF loader, and the
>>>> entire object is refused.
>>>>
>>>> This patch remedies the issue by tentatively creating symbols while
>>>> parsing instruction operands, and storing them in a temporary list
>>>> rather than immediately inserting them into the symbol table. Later,
>>>> after the parser is sure that it has correctly parsed the instruction,
>>>> those symbols are committed to the real symbol table.
>>>>
>>>> This approach is modeled directly after Jan Beulich's patch for RISC-V:
>>>>
>>>>   commit 7a29ee290307087e1749ce610207e93a15d0b78d
>>>>   RISC-V: adjust logic to avoid register name symbols
>>>>
>>>> Many thanks to Jan for recognizing the problem as similar, and pointing
>>>> me to that patch.
>>>>
>>>> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
>>>>
>>>> gas/
>>>>
>>>> 	* config/tc-bpf.c (parsing_insn_operands): New.
>>>> 	(parse_expression): Set it here.
>>>> 	(deferred_sym_rootP, deferred_sym_lastP): New.
>>>> 	(orphan_sym_rootP, orphan_sym_lastP): New.
>>>> 	(bpf_parse_name): New.
>>>> 	(parse_error): Clear deferred symbol list on error.
>>>> 	(md_assemble): Clear parsing_insn_operands. Commit deferred
>>>> 	symbols to symbol table on successful parse.
>>>> 	* config/tc-bpf.h (md_parse_name): Define to...
>>>> 	(bpf_parse_name): ...this. New prototype.
>>>> 	* testsuite/gas/bpf/asm-extra-sym-1.s: New test source.
>>>> 	* testsuite/gas/bpf/asm-extra-sym-1.d: New test.
>>>> 	* testsuite/gas/bpf/bpf.exp: Run new test.
>>>> ---
>>>>  gas/config/tc-bpf.c                     | 92 +++++++++++++++++++++++++
>>>>  gas/config/tc-bpf.h                     |  4 ++
>>>>  gas/testsuite/gas/bpf/asm-extra-sym-1.d |  7 ++
>>>>  gas/testsuite/gas/bpf/asm-extra-sym-1.s |  1 +
>>>>  gas/testsuite/gas/bpf/bpf.exp           |  3 +
>>>>  5 files changed, 107 insertions(+)
>>>>  create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.d
>>>>  create mode 100644 gas/testsuite/gas/bpf/asm-extra-sym-1.s
>>>>
>>>> diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
>>>> index fd4144a354b..3122f80804a 100644
>>>> --- a/gas/config/tc-bpf.c
>>>> +++ b/gas/config/tc-bpf.c
>>>> @@ -1223,6 +1223,7 @@ add_relaxed_insn (struct bpf_insn *insn, expressionS *exp)
>>>>     See md_operand below to see how exp_parse_failed is used.  */
>>>>  
>>>>  static int exp_parse_failed = 0;
>>>> +static bool parsing_insn_operands = false;
>>>>  
>>>>  static char *
>>>>  parse_expression (char *s, expressionS *exp)
>>>> @@ -1230,6 +1231,9 @@ parse_expression (char *s, expressionS *exp)
>>>>    char *saved_input_line_pointer = input_line_pointer;
>>>>    char *saved_s = s;
>>>>  
>>>> +  /* Wake up bpf_parse_name before the call to expression ().  */
>>>> +  parsing_insn_operands = true;
>>>> +
>>>>    exp_parse_failed = 0;
>>>>    input_line_pointer = s;
>>>>    expression (exp);
>>>> @@ -1251,6 +1255,71 @@ parse_expression (char *s, expressionS *exp)
>>>>    return s;
>>>>  }
>>>>  
>>>> +/* Symbols created by this parse, but not yet committed to the real
>>>> +   symbol table.  */
>>>> +static symbolS *deferred_sym_rootP;
>>>> +static symbolS *deferred_sym_lastP;
>>>> +
>>>> +/* Symbols discarded by a previous parse.  Symbols cannot easily be freed
>>>> +   after creation, so try to recycle.  */
>>>> +static symbolS *orphan_sym_rootP;
>>>> +static symbolS *orphan_sym_lastP;
>>>> +
>>>> +/* Implement md_parse_name hook.  Handles any symbol found in an expression.
>>>> +   This allows us to tentatively create symbols, before we know for sure
>>>> +   whether the parser is using the correct template for an instruction.
>>>> +   If we end up keeping the instruction, the deferred symbols are committed
>>>> +   to the real symbol table. This approach is modeled after the riscv port.  */
>>>> +
>>>> +bool
>>>> +bpf_parse_name (const char *name, expressionS *exp, enum expr_mode mode)
>>>> +{
>>>> +  symbolS *sym;
>>>> +
>>>> +  /* If we aren't currently parsing an instruction, don't do anything.
>>>> +     This prevents tampering with operands to directives.  */
>>>> +  if (!parsing_insn_operands)
>>>> +    return false;
>>>> +
>>>> +  gas_assert (mode == expr_normal);
>>>> +
>>>> +  if (symbol_find (name) != NULL)
>>>> +    return false;
>>>
>>> Is this check correct?
>>>
>>> I see that expr.c:operand calls symbol_find_or_make right after calling
>>> md_parse_name (if the later is defined) and actually installs the
>>> resulting symbol in expressionP even if it was already defined.  That is
>>> not done if md_parse_name returns `false'.
>>
>> Hmm, I think you may have the logic in operand () flipped.
>> If md_parse_name returns 'false' then symbol_find_or_make is called;
>> if it returns 'true' then the 'break' is hit:
>>
>> #ifdef md_parse_name
>> 	...
>> 	  if (md_parse_name (name, expressionP, mode, &c))
>> 	    {
>> 	      restore_line_pointer (c);
>> 	      break;
>> 	    }
>> #endif
>> 	  symbolP = symbol_find_or_make (name);
>> 	...
>>
>> That is what we want to happen - if the symbol_find in parse_name
>> returns non-NULL, then we know a real symbol already exists for the
>> name we are looking at, and we should use it. In that case,
>> md_parse_name returns false, so operand () calls symbol_find_or_make,
>> which finds the extant symbol and installs it in the expression.
> 
> Perfect then.
> Please apply.
> Thanks!

Pushed, thanks.

> 
>>
>>>
>>>> +
>>>> +  for (sym = deferred_sym_rootP; sym; sym = symbol_next (sym))
>>>> +    if (strcmp (name, S_GET_NAME (sym)) == 0)
>>>> +      break;
>>>> +
>>>> +  /* Tentatively create a symbol.  */
>>>> +  if (!sym)
>>>> +    {
>>>> +      /* See if we can reuse a symbol discarded by a previous parse.
>>>> +	 This may be quite common, for example when trying multiple templates
>>>> +	 for an instruction with the first reference to a valid symbol.  */
>>>> +      for (sym = orphan_sym_rootP; sym; sym = symbol_next (sym))
>>>> +	if (strcmp (name, S_GET_NAME (sym)) == 0)
>>>> +	  {
>>>> +	    symbol_remove (sym, &orphan_sym_rootP, &orphan_sym_lastP);
>>>> +	    break;
>>>> +	  }
>>>> +
>>>> +      if (!sym)
>>>> +	  sym = symbol_create (name, undefined_section, &zero_address_frag, 0);
>>>> +
>>>> +      /* Add symbol to the deferred list.  If we commit to the isntruction,
>>>> +	 then the symbol will be inserted into to the real symbol table at
>>>> +	 that point (in md_assemble).  */
>>>> +      symbol_append (sym, deferred_sym_lastP, &deferred_sym_rootP,
>>>> +		     &deferred_sym_lastP);
>>>> +    }
>>>> +
>>>> +  exp->X_op = O_symbol;
>>>> +  exp->X_add_symbol = sym;
>>>> +  exp->X_add_number = 0;
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>>  /* Parse a BPF register name and return the corresponding register
>>>>     number.  Return NULL in case of parse error, or a pointer to the
>>>>     first character in S that is not part of the register name.  */
>>>> @@ -1317,6 +1386,16 @@ parse_error (int length, const char *fmt, ...)
>>>>        va_end (args);
>>>>        partial_match_length = length;
>>>>      }
>>>> +
>>>> +  /* Discard deferred symbols from the failed parse.  They may potentially
>>>> +     be reused in the future from the orphan list.  */
>>>> +  while (deferred_sym_rootP)
>>>> +    {
>>>> +      symbolS *sym = deferred_sym_rootP;
>>>> +      symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
>>>> +      symbol_append (sym, orphan_sym_lastP, &orphan_sym_rootP,
>>>> +		     &orphan_sym_lastP);
>>>> +    }
>>>>  }
>>>>  
>>>>  /* Assemble a machine instruction in STR and emit the frags/bytes it
>>>> @@ -1606,6 +1685,10 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
>>>>          }
>>>>      }
>>>>  
>>>> +  /* Mark that we are no longer parsing an instruction, bpf_parse_name does
>>>> +     not interfere with symbols in e.g. assembler directives.  */
>>>> +  parsing_insn_operands = false;
>>>> +
>>>>    if (opcode == NULL)
>>>>      {
>>>>        as_bad (_("unrecognized instruction `%s'"), str);
>>>> @@ -1622,6 +1705,15 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
>>>>  
>>>>  #undef PARSE_ERROR
>>>>  
>>>> +  /* Commit any symbols created while parsing the instruction.  */
>>>> +  while (deferred_sym_rootP)
>>>> +    {
>>>> +      symbolS *sym = deferred_sym_rootP;
>>>> +      symbol_remove (sym, &deferred_sym_rootP, &deferred_sym_lastP);
>>>> +      symbol_append (sym, symbol_lastP, &symbol_rootP, &symbol_lastP);
>>>> +      symbol_table_insert (sym);
>>>> +    }
>>>> +
>>>>    /* Generate the frags and fixups for the parsed instruction.  */
>>>>    if (do_relax && isa_spec >= BPF_V4 && insn.is_relaxable)
>>>>      {
>>>> diff --git a/gas/config/tc-bpf.h b/gas/config/tc-bpf.h
>>>> index 9fb71eddd14..06096ef5926 100644
>>>> --- a/gas/config/tc-bpf.h
>>>> +++ b/gas/config/tc-bpf.h
>>>> @@ -51,6 +51,10 @@
>>>>     a jump to offset 0 means jump to the next instruction.  */
>>>>  #define md_single_noop_insn "ja 0"
>>>>  
>>>> +#define md_parse_name(name, exp, mode, c) \
>>>> +  bpf_parse_name (name, exp, mode)
>>>> +bool bpf_parse_name (const char *, struct expressionS *, enum expr_mode);
>>>> +
>>>>  #define TC_EQUAL_IN_INSN(c, s) bpf_tc_equal_in_insn ((c), (s))
>>>>  extern bool bpf_tc_equal_in_insn (int, char *);
>>>>  
>>>> diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.d b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
>>>> new file mode 100644
>>>> index 00000000000..113750dd3fd
>>>> --- /dev/null
>>>> +++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.d
>>>> @@ -0,0 +1,7 @@
>>>> +#as: -EL -mdialect=pseudoc
>>>> +#nm: --numeric-sort
>>>> +#source: asm-extra-sym-1.s
>>>> +#name: BPF pseudoc no extra symbols 1
>>>> +
>>>> +# Note: there should be no output from nm.
>>>> +# Previously a bug in the BPF parser created an UND '*' symbol.
>>>> diff --git a/gas/testsuite/gas/bpf/asm-extra-sym-1.s b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
>>>> new file mode 100644
>>>> index 00000000000..2cfa605a259
>>>> --- /dev/null
>>>> +++ b/gas/testsuite/gas/bpf/asm-extra-sym-1.s
>>>> @@ -0,0 +1 @@
>>>> +    r2 = *(u32*)(r1 + 8)
>>>> diff --git a/gas/testsuite/gas/bpf/bpf.exp b/gas/testsuite/gas/bpf/bpf.exp
>>>> index 80f5a1dbc2d..fcbeccd8ecd 100644
>>>> --- a/gas/testsuite/gas/bpf/bpf.exp
>>>> +++ b/gas/testsuite/gas/bpf/bpf.exp
>>>> @@ -72,4 +72,7 @@ if {[istarget bpf*-*-*]} {
>>>>      run_dump_test disp16-overflow-relax
>>>>      run_dump_test disp32-overflow
>>>>      run_dump_test imm32-overflow
>>>> +
>>>> +    # Test that parser does not create undefined symbols
>>>> +    run_dump_test asm-extra-sym-1
>>>>  }

      reply	other threads:[~2023-11-17 21:41 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-17 18:54 David Faust
2023-11-17 19:29 ` Jose E. Marchesi
2023-11-17 20:09   ` David Faust
2023-11-17 20:27     ` Jose E. Marchesi
2023-11-17 21:41       ` David Faust [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=495d481c-4ad3-48f7-a0e9-1ab7d0697708@oracle.com \
    --to=david.faust@oracle.com \
    --cc=binutils@sourceware.org \
    --cc=jbeulich@suse.com \
    --cc=jose.marchesi@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).