Re: [PATCH,V4 10/14] gas: synthesize CFI for hand-written asm

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH,V4 10/14] gas: synthesize CFI for hand-written asm
Date: Wed, 10 Jan 2024 15:15:10 +0100	[thread overview]
Message-ID: <df095873-5dd0-4de3-b583-5560562d1c13@suse.com> (raw)
In-Reply-To: <409f6d2d-cd7e-4822-a29a-8970655c5af0@oracle.com>

On 10.01.2024 12:26, Indu Bhagat wrote:
> On 1/10/24 01:44, Jan Beulich wrote:
>> On 10.01.2024 07:10, Indu Bhagat wrote:
>>> On 1/9/24 01:30, Jan Beulich wrote:
>>>> On 08.01.2024 20:33, Indu Bhagat wrote:
>>>>> On 1/5/24 05:58, Jan Beulich wrote:
>>>>>> On 03.01.2024 08:15, Indu Bhagat wrote:
>>>>>>> +/* Check whether a '66H' prefix accompanies the instruction.
>>>>>>
>>>>>> With APX 16-bit operand size isn't necessarily represented by a 66h
>>>>>> prefix, but perhaps with an "embedded prefix" inside the EVEX one.
>>>>>> Therefore both the comment and even more so ...
>>>>>>
>>>>>>> +   The current users of this API are in the handlers for PUSH, POP
>>>>>>> +   instructions.  These instructions affect the stack pointer implicitly:  the
>>>>>>> +   operand size (16, 32, or 64 bits) determines the amount by which the stack
>>>>>>> +   pointer is decremented (2, 4 or 8).  When '66H' prefix is present, the
>>>>>>> +   instruction has a 16-bit operand.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +ginsn_prefix_66H_p (i386_insn insn)
>>>>>>
>>>>>> ... the function name would better not allude to just the legacy
>>>>>> encoding. Maybe ginsn_opsize_prefix_p()?
>>>>>>
>>>>>
>>>>> Isnt 66H_p more readable and easier to follow because that's what the
>>>>> function is currently checking ?  If more scenarios were being handled,
>>>>> ginsn_opsize_prefix_p () would fit better.
>>>>
>>>> Well, as said - with APX you can't get away with just 0x66 prefix checking.
>>>> That prefix is simply illegal to use with EVEX-encoded insns.
>>>>
>>>
>>> I am using the following in ginsn_opsize_prefix_p ():
>>>
>>> !(i.prefix[REX_PREFIX] & REX_W) && i.prefix[DATA_PREFIX] == 0x66
>>
>> That addresses one half of my earlier remarks. Note however that elsewhere
>> we never check i.prefix[DATA_PREFIX] against being 0x66; we only ever check
>> for it being zero or non-zero. I'd like this to remain consistent.
>>
>> For EVEX-encoded APX insns this isn't going to be sufficient though. See
>> respective code in process_suffix():
>>
>> 	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
>> 	     needs to be adjusted.  */
>> 	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
>> 	    {
>> 	      gas_assert (!i.tm.opcode_modifier.opcodeprefix);
>> 	      i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
>> 	    }
>> 	  else if (!add_prefix (prefix))
>> 	    return 0;
>>
>> So you'll need to also check for that combination, plus take care of
>> avoiding insns where PREFIX_0X66 alters operation, not operand size
>> (ADCX being an example).
>>
> 
> But I am now disallowing APX insns for now, altogether, by doing this in 
> the x86_ginsn_new:
> 
> /* Until it is clear how to handle APX NDD and other new opcodes,   disallow
>       them from SCFI.  */
>    if (is_apx_rex2_encoding ()
>        || (i.tm.opcode_modifier.evex && is_apx_evex_encoding ()))
>      {
>        as_bad (_("SCFI: unsupported APX op %#x may cause incorrect CFI"),
>                opcode);
>        return ginsn;
>      }

Well, if that's what you want to do ... (Even if you wanted to not support
APX for now, I would have expected the catch-all looking at just the
destination register to properly diagnose any use.)

>>>>>>> +    }
>>>>>>> +
>>>>>>> +  ginsn_set_where (ginsn);
>>>>>>> +
>>>>>>> +  return ginsn;
>>>>>>> +}
>>>>>>
>>>>>> Throughout this function (and perhaps others as well), how come you
>>>>>> don't consider operand size at all? It matters whether results are
>>>>>> 64-bit, 32-bit, or 16-bit, after all.
>>>>>
>>>>> Operation size matters: No, not for all instructions in context of SCFI.
>>>>> For instructions using stack (save / restore), the size is important.
>>>>> But for other instructions, operation size will not affect SCFI correctness.
>>>>
>>>> But aren't you wrongly treating an update of %rbp and an update of,
>>>> say, %bp the same then? The latter should end up as untracable, aiui.
>>>>
>>>
>>> For ALU ops:
>>>     - ADD/SUB reg1, reg2
>>>       If reg2 is the same as REG_CFA, then even in 64-bit mode, this
>>> causes untraceability. So there is untraceability irrespective of
>>> operation size. On the other hand, if uninteresting, it will remain
>>> unintersting irrespective of operation size.
>>>     - ADD/SUB imm, reg will never trigger untraceability, irrespective of
>>> size. There is the element of ignoring the carry bit, but I think the
>>> current diagnostics around "asymmetrical save/restore" and the planned
>>> "unbalanced stack at return" should provide user with some safety net.
>>
>> I don't see how the carry flag would matter here. What does matter is the
>> size of the register: Anything not 64-bit will need to trigger
>> untraceability imo.
>>
> 
>   - For a 16-bit ADD/SUB imm, reg insn: if the 16-bit result had a 
> carry, it will be ignored as only 16-bit result is copied to the 
> destination address IIUC.  If there is no carry bit, 16-bit ADD/SUB imm, 
> reg is semantically the same insn as a 64-bit ADD/SUB imm, reg for the 
> same immediate.

Oh, that's where you see CF coming into play. I wouldn't view it from
this angle - you simply don't know whether overflow/underflow would
occur, so it's no different to ...

>   - For 32-bit ADD/SUB imm, reg insn: yes, I stand corrected. It should 
> trigger untraceability as upper 32-bits are zeroed out. IMO, this 32-bit 
> operation should likely be unintentional by the user; something better 
> alerted to the user if we can.

... this case.

>>>>>>> +/* Generate one or more generic GAS instructions, a.k.a, ginsns for the current
>>>>>>> +   machine instruction.
>>>>>>> +
>>>>>>> +   Returns the head of linked list of ginsn(s) added, if success; Returns NULL
>>>>>>> +   if failure.
>>>>>>> +
>>>>>>> +   The input ginsn_gen_mode GMODE determines the set of minimal necessary
>>>>>>> +   ginsns necessary for correctness of any passes applicable for that mode.
>>>>>>> +   For supporting the GINSN_GEN_SCFI generation mode, following is the list of
>>>>>>> +   machine instructions that must be translated into the corresponding ginsns
>>>>>>> +   to ensure correctness of SCFI:
>>>>>>> +     - All instructions affecting the two registers that could potentially
>>>>>>> +       be used as the base register for CFA tracking.  For SCFI, the base
>>>>>>> +       register for CFA tracking is limited to REG_SP and REG_FP only for
>>>>>>> +       now.
>>>>>>> +     - All change of flow instructions: conditional and unconditional branches,
>>>>>>> +       call and return from functions.
>>>>>>> +     - All instructions that can potentially be a register save / restore
>>>>>>> +       operation.
>>>>>>
>>>>>> This could do with being more fine grained, as "potentially" is pretty vague,
>>>>>> and (as per earlier version review comments) my take on this is a much wider
>>>>>> set than yours.
>>>>>
>>>>> I would like to understand more on this comment, especially the "my take
>>>>> on this is a much wider set than yours".  I see its being hinted at in
>>>>> different flavors in the current review.
>>>>>
>>>>> I see some issues pointed out in this review (addressing modes of mov
>>>>> etc, safe to skip opcodes for TEST, CMP) etc., but it seems that your
>>>>> concerns are wider than this.
>>>>
>>>> I earlier version review I mentioned that even vector or mask registers
>>>> could in principle be use to hold preserved GPR values. I seem to recall
>>>> that you said you wouldn't want to deal with such. Hence my use of
>>>> "wider set": Just to give an example, "kmovq %rbp, %k0" plus later
>>>> "kmovq %k0, %rbp" is a pair of "instructions that can potentially be a
>>>> register save / restore operation".
>>>>
>>>
>>> Hmm. I will need to understand them on a case to case basis.  For the
>>> case of "kmovq %rbp, %k0" / "kmovq %k0, %rbp" how can this be used as
>>> save/restore to/from stack ?
>>
>> Maybe I'm still not having a clear enough picture of what forms of insns
>> you want to fully track. Said insn forms don't access the stack. But they
>> could in principle be used to preserve a certain register. Such preserving
>> of registers is part of what needs encoding in CFI, isn't it?
>>
> 
> The kind of preserving is usually on stack. It can also be in another 
> callee-saved register, in theory, but the latter defeats the purpose of 
> state saving across calls.

Callee-preserved registers, when they have a special purpose in the
architecture (like %rsi, %rdi, and %rbx have) may be cheaper to
preserve by moving to a call-clobbered register that isn't otherwise
used in the function. In the SysV ABI this only affects %rbx, the
special purpose of which is extremely limited in the ISA (xlatb). In
the Windows ABI, otoh, %rsi and %rdi are callee-preserved, and those
have very common uses in the string insns.

> BTW, I had earlier written down some notes about SCFI 
> https://sourceware.org/pipermail/binutils/2023-September/129558.html
> Some bits are stale already though, but may be it helps.

I had read through this before first reviewing v1 (I think it was).

>>>>>>> +    case 0xc2:
>>>>>>> +    case 0xc3:
>>>>>>> +      if (i.tm.opcode_space != SPACE_BASE)
>>>>>>> +	break;
>>>>>>> +      /* Near ret.  */
>>>>>>> +      ginsn = ginsn_new_return (insn_end_sym, true);
>>>>>>> +      ginsn_set_where (ginsn);
>>>>>>> +      break;
>>>>>>
>>>>>> No tracking of the stack pointer adjustment?
>>>>>
>>>>> No stack unwind information for a function is relevant after the
>>>>> function has returned.  So, tracking of stack pointer adjustment by
>>>>> return is not necessary.
>>>>
>>>> What information does the "return" insn then carry, beyond it being
>>>> an unconditional branch (which you have a different insn for)?
>>>>
>>>
>>> "return" does not carry any more information than just the
>>> GINSN_TYPE_RETURN as ginsn->type.
>>>
>>> So then why support both "return" and an unconditional branch: The
>>> intention is to carry the semantic difference between ret and
>>> unconditional jump.  Unconditional jumps may be to a label within
>>> function, and in those cases, we use it for some validation and BB
>>> linking when creating CFG. Return, OTOH, always indicates exit from
>>> function.
>>>
>>> For SCFI purposes, above is the one use.  Future analyses may find other
>>> use-cases for an explicit return ginsn.  But IMO, keeping
>>> GINSN_TYPE_RETURN as an explicit insn makes the overall offering cleaner.
>>
>> Okay. And here you don't bother decoding operands. Hence why I'm
>> asking the same to be the case for (e.g.) CALL.
>>
> 
> It seems I will need to deal with operands of RETURN insn soon.  For 
> implementing "Warn if imbalanced stack at return", we will need this info.

Will you? Isn't stack state _before_ the RET what matters (and hence
the optional immediate still doesn't matter)?

>>>>>>> @@ -5830,6 +6804,14 @@ md_assemble (char *line)
>>>>>>>       /* We are ready to output the insn.  */
>>>>>>>       output_insn (last_insn);
>>>>>>>     
>>>>>>> +  /* PS: SCFI is enabled only for AMD64 ABI.  The ABI check has been
>>>>>>> +     performed in i386_target_format.  */
>>>>>>
>>>>>> See my earlier comment - it's yet more restrictive (as in not covering
>>>>>> e.g. the Windows ABI, which importantly is also used in EFI).
>>>>>
>>>>> Not clear, can you please elaborate ?
>>>>
>>>> Hmm, it's not clear to me what's not clear in my earlier comment. The
>>>> set of preserved registers, for example, differs between the SysV and
>>>> the Windows ABIs (see x86_scfi_callee_saved_p()). Then again, thinking
>>>> of it, talking of anything ABI-ish in assembly code is questionable.
>>>> An assembly function calling another assembly function may use an
>>>> entirely custom "ABI". You just cannot guess upon preserved registers.
>>>>
>>>
>>> I think the confusion is stemming from my usage of AMD64 colloquially to
>>> refer to System V ABI for x86_64. I think "System V AMD64 ABI" is the
>>> correct term. I will use that.
>>>
>>> And yes, GAS can only work out SCFI if there is ABI adherence.  If input
>>> asm does not adhere to the supported ABIs, and the user invokes SCFI,
>>> then the user is on their own.  GAS will not (rather can not) warn about
>>> ABI incompliance.
>>
>> I don't recall documentation stating this explicitly. This is a pretty
>> important aspect for users to consider, after all.
>>
> 
> I added a reference in gas/NEWS for now.
> 
> * Experimental support in GAS to synthesize CFI for ABI-conformant,
>    hand-written asm using the new command line option 
> --scfi=experimental  on x86-64.
> 
> I will add a mention of "ABI conformance" in as.texi.

Maybe for NEWS this is enough; personally I don't think it is when there
are multiple ABIs for the/any affected architecture. You limiting support
to ELF doesn't mean the Windows ABI isn't in use. As said, UEFI uses that
ABI. And GNU ld can link ELF objects into EFI binaries.

Jan

next prev parent reply	other threads:[~2024-01-10 14:15 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-03  7:15 [PATCH,V4 00/14] Synthesize " Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 01/14] gas: dw2gencfi: minor rejig for cfi_sections_set and all_cfi_sections Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 02/14] gas: dw2gencfi: use all_cfi_sections instead of cfi_sections Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 03/14] gas: dw2gencfi: expose a new cfi_set_last_fde API Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 04/14] gas: dw2gencfi: move some tc_* defines to the header file Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 05/14] gas: dw2gencfi: expose dot_cfi_sections for scfidw2gen Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 06/14] gas: dw2gencfi: externalize the all_cfi_sections Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 07/14] gas: add new command line option --scfi[=all,none] Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 08/14] gas: scfidw2gen: new functionality to prepare for SCFI Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 09/14] opcodes: i386: new marker for insns that implicitly update stack pointer Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 10/14] gas: synthesize CFI for hand-written asm Indu Bhagat
2024-01-05 13:58   ` Jan Beulich
2024-01-08  0:46     ` Indu Bhagat
2024-01-08  8:16       ` Jan Beulich
2024-01-08  8:33         ` Indu Bhagat
2024-01-08 19:33     ` Indu Bhagat
2024-01-09  9:30       ` Jan Beulich
2024-01-10  6:10         ` Indu Bhagat
2024-01-10  9:44           ` Jan Beulich
2024-01-10 11:26             ` Indu Bhagat
2024-01-10 14:15               ` Jan Beulich [this message]
2024-01-10 19:43                 ` Indu Bhagat
2024-01-11  8:13                   ` Jan Beulich
2024-01-11 18:14                     ` Indu Bhagat
2024-01-17  1:20             ` Indu Bhagat
2024-01-17  8:09               ` Jan Beulich
2024-01-03  7:15 ` [PATCH,V4 11/14] gas: doc: update documentation for the new listing option Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 12/14] i386-reg.tbl: Add a comment to reflect dependency on ordering Indu Bhagat
2024-01-03  7:15 ` [PATCH,V4 13/14] gas: testsuite: add a x86_64 testsuite for SCFI Indu Bhagat
2024-01-05 14:22   ` Jan Beulich
2024-01-05 22:29     ` Indu Bhagat
2024-01-08  8:11       ` Jan Beulich
2024-01-03  7:15 ` [PATCH,V4 14/14] gas/NEWS: announce the new SCFI command line option Indu Bhagat
2024-01-03  7:43 ` [PATCH,V4 09/14] opcodes: i386: new marker for insns that implicitly update stack pointer Indu Bhagat
2024-01-05 14:05   ` [PATCH, V4 " Jan Beulich
2024-01-06 10:08     ` Indu Bhagat
2024-01-08  8:12       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df095873-5dd0-4de3-b583-5560562d1c13@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=indu.bhagat@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).