Re: [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: Indu Bhagat <indu.bhagat@oracle.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm
Date: Tue, 31 Oct 2023 15:10:09 +0100	[thread overview]
Message-ID: <c7a60721-4716-bfe1-76db-0909523cbc64@suse.com> (raw)
In-Reply-To: <20231030165137.2570939-8-indu.bhagat@oracle.com>

On 30.10.2023 17:51, Indu Bhagat wrote:
> gas/
> 	* Makefile.am: Add new files.
> 	* Makefile.in: Regenerated.
> 	* as.c (defined): Handle documentation and listing option for
> 	ginsns and SCFI.
> 	* config/obj-elf.c (obj_elf_size): Invoke ginsn_data_end.
> 	(obj_elf_type): Invoke ginsn_data_begin.
> 	* config/tc-i386.c (ginsn_new): New functionality to generate
> 	ginsns.
> 	(x86_scfi_callee_saved_p): New function.
> 	(ginsn_dw2_regnum): Likewise.
> 	(ginsn_set_where): Likewise.
> 	(x86_ginsn_alu): Likewise.
> 	(x86_ginsn_move): Likewise.
> 	(x86_ginsn_lea): Likewise.
> 	(x86_ginsn_jump): Likewise.
> 	(x86_ginsn_jump_cond): Likewise.
> 	(md_assemble): Invoke ginsn_new.
> 	(s_insn): Likewise.
> 	(i386_target_format): Add hard error for usage of --scfi with non AMD64 ABIs.
> 	* config/tc-i386.h (TARGET_USE_GINSN): New definition.
> 	(TARGET_USE_SCFI): Likewise.
> 	(SCFI_NUM_REGS): Likewise.
> 	(REG_FP): Likewise.
> 	(REG_SP): Likewise.
> 	(SCFI_INIT_CFA_OFFSET): Likewise.
> 	(SCFI_CALLEE_SAVED_REG_P): Likewise.
> 	(x86_scfi_callee_saved_p): Likewise.

For this arch-specific code there's a fundamental question of maintenance
cost here: It doesn't look very reasonable to me to demand of people adding
support for new ISA extensions to also take into consideration all of this
new machinery. Yet if any such addition affects SCFI, things will go out-
of-sync without updating this code as well. It may not be very often that
such updating is necessary, but right now there's APX work in progress
which I expect will need dealing with here as well.

> --- a/gas/as.c
> +++ b/gas/as.c
> @@ -45,6 +45,7 @@
>  #include "codeview.h"
>  #include "bfdver.h"
>  #include "write.h"
> +#include "ginsn.h"
>  
>  #ifdef HAVE_ITBL_CPU
>  #include "itbl-ops.h"
> @@ -245,6 +246,7 @@ Options:\n\
>                        	  d      omit debugging directives\n\
>                        	  g      include general info\n\
>                        	  h      include high-level source\n\
> +			  i      include ginsn and synthesized CFI info\n\
>                        	  l      include assembly\n\
>                        	  m      include macro expansions\n\
>                        	  n      omit forms processing\n\

Please can you make indentation match neighboring code?

> --- a/gas/config/obj-elf.c
> +++ b/gas/config/obj-elf.c
> @@ -24,6 +24,7 @@
>  #include "subsegs.h"
>  #include "obstack.h"
>  #include "dwarf2dbg.h"
> +#include "ginsn.h"
>  
>  #ifndef ECOFF_DEBUGGING
>  #define ECOFF_DEBUGGING 0
> @@ -2302,6 +2303,13 @@ obj_elf_size (int ignore ATTRIBUTE_UNUSED)
>        symbol_get_obj (sym)->size = XNEW (expressionS);
>        *symbol_get_obj (sym)->size = exp;
>      }
> +
> +  /* If the symbol in the directive matches the current function being
> +     processed, indicate end of the current stream of ginsns.  */
> +  if (flag_synth_cfi
> +      && S_IS_FUNCTION (sym) && sym == ginsn_data_func_symbol ())
> +    ginsn_data_end (symbol_temp_new_now ());
> +
>    demand_empty_rest_of_line ();
>  }

Besides the restrictions mentioned, presence of this just in obj-elf.c
means non-ELF targets also won't be able to use this. You'll therefore
want to adjust the config/tc-i386.h such that e.g. COFF targets don't
needlessly have a large set of dead code compiled in.

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -30,6 +30,7 @@
>  #include "subsegs.h"
>  #include "dwarf2dbg.h"
>  #include "dw2gencfi.h"
> +#include "scfi.h"
>  #include "gen-sframe.h"
>  #include "sframe.h"
>  #include "elf/x86-64.h"
> @@ -193,8 +194,11 @@ static unsigned int x86_isa_1_used;
>  static unsigned int x86_feature_2_used;
>  /* Generate x86 used ISA and feature properties.  */
>  static unsigned int x86_used_note = DEFAULT_X86_USED_NOTE;
> +
>  #endif

Nit: Stray change?

> +static ginsnS *ginsn_new (symbolS *sym, enum ginsn_gen_mode gmode);

I don't see the need for this: The function definition lives ahead of any
caller afaics.

> @@ -5116,6 +5120,716 @@ static INLINE bool may_need_pass2 (const insn_template *t)
>  	       && t->base_opcode == 0x63);
>  }
>  
> +bool
> +x86_scfi_callee_saved_p (uint32_t dw2reg_num)
> +{
> +  if (dw2reg_num == 3 /* rbx.  */
> +      || dw2reg_num == REG_FP /* rbp.  */
> +      || dw2reg_num == REG_SP /* rsp.  */
> +      || (dw2reg_num >= 12 && dw2reg_num <= 15) /* r12 - r15.  */)
> +    return true;

Non-GPRs aren't of interest here?

What about the Windows ABI, which is also used in the EFI world?

> +  return false;
> +}
> +
> +static uint32_t
> +ginsn_dw2_regnum (const reg_entry *ireg)
> +{
> +  /* PS: Note the data type here as int32_t, because of Dw2Inval (-1).  */
> +  int32_t dwarf_reg = Dw2Inval;
> +  const reg_entry *temp;
> +
> +  if (ireg->dw2_regnum[0] == Dw2Inval && ireg->dw2_regnum[1] == Dw2Inval)
> +    return dwarf_reg;
> +
> +  dwarf_reg = ireg->dw2_regnum[flag_code >> 1];

flag_code == CODE_64BIT would likely be more robust.

Also are all calls here synchronous, i.e. happening right when a
particular insn is processed? Otherwise what about flag_code changing
in between?

Plus in how far does it make sense to handle other than CODE_64BIT when
you tie yourself to the SysV 64-bit ABI?

> +  if (dwarf_reg == Dw2Inval)
> +    {
> +      temp = ireg + 16;
> +      dwarf_reg = ginsn_dw2_regnum (temp);
> +    }

What is this about? This clearly needs a comment, and it may also need
a comment in opcodes/i386-reg.tbl to clarify that certain ordering
requirements have to be retained. It also wants to be clarified that
the recursion here won't be indefinite.

> +  if (dwarf_reg == Dw2Inval)
> +    gas_assert (1); /* Needs to be addressed.  */

This is dead code (you appear to mean 0 instead of 1). Yet even then
still gas_assert (dwarf_reg != Dw2Inval) please.

> +  return (uint32_t) dwarf_reg;
> +}
> +
> +static void
> +ginsn_set_where (ginsnS* ginsn)

Nit: Misplaced *.

> +{
> +  const char *file;
> +  unsigned int line;
> +  file = as_where (&line);
> +  ginsn_set_file_line (ginsn, file, line);
> +}

This function, also from its name, isn't x86-specific, is it? As such
it wants to live in ginsn.[ch].

> +static ginsnS *
> +x86_ginsn_alu (i386_insn insn, symbolS *insn_end_sym)
> +{
> +  offsetT src_imm;
> +  uint32_t dw2_regnum;
> +  enum ginsn_src_type src_type;
> +  enum ginsn_dst_type dst_type;
> +  ginsnS *ginsn = NULL;
> +
> +  /* FIXME - create ginsn for REG_SP target only ? */
> +  /* Map for insn.tm.extension_opcode
> +     000 ADD    100 AND
> +     001 OR     101 SUB
> +     010 ADC    110 XOR
> +     011 SBB    111 CMP  */
> +
> +  /* add/sub imm, %reg.
> +     and imm, %reg only at this time for SCFI. */
> +  if (!(insn.tm.extension_opcode == 0
> +	|| insn.tm.extension_opcode == 4
> +	|| insn.tm.extension_opcode == 5))
> +    return ginsn;

Why is AND permitted, but OR/XOR aren't?

Also this is about ALU insns with immediate operands only, yet that
fact isn't reflected in the function name.

> +  /* TBD_GINSN_REPRESENTATION_LIMIT: There is no representation for when a
> +     symbol is used as an operand, like so:
> +	  addq    $simd_cmp_op+8, %rdx
> +     Skip generating any ginsn for this.  */
> +  if (insn.imm_operands == 1
> +      && insn.op[0].imms->X_op == O_symbol)
> +    return ginsn;
> +
> +  gas_assert (insn.imm_operands == 1
> +	      && insn.op[0].imms->X_op == O_constant);

Perhaps less fragile if you use != O_constant in the preceding if()?
The remaining half could the move ahead of that if(), allowing half
of its condition to also be dropped.

> +  src_imm = insn.op[0].imms->X_add_number;
> +  /* The second operand may be a register or indirect access.  */
> +  if (insn.mem_operands == 1 && insn.base_reg)
> +    {
> +      dw2_regnum = ginsn_dw2_regnum (insn.base_reg);
> +      src_type = GINSN_SRC_INDIRECT;
> +      dst_type = GINSN_DST_INDIRECT;

The possibly in use index register isn't of interest in this case?
Nor the displacement in the memory operand, ...

> +    }
> +  else if (insn.mem_operands == 1 && insn.index_reg)
> +    {
> +      dw2_regnum = ginsn_dw2_regnum (insn.index_reg);
> +      src_type = GINSN_SRC_INDIRECT;
> +      dst_type = GINSN_DST_INDIRECT;

... similarly applicable here? Nor a segment override?

> +    }
> +  else
> +    {
> +      gas_assert (insn.reg_operands == 1);

Afaict this will trigger when the memory operand has neither base
nor index.

> +      dw2_regnum = ginsn_dw2_regnum (insn.op[1].regs);
> +      src_type = GINSN_SRC_REG;
> +      dst_type = GINSN_DST_REG;
> +    }
> +
> +  /* For ginsn, keep the imm as second src operand.  */
> +  if (insn.tm.extension_opcode == 5)
> +    ginsn = ginsn_new_sub (insn_end_sym, true,
> +			   src_type, dw2_regnum, 0,
> +			   GINSN_SRC_IMM, 0, src_imm,
> +			   dst_type, dw2_regnum, 0);
> +  else if (insn.tm.extension_opcode == 4)
> +    ginsn = ginsn_new_and (insn_end_sym, true,
> +			   src_type, dw2_regnum, 0,
> +			   GINSN_SRC_IMM, 0, src_imm,
> +			   dst_type, dw2_regnum, 0);
> +  else if (insn.tm.extension_opcode == 0)
> +    ginsn = ginsn_new_add (insn_end_sym, true,
> +			   src_type, dw2_regnum, 0,
> +			   GINSN_SRC_IMM, 0, src_imm,
> +			   dst_type, dw2_regnum, 0);

I think this would benefit from setting a function pointer near the
top of the function. Else can this please at least be put in switch()
form?

> +  ginsn_set_where (ginsn);
> +
> +  return ginsn;
> +}
> +
> +static ginsnS *
> +x86_ginsn_move (i386_insn insn, symbolS *insn_end_sym)
> +{
> +  ginsnS *ginsn;
> +  uint16_t opcode;
> +  uint32_t dst_reg;
> +  uint32_t src_reg;
> +  offsetT dst_disp;
> +  offsetT src_disp;
> +  const reg_entry *dst = NULL;
> +  const reg_entry *src = NULL;
> +  enum ginsn_dst_type dst_type;
> +  enum ginsn_src_type src_type;
> +
> +  opcode = insn.tm.base_opcode;
> +  src_type = GINSN_SRC_REG;
> +  src_disp = dst_disp = 0;
> +  dst_type = GINSN_DST_REG;

Please can such be initializers of their variables?

> +  if (opcode == 0x8b)
> +    {
> +      /* mov  disp(%reg), %reg.  */
> +      if (insn.mem_operands && insn.base_reg)
> +	{
> +	  src = insn.base_reg;
> +	  if (insn.disp_operands == 1)
> +	    src_disp = insn.op[0].disps->X_add_number;
> +	  src_type = GINSN_SRC_INDIRECT;
> +	}
> +      else
> +	src = insn.op[0].regs;

What about disp(%reg,%reg) or disp(,%reg) kinds of memory operands?

> +      dst = insn.op[1].regs;
> +    }
> +  else if (opcode == 0x89 || opcode == 0x88)

How come 0x88 is handled here, but 0x8a isn't handled above?

> +    {
> +      /* mov %reg, disp(%reg).  */
> +      src = insn.op[0].regs;
> +      if (insn.mem_operands && insn.base_reg)
> +	{
> +	  dst = insn.base_reg;
> +	  if (insn.disp_operands == 1)
> +	    dst_disp = insn.op[1].disps->X_add_number;
> +	  dst_type = GINSN_DST_INDIRECT;
> +	}
> +      else
> +	dst = insn.op[1].regs;
> +    }
> +
> +  src_reg = ginsn_dw2_regnum (src);
> +  dst_reg = ginsn_dw2_regnum (dst);
> +
> +  ginsn = ginsn_new_mov (insn_end_sym, true,
> +			 src_type, src_reg, src_disp,
> +			 dst_type, dst_reg, dst_disp);
> +  ginsn_set_where (ginsn);
> +
> +  return ginsn;
> +}
> +
> +static ginsnS *
> +x86_ginsn_lea (i386_insn insn, symbolS *insn_end_sym)
> +{
> +  offsetT src_disp = 0;
> +  ginsnS *ginsn = NULL;
> +  uint32_t base_reg;
> +  uint32_t index_reg;
> +  offsetT index_scale;
> +  uint32_t dst_reg;
> +
> +  if (!insn.index_reg && !insn.base_reg)
> +    {
> +      /* lea symbol, %rN.  */
> +      dst_reg = ginsn_dw2_regnum (insn.op[1].regs);
> +      /* FIXME - Skip encoding information about the symbol.
> +	 This is TBD_GINSN_INFO_LOSS, but it is fine if the mode is
> +	 GINSN_GEN_SCFI.  */
> +      ginsn = ginsn_new_mov (insn_end_sym, false,
> +			     GINSN_SRC_IMM, 0xf /* arbitrary const.  */, 0,
> +			     GINSN_DST_REG, dst_reg, 0);
> +    }
> +  else if (insn.base_reg && !insn.index_reg)
> +    {
> +      /* lea    -0x2(%base),%dst.  */
> +      base_reg = ginsn_dw2_regnum (insn.base_reg);
> +      dst_reg = ginsn_dw2_regnum (insn.op[1].regs);
> +
> +      if (insn.disp_operands)
> +	src_disp = insn.op[0].disps->X_add_number;

What if the displacement expression is other than O_constant?

> +      if (src_disp)
> +	/* Generate an ADD ginsn.  */
> +	ginsn = ginsn_new_add (insn_end_sym, true,
> +			       GINSN_SRC_REG, base_reg, 0,
> +			       GINSN_SRC_IMM, 0, src_disp,
> +			       GINSN_DST_REG, dst_reg, 0);
> +      else
> +	  /* Generate a MOV ginsn.  */
> +	  ginsn = ginsn_new_mov (insn_end_sym, true,
> +				 GINSN_SRC_REG, base_reg, 0,
> +				 GINSN_DST_REG, dst_reg, 0);
> +    }
> +  else if (!insn.base_reg && insn.index_reg)
> +    {
> +      /* lea (,%index,imm), %dst.  */
> +      /* FIXME - Skip encoding an explicit multiply operation, instead use
> +	 GINSN_TYPE_OTHER.  This is TBD_GINSN_INFO_LOSS, but it is fine if
> +	 the mode is GINSN_GEN_SCFI.  */
> +      index_scale = insn.log2_scale_factor;
> +      index_reg = ginsn_dw2_regnum (insn.index_reg);
> +      dst_reg = ginsn_dw2_regnum (insn.op[1].regs);
> +      ginsn = ginsn_new_other (insn_end_sym, true,
> +			       GINSN_SRC_REG, index_reg,
> +			       GINSN_SRC_IMM, index_scale,
> +			       GINSN_DST_REG, dst_reg);
> +    }
> +  else
> +    {
> +      /* lea disp(%base,%index,imm) %dst.  */
> +      /* FIXME - Skip encoding information about the disp and imm for index
> +	 reg.  This is TBD_GINSN_INFO_LOSS, but it is fine if the mode is
> +	 GINSN_GEN_SCFI.  */
> +      base_reg = ginsn_dw2_regnum (insn.base_reg);
> +      index_reg = ginsn_dw2_regnum (insn.index_reg);
> +      dst_reg = ginsn_dw2_regnum (insn.op[1].regs);
> +      /* Generate an ADD ginsn.  */
> +      ginsn = ginsn_new_add (insn_end_sym, true,
> +			     GINSN_SRC_REG, base_reg, 0,
> +			     GINSN_SRC_REG, index_reg, 0,
> +			     GINSN_DST_REG, dst_reg, 0);

The comment mentions the displacement, but it's not passed on? In the earlier
"else if" block you also pay attention to the scla factor, but here you don't?
That said, I think I'm confused anyway about the FIXME comments.

> +    }
> +
> +  ginsn_set_where (ginsn);
> +
> +  return ginsn;
> +}
> +
> +static ginsnS *
> +x86_ginsn_jump (i386_insn insn, symbolS *insn_end_sym)
> +{
> +  ginsnS *ginsn = NULL;
> +  symbolS *src_symbol;
> +
> +  gas_assert (insn.disp_operands == 1);
> +
> +  if (insn.op[0].disps->X_op == O_symbol)
> +    {
> +      src_symbol = insn.op[0].disps->X_add_symbol;
> +      /* The jump target is expected to be a symbol with 0 addend.
> +	 Assert for now to see if this assumption is true.  */
> +      gas_assert (insn.op[0].disps->X_add_number == 0);

If you assert this means elsewhere this is being checked and a call
here avoided in case the value is non-zero. I can't spot such a
check, though.

> +      ginsn = ginsn_new_jump (insn_end_sym, true,
> +			      GINSN_SRC_SYMBOL, 0, src_symbol);
> +
> +      ginsn_set_where (ginsn);
> +    }
> +
> +  return ginsn;
> +}

Further, what about XABORT transferring control to what XBEGIN has
supplied? (XBEGIN can, in a sense, also be considered a [conditional]
branch.)

> +static ginsnS *
> +x86_ginsn_jump_cond (i386_insn insn, symbolS *insn_end_sym)
> +{
> +  ginsnS *ginsn = NULL;
> +  symbolS *src_symbol;
> +
> +  /* TBD_GINSN_GEN_NOT_SCFI: Ignore move to or from xmm reg for mode.  */
> +  if (i.tm.opcode_space == SPACE_0F)
> +    return ginsn;

What is the comment about? And what about SPACE_0F-encoded conditional
jumps (Jcc <disp32>)? And where are LOOP and J{E,R}CXZ handled?

> +  gas_assert (insn.disp_operands == 1);
> +
> +  if (insn.op[0].disps->X_op == O_symbol)
> +    {
> +      src_symbol = insn.op[0].disps->X_add_symbol;
> +      /* The jump target is expected to be a symbol with 0 addend.
> +	 Assert for now to see if this assumption is true.  */
> +      gas_assert (insn.op[0].disps->X_add_number == 0);
> +      ginsn = ginsn_new_jump_cond (insn_end_sym, true,
> +				   GINSN_SRC_SYMBOL, 0, src_symbol);
> +      ginsn_set_where (ginsn);
> +    }
> +  else
> +    /* Catch them for now so we know what we are dealing with.  */
> +    gas_assert (0);

I'm going from the assumption that this (and alike) will be addressed
before this series is committed?

> +  return ginsn;
> +}
> +
> +/* Generate one or more GAS instructions for the current machine dependent
> +   instruction.
> +
> +   Returns the head of linked list of ginsn(s) added, if success;
> +   Returns NULL if failure.  */
> +
> +static ginsnS *
> +ginsn_new (symbolS *insn_end_sym, enum ginsn_gen_mode gmode)

x86_ginsn_new()?

> +{
> +  uint16_t opcode;
> +  uint32_t dw2_regnum;
> +  uint32_t src2_dw2_regnum;
> +  int32_t gdisp = 0;
> +  ginsnS *ginsn = NULL;
> +  ginsnS *ginsn_next = NULL;
> +  ginsnS *ginsn_last = NULL;
> +
> +  /* FIXME - Need a way to check whether the decoding is sane.  The specific
> +     checks around i.tm.opcode_space were added as issues were seen.  Likely
> +     insufficient.  */

Furthermore opcode encoding space (SPACE_...) need to be taken into
account in all cases.

> +  /* Currently supports generation of selected ginsns, sufficient for
> +     the use-case of SCFI only.  To remove this condition will require
> +     work on this target-specific process of creation of ginsns.  Some
> +     of such places are tagged with TBD_GINSN_GEN_NOT_SCFI to serve as
> +     examples.  */
> +  if (gmode != GINSN_GEN_SCFI)
> +    return ginsn;
> +
> +  opcode = i.tm.base_opcode;
> +
> +  switch (opcode)
> +    {
> +    case 0x1:
> +      /* add reg, reg.  */
> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);

You don't care about opcode 0 (byte operation). Then what about 16-bit
operand size? Or, since we're talking of a 64-bit-ABI-only feature,
even 32-bit operand size?

Also what about opcode 0x3?

> +      if (i.reg_operands == 2)
> +	{
> +	  src2_dw2_regnum = ginsn_dw2_regnum (i.op[1].regs);
> +	  ginsn = ginsn_new_add (insn_end_sym, true,
> +				 GINSN_SRC_REG, dw2_regnum, 0,
> +				 GINSN_SRC_REG, src2_dw2_regnum, 0,
> +				 GINSN_DST_REG, src2_dw2_regnum, 0);
> +	  ginsn_set_where (ginsn);
> +	}
> +      else if (i.mem_operands && i.base_reg)
> +	{
> +	  src2_dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +	  if (i.disp_operands == 1)
> +	    gdisp = i.op[1].disps->X_add_number;
> +
> +	  ginsn = ginsn_new_add (insn_end_sym, true,
> +				 GINSN_SRC_REG, dw2_regnum, 0,
> +				 GINSN_SRC_INDIRECT, src2_dw2_regnum, gdisp,
> +				 GINSN_DST_INDIRECT, src2_dw2_regnum, gdisp);
> +	  ginsn_set_where (ginsn);
> +	}
> +      else
> +	/* Catch them for now so we know what we are dealing with.  */
> +	gas_assert (0);
> +
> +      break;
> +    case 0x29:
> +      /* If opcode_space == SPACE_0F, this is a movaps insn.  Skip it
> +	 for GINSN_GEN_SCFI.  */
> +      if (i.tm.opcode_space == SPACE_0F)
> +	break;

Extending on the earlier related comment: Why would you exclude just SPACE_0F?

> +      /* sub reg, reg.  */
> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +
> +      if (i.reg_operands == 2)
> +	{
> +	  src2_dw2_regnum = ginsn_dw2_regnum (i.op[1].regs);
> +	  ginsn = ginsn_new_sub (insn_end_sym, true,
> +				 GINSN_SRC_REG, dw2_regnum, 0,
> +				 GINSN_SRC_REG, src2_dw2_regnum, 0,
> +				 GINSN_DST_REG, src2_dw2_regnum, 0);
> +	  ginsn_set_where (ginsn);
> +	}
> +      else if (i.mem_operands && i.base_reg)
> +	{
> +	  src2_dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +	  if (i.disp_operands == 1)
> +	    gdisp = i.op[1].disps->X_add_number;
> +
> +	  ginsn = ginsn_new_sub (insn_end_sym, true,
> +				 GINSN_SRC_REG, dw2_regnum, 0,
> +				 GINSN_SRC_INDIRECT, src2_dw2_regnum, gdisp,
> +				 GINSN_DST_INDIRECT, src2_dw2_regnum, gdisp);
> +	  ginsn_set_where (ginsn);
> +	}
> +      else
> +	/* Catch them for now so we know what we are dealing with.  */
> +	gas_assert (0);
> +
> +      break;
> +    case 0xa0:
> +    case 0xa8:

Since individual case blocks are non-trivial, can there please be blank
lines between any two non-fall-through case blocks?

> +      /* If opcode_space != SPACE_0F, this is a test insn.  Skip it
> +	 for GINSN_GEN_SCFI.  */
> +      if (i.tm.opcode_space != SPACE_0F)
> +	break;

While here you properly use !=, now the comment is wrong.

> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +      /* push fs / push gs.  */
> +      ginsn = ginsn_new_sub (insn_end_sym, false,
> +			     GINSN_SRC_REG, REG_SP, 0,
> +			     GINSN_SRC_IMM, 0, 8,

Note how the 8 here assumes flag_code == CODE_64BIT. (You further assume
no operand size override here.)

> +			     GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_store (insn_end_sym, false,
> +				    GINSN_SRC_REG, dw2_regnum,
> +				    GINSN_DST_INDIRECT, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0xa1:
> +    case 0xa9:
> +      /* If opcode_space != SPACE_0F, this is test insn.  Skip it
> +	 for GINSN_GEN_SCFI.  */
> +      if (i.tm.opcode_space != SPACE_0F)
> +	break;
> +
> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +      /* pop fs / pop gs.  */
> +      ginsn = ginsn_new_load (insn_end_sym, false,
> +			      GINSN_SRC_INDIRECT, REG_SP, 0,
> +			      GINSN_DST_REG, dw2_regnum);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_add (insn_end_sym, false,
> +				  GINSN_SRC_REG, REG_SP, 0,
> +				  GINSN_SRC_IMM, 0, 8,
> +				  GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0x50 ... 0x57:
> +      /* push reg.  */
> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +      ginsn = ginsn_new_sub (insn_end_sym, false,
> +			     GINSN_SRC_REG, REG_SP, 0,
> +			     GINSN_SRC_IMM, 0, 8,
> +			     GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_store (insn_end_sym, false,
> +				    GINSN_SRC_REG, dw2_regnum,
> +				    GINSN_DST_INDIRECT, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0x58 ... 0x5f:
> +      if (i.tm.opcode_space != SPACE_BASE)
> +	break;
> +      /* pop reg.  */
> +      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +      ginsn = ginsn_new_load (insn_end_sym, false,
> +			      GINSN_SRC_INDIRECT, REG_SP, 0,
> +			      GINSN_DST_REG, dw2_regnum);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_add (insn_end_sym, false,
> +				  GINSN_SRC_REG, REG_SP, 0,
> +				  GINSN_SRC_IMM, 0, 8,
> +				  GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0x68:
> +    case 0x6a:
> +      /* push imm. */
> +      /* Skip getting the value of imm from machine instruction
> +	 because for ginsn generation this is not important.  */

In (e.g.) kernel code model it is possible to do

	push	<address>
	pop	%rbp

(could even be %rsp). How is such a frame (or stack) pointer change
not relevant?

> +      ginsn = ginsn_new_sub (insn_end_sym, false,
> +			     GINSN_SRC_REG, REG_SP, 0,
> +			     GINSN_SRC_IMM, 0, 8,
> +			     GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_store (insn_end_sym, false,
> +				    GINSN_SRC_IMM, 0,
> +				    GINSN_DST_INDIRECT, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0x70 ... 0x7f:
> +      ginsn = x86_ginsn_jump_cond (i, insn_end_sym);
> +      break;
> +    case 0x81:
> +    case 0x83:
> +      ginsn = x86_ginsn_alu (i, insn_end_sym);
> +      break;
> +    case 0x8b:
> +      /* Move r/m64 to r64.  */
> +    case 0x88:
> +    case 0x89:
> +      /* mov reg, reg/mem.  */
> +      ginsn = x86_ginsn_move (i, insn_end_sym);
> +      break;
> +    case 0x8d:
> +      /* lea disp(%src), %dst */
> +      ginsn = x86_ginsn_lea (i, insn_end_sym);
> +      break;
> +    case 0x8f:
> +      /* pop to mem.  */
> +      gas_assert (i.base_reg);

This opcode can in principle also have a register operand. We
_currently_ have no way for a user to request to use this
alternative encoding, but that's still a latent issue. In any
event, like elsewhere, all memory operand forms are possible
here, not just (%reg).

> +      dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +      ginsn = ginsn_new_load (insn_end_sym, false,
> +			      GINSN_SRC_INDIRECT, REG_SP, 0,
> +			      GINSN_DST_INDIRECT, dw2_regnum);

When both operands are "indirect", what's the difference between
move, load, and store? IOW if the above is permitted, can't all
three be folded into a single ginsn kind?

> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_add (insn_end_sym, false,
> +				  GINSN_SRC_REG, REG_SP, 0,
> +				  GINSN_SRC_IMM, 0, 8,
> +				  GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      break;
> +    case 0x9c:
> +      /* pushf / pushfd / pushfq.
> +	 Tracking EFLAGS register by number is not necessary.  */
> +      ginsn = ginsn_new_sub (insn_end_sym, false,
> +			     GINSN_SRC_REG, REG_SP, 0,
> +			     GINSN_SRC_IMM, 0, 8,

Why does the comment mention PUSHFD when you assume PUSHFQ here?

> +			     GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn);
> +
> +      ginsn_next = ginsn_new_store (insn_end_sym, false,
> +				    GINSN_SRC_IMM, 0,
> +				    GINSN_DST_INDIRECT, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +
> +      break;

Where's POPFD?

> +    case 0xff:
> +      /* push from mem.  */
> +      if (i.tm.extension_opcode == 6)
> +	{
> +	  ginsn = ginsn_new_sub (insn_end_sym, false,
> +				 GINSN_SRC_REG, REG_SP, 0,
> +				 GINSN_SRC_IMM, 0, 8,
> +				 GINSN_DST_REG, REG_SP, 0);
> +	  ginsn_set_where (ginsn);
> +
> +	  /* These instructions have no imm, only indirect access.  */
> +	  gas_assert (i.base_reg);
> +	  dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +	  ginsn_next = ginsn_new_store (insn_end_sym, false,
> +					GINSN_SRC_INDIRECT, dw2_regnum,
> +					GINSN_DST_INDIRECT, REG_SP, 0);
> +	  ginsn_set_where (ginsn_next);
> +
> +	  gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +	}
> +      else if (i.tm.extension_opcode == 4)
> +	{
> +	  /* jmp r/m.  E.g., notrack jmp *%rax.  */
> +	  if (i.reg_operands)
> +	    {
> +	      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +	      ginsn = ginsn_new_jump (insn_end_sym, true,
> +				      GINSN_SRC_REG, dw2_regnum, NULL);
> +	      ginsn_set_where (ginsn);
> +	    }
> +	  else if (i.mem_operands && i.index_reg)
> +	    {
> +	      /* jmp    *0x0(,%rax,8).  */
> +	      dw2_regnum = ginsn_dw2_regnum (i.index_reg);
> +	      ginsn = ginsn_new_jump (insn_end_sym, true,
> +				      GINSN_SRC_REG, dw2_regnum, NULL);
> +	      ginsn_set_where (ginsn);
> +	    }
> +	  else if (i.mem_operands && i.base_reg)
> +	    {
> +	      dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +	      ginsn = ginsn_new_jump (insn_end_sym, true,
> +				      GINSN_SRC_REG, dw2_regnum, NULL);
> +	      ginsn_set_where (ginsn);
> +	    }
> +	  else
> +	    /* Catch them for now so we know what we are dealing with.  */
> +	    gas_assert (0);
> +	}
> +      else if (i.tm.extension_opcode == 2)
> +	{
> +	  /* 0xFF /2 (call).  */
> +	  if (i.reg_operands)
> +	    {
> +	      dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);
> +	      ginsn = ginsn_new_call (insn_end_sym, true,
> +				      GINSN_SRC_REG, dw2_regnum, NULL);
> +	      ginsn_set_where (ginsn);
> +	    }
> +	  else if (i.mem_operands && i.base_reg)
> +	    {
> +	      dw2_regnum = ginsn_dw2_regnum (i.base_reg);
> +	      ginsn = ginsn_new_call (insn_end_sym, true,
> +				      GINSN_SRC_REG, dw2_regnum, NULL);
> +	      ginsn_set_where (ginsn);
> +	    }
> +	  else
> +	    /* Catch them for now so we know what we are dealing with.  */
> +	    gas_assert (0);
> +	}
> +      else
> +	/* Catch them for now so we know what we are dealing with.  */
> +	gas_assert (0);
> +      break;
> +    case 0xc2:
> +    case 0xc3:
> +      /* Near ret.  */
> +      ginsn = ginsn_new_return (insn_end_sym, true);
> +      ginsn_set_where (ginsn);
> +      break;

Where did the immediate operand of 0xC2 go? And what about far return
(and more generally far branches)?

> +    case 0xc9:
> +      /* The 'leave' instruction copies the contents of the RBP register
> +	 into the RSP register to release all stack space allocated to the
> +	 procedure.  */
> +      ginsn = ginsn_new_mov (insn_end_sym, false,
> +			     GINSN_SRC_REG, REG_FP, 0,
> +			     GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn);
> +
> +      /* Then it restores the old value of the RBP register from the stack.  */
> +      ginsn_next = ginsn_new_load (insn_end_sym, false,
> +				   GINSN_SRC_INDIRECT, REG_SP, 0,
> +				   GINSN_DST_REG, REG_FP);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn, ginsn_next));
> +      ginsn_last = ginsn_new_add (insn_end_sym, false,
> +				  GINSN_SRC_REG, REG_SP, 0,
> +				  GINSN_SRC_IMM, 0, 8,
> +				  GINSN_DST_REG, REG_SP, 0);
> +      ginsn_set_where (ginsn_next);
> +
> +      gas_assert (!ginsn_link_next (ginsn_next, ginsn_last));
> +      break;

You handle LEAVE, but not ENTER?

> +    case 0xe8:
> +      /* PS: SCFI machinery does not care about which func is being
> +	 called.  OK to skip that info.  */
> +      ginsn = ginsn_new_call (insn_end_sym, true,
> +			      GINSN_SRC_SYMBOL, 0, NULL);
> +      ginsn_set_where (ginsn);
> +      break;
> +    case 0xe9:
> +    case 0xeb:
> +      /* If opcode_space == SPACE_0F, this is a psubw por insn.  Skip it
> +	 for GINSN_GEN_SCFI.  */
> +      if (i.tm.opcode_space == SPACE_0F)
> +	break;
> +
> +      /* Unconditional jmp.  */
> +      ginsn = x86_ginsn_jump (i, insn_end_sym);
> +      ginsn_set_where (ginsn);
> +      break;
> +      /* Fall Through.  */
> +    default:

There's noo fall-through here.

> +      /* TBD_GINSN_GEN_NOT_SCFI: Keep a warning, for now, to find out about
> +	 possibly missed instructions affecting REG_SP or REG_FP.  These
> +	 checks may not be completely exhaustive as they do not involve
> +	 index / base reg.  */

Cases with early "break" should probably come here as well.

> +      if (i.op[0].regs)
> +	{
> +	  dw2_regnum = ginsn_dw2_regnum (i.op[0].regs);

This isn't valid without first having checked that operand 0 is a
register operand: "regs" is a member of a union, and you may not
pass expressionS * into this function.

> +	  if (dw2_regnum == REG_SP || dw2_regnum == REG_FP)
> +	    as_warn_where (last_insn.file, last_insn.line,
> +			   _("SCFI: unhandled op 0x%x may cause incorrect CFI"),
> +			   i.tm.base_opcode);
> +	}
> +      if (i.op[1].regs)
> +	{
> +	  dw2_regnum = ginsn_dw2_regnum (i.op[1].regs);
> +	  if (dw2_regnum == REG_SP || dw2_regnum == REG_FP)
> +	    as_warn_where (last_insn.file, last_insn.line,
> +			   _("SCFI: unhandled op 0x%x may cause incorrect CFI"),
> +			   i.tm.base_opcode);
> +	}

What about insns with 3 GPR operands?

Also, why do you use as_warn_where() here? You're still in the
context of synchonously processing the current input line.

> +      /* Keep an eye on other instructions affecting control flow.  */
> +      gas_assert (!i.tm.opcode_modifier.jump);

Again this cannot remain like this, even if here there's no FIXME
remark.

> +      /* TBD_GINSN_GEN_NOT_SCFI: Skip all other opcodes uninteresting for
> +	 GINSN_GEN_SCFI mode.  */

"uninteresting" is an interesting term when dealing with hand-written
assembly. How do you know what an assembly writer is going to do?
While the patch description mentions a few constraints, having come
here I'm under the impression that there are quite a few more of them.
Maybe I simply didn't spot where all caveats are fully spelled out?

> @@ -5128,6 +5842,7 @@ md_assemble (char *line)
>    const char *end, *pass1_mnem = NULL;
>    enum i386_error pass1_err = 0;
>    const insn_template *t;
> +  ginsnS *ginsn;
>  
>    /* Initialize globals.  */
>    current_templates = NULL;
> @@ -5659,6 +6374,13 @@ md_assemble (char *line)
>    /* We are ready to output the insn.  */
>    output_insn ();
>  
> +  /* At this time, SCFI is enabled only for AMD64 ABI.  */
> +  if (flag_synth_cfi && x86_elf_abi == X86_64_ABI)
> +    {
> +      ginsn = ginsn_new (symbol_temp_new_now (), frch_ginsn_gen_mode ());
> +      frch_ginsn_data_append (ginsn);
> +    }

Why would this not work for the x32 ABI?

> @@ -10904,6 +11626,7 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
>    valueT val;
>    bool vex = false, xop = false, evex = false;
>    static const templates tt = { &i.tm, &i.tm + 1 };
> +  ginsnS *ginsn;
>  
>    init_globals ();
>  
> @@ -11658,7 +12381,14 @@ s_insn (int dummy ATTRIBUTE_UNUSED)
>  
>    output_insn ();
>  
> - done:
> +  /* At this time, SCFI is enabled only for AMD64 ABI.  */
> +  if (flag_synth_cfi && x86_elf_abi == X86_64_ABI)
> +    {
> +      ginsn = ginsn_new (symbol_temp_new_now (), frch_ginsn_gen_mode ());
> +      frch_ginsn_data_append (ginsn);
> +    }
> +
> +done:

Please can you leave label indentation alone?

> @@ -15293,6 +16023,9 @@ i386_target_format (void)
>    else
>      as_fatal (_("unknown architecture"));
>  
> +  if (flag_synth_cfi && x86_elf_abi != X86_64_ABI)
> +    as_fatal (_("Synthesizing CFI is not supported for this ABI"));

With this, what use are the x86_elf_abi checks in md_assemble() and
s_insn()?

> --- a/gas/config/tc-i386.h
> +++ b/gas/config/tc-i386.h
> @@ -373,6 +373,27 @@ extern int i386_elf_section_type (const char *, size_t);
>  extern void i386_solaris_fix_up_eh_frame (segT);
>  #endif
>  
> +#define TARGET_USE_GINSN 1
> +/* Allow GAS to synthesize DWARF CFI for hand-written asm.
> +   PS: TARGET_USE_CFIPOP is a pre-condition.  */
> +#define TARGET_USE_SCFI 1
> +/* Identify the maximum DWARF register number of all the registers being
> +   tracked for SCFI.  This is the last DWARF register number of the set
> +   of SP, BP, and all callee-saved registers.  For AMD64, this means
> +   R15 (15).  Use SCFI_CALLEE_SAVED_REG_P to identify which registers
> +   are callee-saved from this set.  */
> +#define SCFI_NUM_REGS 15

Is this "number of" (and then having a value of 16 to permit covering
register 15), or is this "maximum register number" (and then needing a
different name)?

> --- /dev/null
> +++ b/gas/ginsn.c

I'll see about replying on the non-x86 parts of this separately.

Jan

next prev parent reply	other threads:[~2023-10-31 14:10 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30 16:51 [PATCH, V2 00/10] Synthesize " Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 01/10] gas: dw2gencfi: minor rejig for cfi_sections_set and all_cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 02/10] gas: dw2gencfi: use all_cfi_sections instead of cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 03/10] gas: dw2gencfi: expose a new cfi_set_last_fde API Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 04/10] gas: dw2gencfi: move some tc_* defines to the header file Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 05/10] gas: add new command line option --scfi[=all,none] Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 06/10] gas: scfidw2gen: new functionality to prepapre for SCFI Indu Bhagat
2023-10-31 11:28   ` Jan Beulich
2023-10-31 22:06     ` Indu Bhagat
2023-11-02 10:35       ` Jan Beulich
2023-11-02 16:32         ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm Indu Bhagat
2023-10-31 14:10   ` Jan Beulich [this message]
2023-11-02  8:15     ` Indu Bhagat
2023-11-02 11:39       ` Jan Beulich
2023-12-10 10:22         ` Indu Bhagat
2023-12-11  7:59           ` Jan Beulich
2023-12-19  7:07             ` Indu Bhagat
2023-11-02 15:53   ` Jan Beulich
2023-11-04  7:29     ` Indu Bhagat
2023-11-06 11:03       ` Jan Beulich
2023-12-10 10:23         ` Indu Bhagat
2023-12-11  8:02           ` Jan Beulich
2023-10-30 16:51 ` [PATCH, V2 08/10] gas: doc: update documentation for the new listing option Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 09/10] gas: testsuite: add a x86_64 testsuite for SCFI Indu Bhagat
2023-10-31 16:13   ` Jan Beulich
2023-11-01  6:24     ` Indu Bhagat
2023-11-02 12:28       ` Jan Beulich
2023-11-03  5:45         ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 10/10] gas/NEWS: announce the new command line option Indu Bhagat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c7a60721-4716-bfe1-76db-0909523cbc64@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=indu.bhagat@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).