public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Indu Bhagat <indu.bhagat@oracle.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm
Date: Sat, 4 Nov 2023 00:29:17 -0700	[thread overview]
Message-ID: <47dec0fe-492e-43ec-8636-d33f6653d8f9@oracle.com> (raw)
In-Reply-To: <ce890459-be9e-20ad-5769-8fc53db55c04@suse.com>

On 11/2/23 08:53, Jan Beulich wrote:
> On 30.10.2023 17:51, Indu Bhagat wrote:
>> --- /dev/null
>> +++ b/gas/ginsn.c
>> @@ -0,0 +1,1225 @@
>> +/* ginsn.h - GAS instruction representation.
>> +   Copyright (C) 2023 Free Software Foundation, Inc.
>> +
>> +   This file is part of GAS, the GNU Assembler.
>> +
>> +   GAS is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3, or (at your option)
>> +   any later version.
>> +
>> +   GAS is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with GAS; see the file COPYING.  If not, write to the Free
>> +   Software Foundation, 51 Franklin Street - Fifth Floor, Boston, MA
>> +   02110-1301, USA.  */
>> +
>> +#include "as.h"
>> +#include "subsegs.h"
>> +#include "ginsn.h"
>> +#include "scfi.h"
>> +
>> +#ifdef TARGET_USE_GINSN
>> +
>> +const char *const ginsn_type_names[] =
> 
> This looks like it wants to be static.
> 

OK.

>> +{
>> +#define _GINSN_TYPE_ITEM(NAME, STR) STR,
>> +  _GINSN_TYPES
>> +#undef _GINSN_TYPE_ITEM
>> +};
>> +
>> +const char *const ginsn_src_type_names[] =
> 
> This and ...
> 
>> +{
>> +#define _GINSN_SRC_TYPE_ITEM(NAME, STR) STR,
>> +  _GINSN_SRC_TYPES
>> +#undef _GINSN_SRC_TYPE_ITEM
>> +};
>> +const char *const ginsn_dst_type_names[] =
> 
> ... this also look so, but then they're also unused inside this CU,
> so when making them static the compiler would complain. If they're
> needed by a later patch, perhaps they should be moved there? And
> if entirely unused, perhaps they should both be deleted (or at least
> commented / #ifdef-ed out)?
> 

Remains of some previous implementation that I discarded. I will remove 
these.

> With the many symbols further down it's also hard to spot whether any
> of them should perhaps also be static.
> 
>> +{
>> +#define _GINSN_DST_TYPE_ITEM(NAME, STR) STR,
>> +  _GINSN_DST_TYPES
>> +#undef _GINSN_DST_TYPE_ITEM
>> +};
>> +
>> +static
>> +ginsnS *ginsn_alloc (void)
>> +{
>> +  ginsnS *ginsn = XCNEW (ginsnS);
>> +  return ginsn;
>> +}
>> +
>> +static ginsnS*
>> +ginsn_init (enum ginsn_type type, symbolS *sym, bool real_p)
>> +{
>> +  ginsnS *ginsn = ginsn_alloc ();
>> +  ginsn->type = type;
>> +  ginsn->sym = sym;
> 
> Is a symbol hanging off of a ginsn ever intended to be altered? If
> not, field and function argument would want to be pointer-to-const.
> 

No. This symbol is not intended to be altered.

However, using const symbolS* will cause complaints of "passing argument 
1 of ‘S_GET_NAME’ discards ‘const’ qualifier" etc.  Calling most of the 
S_XXX (symbolS *sym) will need some type casting etc, but that will 
somewhat defeat the purpose?

>> +  if (real_p)
>> +    ginsn->flags |= GINSN_F_INSN_REAL;
>> +  return ginsn;
>> +}
>> +
>> +static void
>> +ginsn_cleanup (ginsnS **ginsnp)
>> +{
>> +  ginsnS *ginsn;
>> +
>> +  if (!ginsnp || !*ginsnp)
>> +    return;
>> +
>> +  ginsn = *ginsnp;
>> +  if (ginsn->scfi_ops)
>> +    {
>> +      scfi_ops_cleanup (ginsn->scfi_ops);
>> +      ginsn->scfi_ops = NULL;
>> +    }
>> +
>> +  free (ginsn);
>> +  ginsn = NULL;
> 
> This doesn't do anything useful. Did you perhaps mean to set *ginsnp to NULL,
> and then also ahead of calling free()?
> 

Thanks. Yes I meant to do *ginsnp = NULL;

>> +}
>> +
>> +static void
>> +ginsn_set_src (struct ginsn_src *src, enum ginsn_src_type type, uint32_t reg,
>> +	       int32_t immdisp)
> 
> I find the use of fixed-width types suspicious here: For reg, likely it wants
> to be unsigned int. For immdisp it's less clear: Immediates can be wider than
> 32 bits, and they may also be either signed ot unsigned. From an abstract
> perspective, assuming the immediate value actually is used for anything, I'd
> expect offsetT to be used, following struct expressionS' X_add_number.
> 

Thanks, I will consider trying offsetT for immediate. It is a more 
appropriate data type than int32_t.  For reg, why is uint32_t less 
appropriate than unsigned int ?

>> +{
>> +  if (!src)
>> +    return;
>> +
>> +  src->type = type;
>> +  /* Even when the use-case is SCFI, the value of reg may be > SCFI_NUM_REGS.
>> +     E.g., in AMD64, push fs etc.  */
>> +  src->reg = reg;
>> +
>> +  if (type == GINSN_SRC_IMM || type == GINSN_SRC_INDIRECT)
>> +    src->immdisp = immdisp;
> 
> What's the point of the conditional around the assignment?
> 

Stale conditional. I earlier had some data structures with union etc. 
which I have since gotten rid of.

>> +ginsnS *
>> +ginsn_new_add (symbolS *sym, bool real_p,
>> +	       enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> +	       enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> +	       enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_ADD, sym, real_p);
>> +  /* src info.  */
>> +  ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> +  ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> +  /* dst info.  */
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> +  return ginsn;
>> +}
>> +
>> +ginsnS *
>> +ginsn_new_and (symbolS *sym, bool real_p,
>> +	       enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> +	       enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> +	       enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_AND, sym, real_p);
>> +  /* src info.  */
>> +  ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> +  ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> +  return ginsn;
>> +}
> 
> The comments in the two functions don't look overly useful to me. But if
> you have them, please have them be consistent across similar functions.
> 

OK.

>> +ginsnS *
>> +ginsn_new_mov (symbolS *sym, bool real_p,
>> +	       enum ginsn_src_type src_type, uint32_t src_reg, int32_t src_disp,
>> +	       enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_MOV, sym, real_p);
>> +  /* src info.  */
>> +  ginsn_set_src (&ginsn->src[0], src_type, src_reg, src_disp);
>> +  /* dst info.  */
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> +  return ginsn;
>> +}
> 
> As indicated before, if both src and dst can be indirect here, ...
> 
>> +ginsnS *
>> +ginsn_new_store (symbolS *sym, bool real_p,
>> +		 enum ginsn_src_type src_type, uint32_t src_reg,
>> +		 enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_STORE, sym, real_p);
>> +  /* src info.  */
>> +  ginsn_set_src (&ginsn->src[0], src_type, src_reg, 0);
>> +  /* dst info.  */
>> +  gas_assert (dst_type == GINSN_DST_INDIRECT);
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> +  return ginsn;
>> +}
>> +
>> +ginsnS *
>> +ginsn_new_load (symbolS *sym, bool real_p,
>> +		enum ginsn_src_type src_type, uint32_t src_reg, int32_t src_disp,
>> +		enum ginsn_dst_type dst_type, uint32_t dst_reg)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_LOAD, sym, real_p);
>> +  /* src info.  */
>> +  gas_assert (src_type == GINSN_SRC_INDIRECT);
>> +  ginsn_set_src (&ginsn->src[0], src_type, src_reg, src_disp);
>> +  /* dst info.  */
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, 0);
>> +
>> +  return ginsn;
>> +}
> 
> ... I can't see what these are needed for.
> 

For x86, they may not seem necessary. But for other architectures, or 
say for future uses-cases, we may need them. I think it is more 
meaningful (and readable) to see a LOAD/STORE/MOV ginsn for a machine 
instruction of the same type.  For other RISC-like ISAs, it is clearer 
to have separate MOV/LOAD/STORE instructions.

ginsn is meant to provide an infrastructure for other uses cases that 
may crop up later.

>> +ginsnS *
>> +ginsn_new_sub (symbolS *sym, bool real_p,
>> +	       enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> +	       enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> +	       enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> +  ginsnS *ginsn = ginsn_init (GINSN_TYPE_SUB, sym, real_p);
>> +  /* src info.  */
>> +  ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> +  ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> +  /* dst info.  */
>> +  ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> +  return ginsn;
>> +}
>> +
>> +/* PS: Note this API does not identify the displacement values of
>> +   src1/src2/dst.  At this time, it is unnecessary for correctness to support
>> +   the additional argument.  */
> 
> And why do the displacements need tracking for add, and, and sub?
> 

Hmm, I think its just a bad choice of arg name.  The value passed in 
src1_disp / src2_disp is actually the immediates in case add/sub/and. I 
will update it.

>> +static bool
>> +ginsn_indirect_jump_p (ginsnS *ginsn)
>> +{
>> +  bool ret_p = false;
>> +  if (!ginsn)
>> +    return ret_p;
>> +
>> +  ret_p = (ginsn->type == GINSN_TYPE_JUMP
>> +	   && ginsn->src[0].type == GINSN_SRC_REG);
> 
> On x86 an indirect jump can also have a memory source operand (i.e. what
> you call "indirect").
> 

IIRC, This is another case of acceptable loss of information in ginsns. 
I will double check this too when I get to fixing up the tc-i386.c to 
address those review comments for the ginsn creation process.

>> +static bool
>> +ginsn_direct_local_jump_p (ginsnS *ginsn)
>> +{
>> +  bool ret_p = false;
>> +  if (!ginsn)
>> +    return ret_p;
>> +
>> +  ret_p |= (ginsn->type == GINSN_TYPE_JUMP
>> +	    && ginsn->src[0].type == GINSN_SRC_SYMBOL
>> +	    && S_IS_LOCAL (ginsn->src[0].sym));
> 
> This looks fragile: You may not yet have seen the .globl or .weak directive.
> 

Hmm. I think the usage of S_IS_LOCAL is incorrect here. Further, I 
already have a defined label -> ginsn map per function block, and the 
checks with S_IS_LOCAL is not necessary really.  Will take a second look.

>> +static char*
>> +ginsn_src_print (struct ginsn_src *src)
>> +{
>> +  size_t len = 39;
>> +  char *src_str = XNEWVEC (char, len);
>> +
>> +  memset (src_str, 0, len);
>> +
>> +  if (src->type == GINSN_SRC_REG)
>> +    {
>> +      char *buf = XNEWVEC (char, 32);
>> +      sprintf (buf, "%%r%d, ", ginsn_get_src_reg (src));
>> +      strcat (src_str, buf);
>> +    }
>> +  else if (src->type == GINSN_SRC_IMM)
>> +    {
>> +      char *buf = XNEWVEC (char, 32);
>> +      sprintf (buf, "%d, ", ginsn_get_src_imm (src));
> 
> Here and below, if you stick to int32_t as the type, this wants to use
> PRId32.
> 

OK.

>> +static const char*
>> +ginsn_type_sym_begin_end_print (ginsnS *ginsn)
>> +{
>> +  int id = 0;
>> +  const char *ginsn_sym_strs[]
>> +    = { "", "FUNC_BEGIN", "FUNC_END" };
> 
> static and with a 2nd const?
> 

OK.

>> +static char*
>> +ginsn_print (ginsnS *ginsn)
>> +{
>> +  struct ginsn_src *src;
>> +  struct ginsn_dst *dst;
>> +  size_t len = GINSN_LISTING_LEN;
>> +  char *ginsn_str = XNEWVEC (char, len);
>> +
>> +  memset (ginsn_str, 0, len);
>> +
>> +  strcpy (ginsn_str, "ginsn: ");
>> +
>> +  strcat (ginsn_str, ginsn_type_names[ginsn->type]);
>> +  strcat (ginsn_str, " ");
>> +
>> +  /* For some ginsn types, no further information is printed for now.  */
>> +  if (ginsn->type == GINSN_TYPE_CALL
>> +      || ginsn->type == GINSN_TYPE_RETURN
>> +      || ginsn->type == GINSN_TYPE_OTHER)
>> +    goto end;
>> +  else if (ginsn->type == GINSN_TYPE_SYMBOL)
>> +    {
>> +      if (GINSN_F_USER_LABEL_P (ginsn))
>> +	strncat (ginsn_str, S_GET_NAME (ginsn->sym), len - 10);
>> +      else
>> +	strcat (ginsn_str, ginsn_type_sym_begin_end_print (ginsn));
>> +      goto end;
>> +    }
>> +
>> +  /* src 1.  */
>> +  src = ginsn_get_src1 (ginsn);
>> +  strcat (ginsn_str, ginsn_src_print (src));
>> +
>> +  /* src 2.  */
>> +  src = ginsn_get_src2 (ginsn);
>> +  strcat (ginsn_str, ginsn_src_print (src));
>> +
>> +  /* dst.  */
>> +  dst = ginsn_get_dst (ginsn);
>> +  strcat (ginsn_str, ginsn_dst_print (dst));
>> +
>> +end:
>> +  gas_assert (strlen (ginsn_str) < GINSN_LISTING_LEN);
> 
> This check comes too late - if it triggers, you've corrupted memory
> already. I wonder anyway why you construct all this "by hand", rather
> than using snprintf().
> 

I will update this to use snprintf instead.

Thanks for the review.
Indu


  reply	other threads:[~2023-11-04  7:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30 16:51 [PATCH, V2 00/10] Synthesize " Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 01/10] gas: dw2gencfi: minor rejig for cfi_sections_set and all_cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 02/10] gas: dw2gencfi: use all_cfi_sections instead of cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 03/10] gas: dw2gencfi: expose a new cfi_set_last_fde API Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 04/10] gas: dw2gencfi: move some tc_* defines to the header file Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 05/10] gas: add new command line option --scfi[=all,none] Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 06/10] gas: scfidw2gen: new functionality to prepapre for SCFI Indu Bhagat
2023-10-31 11:28   ` Jan Beulich
2023-10-31 22:06     ` Indu Bhagat
2023-11-02 10:35       ` Jan Beulich
2023-11-02 16:32         ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm Indu Bhagat
2023-10-31 14:10   ` Jan Beulich
2023-11-02  8:15     ` Indu Bhagat
2023-11-02 11:39       ` Jan Beulich
2023-12-10 10:22         ` Indu Bhagat
2023-12-11  7:59           ` Jan Beulich
2023-12-19  7:07             ` Indu Bhagat
2023-11-02 15:53   ` Jan Beulich
2023-11-04  7:29     ` Indu Bhagat [this message]
2023-11-06 11:03       ` Jan Beulich
2023-12-10 10:23         ` Indu Bhagat
2023-12-11  8:02           ` Jan Beulich
2023-10-30 16:51 ` [PATCH, V2 08/10] gas: doc: update documentation for the new listing option Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 09/10] gas: testsuite: add a x86_64 testsuite for SCFI Indu Bhagat
2023-10-31 16:13   ` Jan Beulich
2023-11-01  6:24     ` Indu Bhagat
2023-11-02 12:28       ` Jan Beulich
2023-11-03  5:45         ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 10/10] gas/NEWS: announce the new command line option Indu Bhagat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47dec0fe-492e-43ec-8636-d33f6653d8f9@oracle.com \
    --to=indu.bhagat@oracle.com \
    --cc=binutils@sourceware.org \
    --cc=jbeulich@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).