From: Indu Bhagat <indu.bhagat@oracle.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm
Date: Sat, 4 Nov 2023 00:29:17 -0700 [thread overview]
Message-ID: <47dec0fe-492e-43ec-8636-d33f6653d8f9@oracle.com> (raw)
In-Reply-To: <ce890459-be9e-20ad-5769-8fc53db55c04@suse.com>
On 11/2/23 08:53, Jan Beulich wrote:
> On 30.10.2023 17:51, Indu Bhagat wrote:
>> --- /dev/null
>> +++ b/gas/ginsn.c
>> @@ -0,0 +1,1225 @@
>> +/* ginsn.h - GAS instruction representation.
>> + Copyright (C) 2023 Free Software Foundation, Inc.
>> +
>> + This file is part of GAS, the GNU Assembler.
>> +
>> + GAS is free software; you can redistribute it and/or modify
>> + it under the terms of the GNU General Public License as published by
>> + the Free Software Foundation; either version 3, or (at your option)
>> + any later version.
>> +
>> + GAS is distributed in the hope that it will be useful,
>> + but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + GNU General Public License for more details.
>> +
>> + You should have received a copy of the GNU General Public License
>> + along with GAS; see the file COPYING. If not, write to the Free
>> + Software Foundation, 51 Franklin Street - Fifth Floor, Boston, MA
>> + 02110-1301, USA. */
>> +
>> +#include "as.h"
>> +#include "subsegs.h"
>> +#include "ginsn.h"
>> +#include "scfi.h"
>> +
>> +#ifdef TARGET_USE_GINSN
>> +
>> +const char *const ginsn_type_names[] =
>
> This looks like it wants to be static.
>
OK.
>> +{
>> +#define _GINSN_TYPE_ITEM(NAME, STR) STR,
>> + _GINSN_TYPES
>> +#undef _GINSN_TYPE_ITEM
>> +};
>> +
>> +const char *const ginsn_src_type_names[] =
>
> This and ...
>
>> +{
>> +#define _GINSN_SRC_TYPE_ITEM(NAME, STR) STR,
>> + _GINSN_SRC_TYPES
>> +#undef _GINSN_SRC_TYPE_ITEM
>> +};
>> +const char *const ginsn_dst_type_names[] =
>
> ... this also look so, but then they're also unused inside this CU,
> so when making them static the compiler would complain. If they're
> needed by a later patch, perhaps they should be moved there? And
> if entirely unused, perhaps they should both be deleted (or at least
> commented / #ifdef-ed out)?
>
Remains of some previous implementation that I discarded. I will remove
these.
> With the many symbols further down it's also hard to spot whether any
> of them should perhaps also be static.
>
>> +{
>> +#define _GINSN_DST_TYPE_ITEM(NAME, STR) STR,
>> + _GINSN_DST_TYPES
>> +#undef _GINSN_DST_TYPE_ITEM
>> +};
>> +
>> +static
>> +ginsnS *ginsn_alloc (void)
>> +{
>> + ginsnS *ginsn = XCNEW (ginsnS);
>> + return ginsn;
>> +}
>> +
>> +static ginsnS*
>> +ginsn_init (enum ginsn_type type, symbolS *sym, bool real_p)
>> +{
>> + ginsnS *ginsn = ginsn_alloc ();
>> + ginsn->type = type;
>> + ginsn->sym = sym;
>
> Is a symbol hanging off of a ginsn ever intended to be altered? If
> not, field and function argument would want to be pointer-to-const.
>
No. This symbol is not intended to be altered.
However, using const symbolS* will cause complaints of "passing argument
1 of ‘S_GET_NAME’ discards ‘const’ qualifier" etc. Calling most of the
S_XXX (symbolS *sym) will need some type casting etc, but that will
somewhat defeat the purpose?
>> + if (real_p)
>> + ginsn->flags |= GINSN_F_INSN_REAL;
>> + return ginsn;
>> +}
>> +
>> +static void
>> +ginsn_cleanup (ginsnS **ginsnp)
>> +{
>> + ginsnS *ginsn;
>> +
>> + if (!ginsnp || !*ginsnp)
>> + return;
>> +
>> + ginsn = *ginsnp;
>> + if (ginsn->scfi_ops)
>> + {
>> + scfi_ops_cleanup (ginsn->scfi_ops);
>> + ginsn->scfi_ops = NULL;
>> + }
>> +
>> + free (ginsn);
>> + ginsn = NULL;
>
> This doesn't do anything useful. Did you perhaps mean to set *ginsnp to NULL,
> and then also ahead of calling free()?
>
Thanks. Yes I meant to do *ginsnp = NULL;
>> +}
>> +
>> +static void
>> +ginsn_set_src (struct ginsn_src *src, enum ginsn_src_type type, uint32_t reg,
>> + int32_t immdisp)
>
> I find the use of fixed-width types suspicious here: For reg, likely it wants
> to be unsigned int. For immdisp it's less clear: Immediates can be wider than
> 32 bits, and they may also be either signed ot unsigned. From an abstract
> perspective, assuming the immediate value actually is used for anything, I'd
> expect offsetT to be used, following struct expressionS' X_add_number.
>
Thanks, I will consider trying offsetT for immediate. It is a more
appropriate data type than int32_t. For reg, why is uint32_t less
appropriate than unsigned int ?
>> +{
>> + if (!src)
>> + return;
>> +
>> + src->type = type;
>> + /* Even when the use-case is SCFI, the value of reg may be > SCFI_NUM_REGS.
>> + E.g., in AMD64, push fs etc. */
>> + src->reg = reg;
>> +
>> + if (type == GINSN_SRC_IMM || type == GINSN_SRC_INDIRECT)
>> + src->immdisp = immdisp;
>
> What's the point of the conditional around the assignment?
>
Stale conditional. I earlier had some data structures with union etc.
which I have since gotten rid of.
>> +ginsnS *
>> +ginsn_new_add (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> + enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_ADD, sym, real_p);
>> + /* src info. */
>> + ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> + ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> + /* dst info. */
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> + return ginsn;
>> +}
>> +
>> +ginsnS *
>> +ginsn_new_and (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> + enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_AND, sym, real_p);
>> + /* src info. */
>> + ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> + ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> + return ginsn;
>> +}
>
> The comments in the two functions don't look overly useful to me. But if
> you have them, please have them be consistent across similar functions.
>
OK.
>> +ginsnS *
>> +ginsn_new_mov (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src_type, uint32_t src_reg, int32_t src_disp,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_MOV, sym, real_p);
>> + /* src info. */
>> + ginsn_set_src (&ginsn->src[0], src_type, src_reg, src_disp);
>> + /* dst info. */
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> + return ginsn;
>> +}
>
> As indicated before, if both src and dst can be indirect here, ...
>
>> +ginsnS *
>> +ginsn_new_store (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src_type, uint32_t src_reg,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_STORE, sym, real_p);
>> + /* src info. */
>> + ginsn_set_src (&ginsn->src[0], src_type, src_reg, 0);
>> + /* dst info. */
>> + gas_assert (dst_type == GINSN_DST_INDIRECT);
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> + return ginsn;
>> +}
>> +
>> +ginsnS *
>> +ginsn_new_load (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src_type, uint32_t src_reg, int32_t src_disp,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_LOAD, sym, real_p);
>> + /* src info. */
>> + gas_assert (src_type == GINSN_SRC_INDIRECT);
>> + ginsn_set_src (&ginsn->src[0], src_type, src_reg, src_disp);
>> + /* dst info. */
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, 0);
>> +
>> + return ginsn;
>> +}
>
> ... I can't see what these are needed for.
>
For x86, they may not seem necessary. But for other architectures, or
say for future uses-cases, we may need them. I think it is more
meaningful (and readable) to see a LOAD/STORE/MOV ginsn for a machine
instruction of the same type. For other RISC-like ISAs, it is clearer
to have separate MOV/LOAD/STORE instructions.
ginsn is meant to provide an infrastructure for other uses cases that
may crop up later.
>> +ginsnS *
>> +ginsn_new_sub (symbolS *sym, bool real_p,
>> + enum ginsn_src_type src1_type, uint32_t src1_val, int32_t src1_disp,
>> + enum ginsn_src_type src2_type, uint32_t src2_val, int32_t src2_disp,
>> + enum ginsn_dst_type dst_type, uint32_t dst_reg, int32_t dst_disp)
>> +{
>> + ginsnS *ginsn = ginsn_init (GINSN_TYPE_SUB, sym, real_p);
>> + /* src info. */
>> + ginsn_set_src (&ginsn->src[0], src1_type, src1_val, src1_disp);
>> + ginsn_set_src (&ginsn->src[1], src2_type, src2_val, src2_disp);
>> + /* dst info. */
>> + ginsn_set_dst (&ginsn->dst, dst_type, dst_reg, dst_disp);
>> +
>> + return ginsn;
>> +}
>> +
>> +/* PS: Note this API does not identify the displacement values of
>> + src1/src2/dst. At this time, it is unnecessary for correctness to support
>> + the additional argument. */
>
> And why do the displacements need tracking for add, and, and sub?
>
Hmm, I think its just a bad choice of arg name. The value passed in
src1_disp / src2_disp is actually the immediates in case add/sub/and. I
will update it.
>> +static bool
>> +ginsn_indirect_jump_p (ginsnS *ginsn)
>> +{
>> + bool ret_p = false;
>> + if (!ginsn)
>> + return ret_p;
>> +
>> + ret_p = (ginsn->type == GINSN_TYPE_JUMP
>> + && ginsn->src[0].type == GINSN_SRC_REG);
>
> On x86 an indirect jump can also have a memory source operand (i.e. what
> you call "indirect").
>
IIRC, This is another case of acceptable loss of information in ginsns.
I will double check this too when I get to fixing up the tc-i386.c to
address those review comments for the ginsn creation process.
>> +static bool
>> +ginsn_direct_local_jump_p (ginsnS *ginsn)
>> +{
>> + bool ret_p = false;
>> + if (!ginsn)
>> + return ret_p;
>> +
>> + ret_p |= (ginsn->type == GINSN_TYPE_JUMP
>> + && ginsn->src[0].type == GINSN_SRC_SYMBOL
>> + && S_IS_LOCAL (ginsn->src[0].sym));
>
> This looks fragile: You may not yet have seen the .globl or .weak directive.
>
Hmm. I think the usage of S_IS_LOCAL is incorrect here. Further, I
already have a defined label -> ginsn map per function block, and the
checks with S_IS_LOCAL is not necessary really. Will take a second look.
>> +static char*
>> +ginsn_src_print (struct ginsn_src *src)
>> +{
>> + size_t len = 39;
>> + char *src_str = XNEWVEC (char, len);
>> +
>> + memset (src_str, 0, len);
>> +
>> + if (src->type == GINSN_SRC_REG)
>> + {
>> + char *buf = XNEWVEC (char, 32);
>> + sprintf (buf, "%%r%d, ", ginsn_get_src_reg (src));
>> + strcat (src_str, buf);
>> + }
>> + else if (src->type == GINSN_SRC_IMM)
>> + {
>> + char *buf = XNEWVEC (char, 32);
>> + sprintf (buf, "%d, ", ginsn_get_src_imm (src));
>
> Here and below, if you stick to int32_t as the type, this wants to use
> PRId32.
>
OK.
>> +static const char*
>> +ginsn_type_sym_begin_end_print (ginsnS *ginsn)
>> +{
>> + int id = 0;
>> + const char *ginsn_sym_strs[]
>> + = { "", "FUNC_BEGIN", "FUNC_END" };
>
> static and with a 2nd const?
>
OK.
>> +static char*
>> +ginsn_print (ginsnS *ginsn)
>> +{
>> + struct ginsn_src *src;
>> + struct ginsn_dst *dst;
>> + size_t len = GINSN_LISTING_LEN;
>> + char *ginsn_str = XNEWVEC (char, len);
>> +
>> + memset (ginsn_str, 0, len);
>> +
>> + strcpy (ginsn_str, "ginsn: ");
>> +
>> + strcat (ginsn_str, ginsn_type_names[ginsn->type]);
>> + strcat (ginsn_str, " ");
>> +
>> + /* For some ginsn types, no further information is printed for now. */
>> + if (ginsn->type == GINSN_TYPE_CALL
>> + || ginsn->type == GINSN_TYPE_RETURN
>> + || ginsn->type == GINSN_TYPE_OTHER)
>> + goto end;
>> + else if (ginsn->type == GINSN_TYPE_SYMBOL)
>> + {
>> + if (GINSN_F_USER_LABEL_P (ginsn))
>> + strncat (ginsn_str, S_GET_NAME (ginsn->sym), len - 10);
>> + else
>> + strcat (ginsn_str, ginsn_type_sym_begin_end_print (ginsn));
>> + goto end;
>> + }
>> +
>> + /* src 1. */
>> + src = ginsn_get_src1 (ginsn);
>> + strcat (ginsn_str, ginsn_src_print (src));
>> +
>> + /* src 2. */
>> + src = ginsn_get_src2 (ginsn);
>> + strcat (ginsn_str, ginsn_src_print (src));
>> +
>> + /* dst. */
>> + dst = ginsn_get_dst (ginsn);
>> + strcat (ginsn_str, ginsn_dst_print (dst));
>> +
>> +end:
>> + gas_assert (strlen (ginsn_str) < GINSN_LISTING_LEN);
>
> This check comes too late - if it triggers, you've corrupted memory
> already. I wonder anyway why you construct all this "by hand", rather
> than using snprintf().
>
I will update this to use snprintf instead.
Thanks for the review.
Indu
next prev parent reply other threads:[~2023-11-04 7:29 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-30 16:51 [PATCH, V2 00/10] Synthesize " Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 01/10] gas: dw2gencfi: minor rejig for cfi_sections_set and all_cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 02/10] gas: dw2gencfi: use all_cfi_sections instead of cfi_sections Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 03/10] gas: dw2gencfi: expose a new cfi_set_last_fde API Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 04/10] gas: dw2gencfi: move some tc_* defines to the header file Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 05/10] gas: add new command line option --scfi[=all,none] Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 06/10] gas: scfidw2gen: new functionality to prepapre for SCFI Indu Bhagat
2023-10-31 11:28 ` Jan Beulich
2023-10-31 22:06 ` Indu Bhagat
2023-11-02 10:35 ` Jan Beulich
2023-11-02 16:32 ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm Indu Bhagat
2023-10-31 14:10 ` Jan Beulich
2023-11-02 8:15 ` Indu Bhagat
2023-11-02 11:39 ` Jan Beulich
2023-12-10 10:22 ` Indu Bhagat
2023-12-11 7:59 ` Jan Beulich
2023-12-19 7:07 ` Indu Bhagat
2023-11-02 15:53 ` Jan Beulich
2023-11-04 7:29 ` Indu Bhagat [this message]
2023-11-06 11:03 ` Jan Beulich
2023-12-10 10:23 ` Indu Bhagat
2023-12-11 8:02 ` Jan Beulich
2023-10-30 16:51 ` [PATCH, V2 08/10] gas: doc: update documentation for the new listing option Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 09/10] gas: testsuite: add a x86_64 testsuite for SCFI Indu Bhagat
2023-10-31 16:13 ` Jan Beulich
2023-11-01 6:24 ` Indu Bhagat
2023-11-02 12:28 ` Jan Beulich
2023-11-03 5:45 ` Indu Bhagat
2023-10-30 16:51 ` [PATCH, V2 10/10] gas/NEWS: announce the new command line option Indu Bhagat
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47dec0fe-492e-43ec-8636-d33f6653d8f9@oracle.com \
--to=indu.bhagat@oracle.com \
--cc=binutils@sourceware.org \
--cc=jbeulich@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).