From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] Using sub-scalars mode to move struct block
Date: Mon, 14 Nov 2022 17:53:58 +0800 [thread overview]
Message-ID: <7eleodsvyh.fsf@pike.rch.stglabs.ibm.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2211140719340.3995@jbgna.fhfr.qr> (Richard Biener's message of "Mon, 14 Nov 2022 07:23:08 +0000 (UTC)")
Hi!
Thanks for your helpful comments/sugguestions!
Richard Biener <rguenther@suse.de> writes:
> On Mon, 14 Nov 2022, Jiufu Guo wrote:
>
>>
>> Hi!
>> Thanks so much for your review!
>>
>> Richard Biener <rguenther@suse.de> writes:
>>
>> > On Fri, 11 Nov 2022, Jiufu Guo wrote:
>> >
>> >> Hi,
>> >>
>> >> When assigning a struct parameter to another variable, or loading a
>> >> memory block to a struct var (especially for return value),
>> >> Now, "block move" would be used during expand the assignment. And
>> >> the "block move" may use a type/mode different from the mode which
>> >> is accessing the var. e.g. on ppc64le, V2DI would be used to move
>> >> the block of 16bytes.
>> >>
>> >> And then, this "block move" would prevent optimization passes from
>> >> leaping/crossing over the assignment. PR65421 reflects this issue.
>> >>
>> >> As the example code in PR65421.
>> >>
>> >> typedef struct { double a[4]; } A;
>> >> A foo (const A *a) { return *a; }
>> >>
>> >> On ppc64le, the below instructions are used for the "block move":
>> >> 7: r122:V2DI=[r121:DI]
>> >> 8: r124:V2DI=[r121:DI+r123:DI]
>> >> 9: [r112:DI]=r122:V2DI
>> >> 10: [r112:DI+0x10]=r124:V2DI
>> >>
>> >> For this issue, a few comments/suggestions are mentioned via RFC:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >> I drafted a patch which is updating the behavior of block_move for
>> >> struct type. This patch is simple to work with, a few ideas in the
>> >> comments are not put into this patch. I would submit this
>> >> patch first.
>> >>
>> >> The idea is trying to use sub-modes(scalar) for the "block move".
>> >> And the sub-modes would align with the access patterns of the
>> >> struct members and usages on parameter/return value.
>> >> The major benefits of this change would be raising more
>> >> opportunities for other optimization passes(cse/dse/xprop).
>> >>
>> >> The suitable mode would be target specified and relates to ABI,
>> >> this patch introduces a target hook. And in this patch, the hook
>> >> is implemented on rs6000.
>> >>
>> >> In this patch, the hook would be just using heuristic modes for all
>> >> struct block moving. And the hook would not check if the "block move"
>> >> is about parameters or return value or other uses.
>> >>
>> >> For the rs6000 implementation of this hook, it is able to use
>> >> DF/DI/TD/.. modes for the struct block movement. The sub-modes
>> >> would be the same as the mode when the struct type is on parameter or
>> >> return value.
>> >>
>> >> Bootstrapped and regtested on ppc64/ppc64le.
>> >> Is this ok for trunk?
>> >>
>> >>
>> >> BR,
>> >> Jeff(Jiufu)
>> >>
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> * config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
>> >> (submode_for_struct_block_move): New function. Called from
>> >> rs600_block_move_for_struct.
>> >> (rs600_block_move_for_struct): New function.
>> >> * doc/tm.texi: Regenerate.
>> >> * doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
>> >> * expr.cc (store_expr): Call block_move_for_struct.
>> >> * target.def (block_move_for_struct): New hook.
>> >> * targhooks.cc (default_block_move_for_struct): New function.
>> >> * targhooks.h (default_block_move_for_struct): New Prototype.
>> >>
>> >> ---
>> >> gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
>> >> gcc/doc/tm.texi | 6 +++++
>> >> gcc/doc/tm.texi.in | 2 ++
>> >> gcc/expr.cc | 14 +++++++++---
>> >> gcc/target.def | 10 +++++++++
>> >> gcc/targhooks.cc | 7 ++++++
>> >> gcc/targhooks.h | 1 +
>> >> 7 files changed, 81 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >> index a85d7630b41..e14cecba0ef 100644
>> >> --- a/gcc/config/rs6000/rs6000.cc
>> >> +++ b/gcc/config/rs6000/rs6000.cc
>> >> @@ -1758,6 +1758,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>> >> #undef TARGET_NEED_IPA_FN_TARGET_INFO
>> >> #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>> >>
>> >> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
>> >> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
>> >> +
>> >> #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>> >> #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> >> \f
>> >> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>> >> return gen_rtx_REG (mode, regno);
>> >> }
>> >>
>> >> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
>> >> + would be used to move the struct. */
>> >> +static machine_mode
>> >> +submode_for_struct_block_move (tree type)
>> >> +{
>> >> + gcc_assert (TREE_CODE (type) == RECORD_TYPE);
>> >> +
>> >> + /* The sub mode may not be the field's type of the struct.
>> >> + It would be fine to use the mode as if the type is used as a function
>> >> + parameter or return value. For example: DF for "{double a[4];}", and
>> >> + DI for "{doubel a[3]; long l;}".
>> >> + Here, using the mode as if it is function return type. */
>> >> + rtx val = rs6000_function_value (type, NULL, 0);
>> >> + return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
>> >> + : word_mode;
>> >> +}
>> >> +
>> >> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook. */
>> >> +static void
>> >> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> >> +{
>> >> + machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
>> >> + int mode_size = GET_MODE_SIZE (mode);
>> >> + int size = UINTVAL (expr_size (exp));
>> >> + if (size < mode_size || (size % mode_size) != 0 || size > 64)
>> >> + {
>> >> + default_block_move_for_struct (x, y, exp, method);
>> >> + return;
>> >> + }
>> >> +
>> >> + int len = size / mode_size;
>> >> + for (int i = 0; i < len; i++)
>> >> + {
>> >> + rtx temp = gen_reg_rtx (mode);
>> >> + rtx src = adjust_address (y, mode, mode_size * i);
>> >> + rtx dest = adjust_address (x, mode, mode_size * i);
>> >> + emit_move_insn (temp, src);
>> >> + emit_move_insn (dest, temp);
>> >> + }
>> >> +}
>> >> +
>> >> /* Define how to find the value returned by a library function
>> >> assuming the value has mode MODE. */
>> >> rtx
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 8572313b308..c8a1c1f30cf 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -1380,6 +1380,12 @@ retain the field's mode.
>> >> Normally, this is not needed.
>> >> @end deftypefn
>> >>
>> >> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
>> >> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
>> >> +type. @var{method} is the method if this function invokes block_move.
>> >> +The default definition invokes block_move.
>> >> +@end deftypefn
>> >> +
>> >> @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>> >> Define this macro as an expression for the alignment of a type (given
>> >> by @var{type} as a tree node) if the alignment computed in the usual
>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> >> index 986e8f0da09..f0f525a2008 100644
>> >> --- a/gcc/doc/tm.texi.in
>> >> +++ b/gcc/doc/tm.texi.in
>> >> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure.
>> >>
>> >> @hook TARGET_MEMBER_TYPE_FORCES_BLK
>> >>
>> >> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT
>> >> +
>> >> @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>> >> Define this macro as an expression for the alignment of a type (given
>> >> by @var{type} as a tree node) if the alignment computed in the usual
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index c6917fbf7bd..734dc07a76b 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p,
>> >> emit_group_store (target, temp, TREE_TYPE (exp),
>> >> int_size_in_bytes (TREE_TYPE (exp)));
>> >> else if (GET_MODE (temp) == BLKmode)
>> >> - emit_block_move (target, temp, expr_size (exp),
>> >> - (call_param_p
>> >> - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> >> + {
>> >> + if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
>> >> + targetm.block_move_for_struct (target, temp, exp,
>> >> + call_param_p ? BLOCK_OP_CALL_PARM
>> >> + : BLOCK_OP_NORMAL);
>> >
>> > I think it's only sensible to do something about the call_param_p case,
>> > all other cases are GIGO. And I don't see a good reason to add a new
>> > target hook since function parameter passing info is readily
>> > available.
>> This patch enhances the normal assignments on struct variables (
>> expand_assignment-->store_expr, where the call_param_p is 0). Because
>> the "struct block move/copy" is generated for this case too.
>>
>> For example, the struct block_move is generated.
>> # .MEM_3 = VDEF <.MEM_1(D)>
>> D.3956 = *a_2(D);
>> For this stmt, "D.3956" is a struct variable, and "a_2(D)" points to a
>> block of the struct type.
>>
>> If we only care about parameter/arg/retval(s), we would not need a
>> hook, as you said, since parm/retval contains the information about
>> the target/ABI. And for more as you suggested in the previous emails,
>> we could do SRA-like analyze to check if an assignment relates to
>> parm/retval.
>
> But the issue with applying a stmt-local heuristic is that it's going
> to be as many times wrong as it is right so there inevitably will be
> regressions.
Understant. :-) In the patch, the heurisitic is aplied to all
assignment on struct (no matter parm/return related). The generated
binary would be changed definitly: mem move(via vector loads/stores)
for struct type becomes through sub-modes scalar loads/stores.
The binary change may cause some difference(e.g. performance change).
Even I had spec2017 tests on some machines(P9 and P10), it is sitll
possible has regressions.
>
> Also copies not directly involving parameters or returns can be optimized
> on the GIMPLE level just fine (see what SRA does), the exception is
> cpymem optab implementations but those you short-circuit as well here.
Yes, agree. So, enhancing the stmt/assignment about param/retval would
fix the issue (with low regression risks).
>
>> The current patch is simple relatively, it uses 'scalar-modes' to move
>> blocks for all struct-type assignments. I feel this would be enough,
>> and I'm wondering if we need to check parm/retval.
>
> It's only for the param/retval that RTL expansion has more information
> than GIMPLE and in that specific case the code generation is very
> difficult to untie and thus a change like this looks appropriate.
Understant, I will update patch to handle more about param/retval.
Thanks again!
BR,
Jeff (Jiufu)
>
> Thanks,
> Richard.
>
>>
>> >
>> > The question is whether this is best handled up in the call chain where
>> > the parameter setup is done or below here (or in emit_block_move).
>> As above discussed, this patch touches 'block move' for struct type,
>> including copy from 'parameter stack block' to a variable block.
>> Oh, assigning from 'parameter regs' to 'stack block' is not changed
>> here, it is done through "assign_parms-...->emit_group_xx".
>>
>> If any misunderstanding, or any comments/concerns, please point them
>> out. Thanks a lot!
>>
>> BR,
>> Jeff (Jiufu)
>>
>> >
>> > Richard.
>> >
>> >> + else
>> >> + emit_block_move (target, temp, expr_size (exp),
>> >> + (call_param_p ? BLOCK_OP_CALL_PARM
>> >> + : BLOCK_OP_NORMAL));
>> >> + }
>> >> +
>> >> /* If we emit a nontemporal store, there is nothing else to do. */
>> >> else if (nontemporal && emit_storent_insn (target, temp))
>> >> ;
>> >> diff --git a/gcc/target.def b/gcc/target.def
>> >> index 25f94c19fa7..e141f72a8a3 100644
>> >> --- a/gcc/target.def
>> >> +++ b/gcc/target.def
>> >> @@ -5584,6 +5584,16 @@ Normally, this is not needed.",
>> >> bool, (const_tree field, machine_mode mode),
>> >> default_member_type_forces_blk)
>> >>
>> >> +
>> >> +/* Move block for structure type. */
>> >> +DEFHOOK
>> >> +(block_move_for_struct,
>> >> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
>> >> +type. @var{method} is the method if this function invokes block_move.\n\
>> >> +The default definition invokes block_move.",
>> >> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
>> >> + default_block_move_for_struct)
>> >> +
>> >> /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
>> >> that gate the divod transform. */
>> >> DEFHOOK
>> >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
>> >> index 12a58456b39..2b96c348419 100644
>> >> --- a/gcc/targhooks.cc
>> >> +++ b/gcc/targhooks.cc
>> >> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, machine_mode)
>> >> return false;
>> >> }
>> >>
>> >> +/* Default version of block_move_for_struct. */
>> >> +void
>> >> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> >> +{
>> >> + emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
>> >> +}
>> >> +
>> >> /* Default version of canonicalize_comparison. */
>> >>
>> >> void
>> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
>> >> index a6a423c1abb..c284a35ee28 100644
>> >> --- a/gcc/targhooks.h
>> >> +++ b/gcc/targhooks.h
>> >> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const char*);
>> >>
>> >> extern scalar_int_mode default_cstore_mode (enum insn_code);
>> >> extern bool default_member_type_forces_blk (const_tree, machine_mode);
>> >> +extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
>> >> extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
>> >> extern tree build_va_arg_indirect_ref (tree);
>> >> extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
>> >>
>>
prev parent reply other threads:[~2022-11-14 9:54 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 8:25 Jiufu Guo
2022-11-11 9:40 ` Richard Biener
2022-11-14 3:51 ` Jiufu Guo
2022-11-14 7:23 ` Richard Biener
2022-11-14 9:53 ` Jiufu Guo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7eleodsvyh.fsf@pike.rch.stglabs.ibm.com \
--to=guojiufu@linux.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=linkw@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).