public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Jeff Law <jeffreyalaw@gmail.com>,
	gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
	dje.gcc@gmail.com, linkw@gcc.gnu.org
Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter
Date: Fri, 25 Nov 2022 13:05:43 +0800	[thread overview]
Message-ID: <7ewn7jr5co.fsf@pike.rch.stglabs.ibm.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2211240720130.3995@jbgna.fhfr.qr> (Richard Biener's message of "Thu, 24 Nov 2022 07:31:15 +0000 (UTC)")


Hi Richard,

Thanks a lot for your comments!

Richard Biener <rguenther@suse.de> writes:

> On Wed, 23 Nov 2022, Jiufu Guo wrote:
>
>> Hi Jeff,
>> 
>> Thanks a lot for your comments!
>
> Sorry for the late response ...
>
>> Jeff Law <jeffreyalaw@gmail.com> writes:
>> 
>> > On 11/20/22 20:07, Jiufu Guo wrote:
>> >> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>> >>
>> >>> Hi,
>> >>>
>> >>> As mentioned in the previous version patch:
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >>> The suboptimal code is generated for "assigning from parameter" or
>> >>> "assigning to return value".
>> >>> This patch enhances the assignment from parameters like the below
>> >>> cases:
>> >>> /////case1.c
>> >>> typedef struct SA {double a[3];long l; } A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> ////case2.c
>> >>> typedef struct SA {double a[3];} A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> For this patch, bootstrap and regtest pass on ppc64{,le}
>> >>> and x86_64.
>> >>> * Besides asking for help reviewing this patch, I would like to
>> >>> consult comments about enhancing for "assigning to returns".
>> >> I updated the patch to fix the issue for returns.  This patch
>> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
>> >> by a return stmt.  This patch fix the issue in expand pass only,
>> >> so, we would try to update the patch to avoid this flag.
>> >>
>> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> >> index dd29ffffc03..09b8ec64cea 100644
>> >> --- a/gcc/cfgexpand.cc
>> >> +++ b/gcc/cfgexpand.cc
>> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>> >>       frame_phase = off ? align - off : 0;
>> >>     }
>> >>   +  /* Collect VARs on returns.  */
>> >> +  if (DECL_RESULT (current_function_decl))
>> >> +    {
>> >> +      edge_iterator ei;
>> >> +      edge e;
>> >> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> >> +	if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt (e->src)))
>> >> +	  {
>> >> +	    tree val = gimple_return_retval (ret);
>> >> +	    if (val && VAR_P (val))
>> >> +	      DECL_USEDBY_RETURN_P (val) = 1;
>
> you probably want to check && auto_var_in_fn (val, ...) since val
> might be global?
Since we are collecting the return vals, it would be built in function
gimplify_return_expr.  In this function, create_tmp_reg is used and
a local temp.  So it would not be a global var here.

code piece in gimplify_return_expr:
  if (!result_decl)
    result = NULL_TREE;
  else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl)))
    {
    ....
      result = result_decl;
    }
  else if (gimplify_ctxp->return_temp)
    result = gimplify_ctxp->return_temp;
  else
    {
      result = create_tmp_reg (TREE_TYPE (result_decl));

Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns
false for target like ppc64le, because result of "hard_function_value"
is a "rtx with PARALLELL code".
And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a
reg here. it built a DECL_VAR with RECORD type and BLK mode).

I also tried the way to use RESULT_DECL for this kind of type, or
let aggregate_value_p accept this kind of type.  But it seems not easy,
because "<retval> (RESULT_DECL with PARALLEL)" is not ok for address
operations.


>
>> >> +	  }
>> >> +    }
>> >> +
>> >>     /* Set TREE_USED on all variables in the local_decls.  */
>> >>     FOR_EACH_LOCAL_DECL (cfun, i, var)
>> >>       TREE_USED (var) = 1;
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index d9407432ea5..20973649963 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
>> >>         return;
>> >>       }
>
> I miss an explanatory comment here on that the following is heuristics
> and its reasoning.
>
>> >>   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>> >> +       && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>
> Why check TYPE_MODE here?  Do you want AGGREGATE_TYPE_P on the type
> instead?
Checking BLK, because I want make sure the param should occur on
register and stack (saved from register).
Actualy, my intention is checking:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode

For code:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode
&& (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
    || REG_P (DECL_INCOMING_RTL (from)))
This checking indicates if the param may be passing via 2 or more
registers.

Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more
readable. I would have a test.

>
>> >> +       && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>> >> +	   || REG_P (DECL_INCOMING_RTL (from))))
>> >> +      || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>> >> +	  && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>
> Likewise.
>
>> >> +	  && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>> >> +	       == PARALLEL))
>
> Not REG_P here?
REG_P with BLK on return would means return in memory, instead return
through registers, so, REG_P was not added here, and let it use
orignal behavior.
>
>> >> +    {
>> >> +      push_temp_slots ();
>> >> +      rtx par_ret;
>> >> +      machine_mode mode;
>> >> +      par_ret = TREE_CODE (from) == PARM_DECL
>> >> +		  ? DECL_INCOMING_RTL (from)
>> >> +		  : DECL_RTL (DECL_RESULT (current_function_decl));
>> >> +      mode = GET_CODE (par_ret) == PARALLEL
>> >> +	       ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
>> >> +	       : word_mode;
>> >> +      int mode_size = GET_MODE_SIZE (mode).to_constant ();
>> >> +      int size = INTVAL (expr_size (from));
>> >> +
>> >> +      /* If/How the parameter using submode, it dependes on the size and
>> >> +	 position of the parameter.  Here using heurisitic number.  */
>> >> +      int hurstc_num = 8;
>> >
>> > Where did this come from and what does it mean?
>> Sorry for does not make this clear. We know that an aggregate arg may be
>> on registers partially or totally, as assign_parm_adjust_entry_rtl.
>> For an example, if a parameter with 12 words and the target/ABI only
>> allow 8 gprs for arguments, then the parameter could use 8 regs at most
>> and left part in stack.
>
> I also wonder about the exact semantics of the parallels we get here.
>
> +      int size = INTVAL (expr_size (from));
>
> esp. when you use sth as simple as this.  Shouldn't you instead look
> at to_rtx since that's already expanded?  For returns that should
Yes, I use "expr_size (from)" is just because the size should be same
between "from", "to" and "to_rtx" for these cases.
> be the desired layout to match 'from' to, no?  Maybe it's better
> to not try sharing the code for both incoming and return copies
> for clarity?
OK, thanks, good sugguestion.
>
> Also, what happens if there's a copy from a PARM_DECL to a
> DECL_USEDBY_RETURN_P decl?  Which heuristic takes precedent?
I thinking below code reflects this senario:

typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}

The gimple seq:
  D.3951 = a;
  return D.3951;

For this patch, since "PARM_DECL" is checked before "VAR_P",
so, PARM_DECL is checked as true first.

>
> I think that at least the place of the copy improvement and the
> way you compute DECL_USEDBY_RETURN_P is reasonable.

I'm also thinking to mark this flag for return variables during
it was built in gimplify_return_expr.  This would be more
straightforward and no need to query/walk return stmts.  One
possible concern: it is too far between "gimplify" phase and
"expand".

I will update the patch accordingly.

Thanks for suggestions and comments!


BR,
Jeff (Jiufu)

>
> Thanks,
> Richard.
>
>> >
>> >
>> > Note that BLKmode subword values passed in registers can be either
>> > right or left justified.  I think you also need to worry about
>> > endianness here.
>> Since the subword is used to move block(read from source mem and then
>> store to destination mem with register mode), and this would keep to use
>> the same endianness on reg like move_block_from_reg. So, the patch does
>> not check the endianness.
>> 
>> If any concerns and sugguestions, please point out, thanks!
>> 
>> BR,
>> Jeff (Jiufu)
>> >
>> >
>> > Jeff
>> 

  reply	other threads:[~2022-11-25  5:05 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-17  6:15 Jiufu Guo
2022-11-21  3:07 ` Jiufu Guo
2022-11-22 21:57   ` Jeff Law
2022-11-23  2:58     ` Jiufu Guo
2022-11-24  7:31       ` Richard Biener
2022-11-25  5:05         ` Jiufu Guo [this message]
2022-11-25 12:29           ` Jiufu Guo
2022-11-28 17:00       ` Jeff Law
2022-11-29  3:53         ` Jiufu Guo
2022-12-05 16:48           ` Jeff Law
2022-12-06  2:36             ` Jiufu Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ewn7jr5co.fsf@pike.rch.stglabs.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=linkw@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).