public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization/109747 - SLP cost of CTORs
@ 2023-05-23 15:18 Richard Biener
  2023-05-23 16:53 ` Richard Sandiford
  0 siblings, 1 reply; 2+ messages in thread
From: Richard Biener @ 2023-05-23 15:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

The x86 backend looks at the SLP node passed to the add_stmt_cost
hook when costing vec_construct, looking for elements that require
a move from a GPR to a vector register and cost that.  But since
vect_prologue_cost_for_slp decomposes the cost for an external
SLP node into individual pieces this cost gets applied N times
without a chance for the backend to know it's just dealing with
a part of the SLP node.  Just looking at a part is also not perfect
since the GPR to XMM move cost applies only once per distinct
element so handling the whole SLP node one more correctly reflects
cost (albeit without considering other external SLP nodes).

The following addresses the issue by passing down the SLP node
only for one piece and nullptr for the rest.  The x86 backend
is currently the only one looking at it.

In the future the cost of external elements is something to deal
with globally but that would require the full SLP tree be available
to costing.

It's difficult to write a testcase, at the tipping point not
vectorizing is better so I'll followup with x86 specific adjustments
and will see to add a testcase later.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard, we talked about this issue two weeks ago and I was looking
for a solution that would be OK for backporting if the need arises.
The following is what I could come up with that retains the whole
SLP-node wide "CSE" of the element move cost.  Is that OK until
we come up with a better plan for trunk at some point?

Thanks,
Richard.

	PR tree-optimization/109747
	* tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
	the SLP node only once to the cost hook.
---
 gcc/tree-vect-slp.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..a6f277c5e21 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
     }
   /* ???  We're just tracking whether vectors in a single node are the same.
      Ideally we'd do something more global.  */
+  bool passed = false;
   for (unsigned int start : starts)
     {
       vect_cost_for_stmt kind;
@@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
 	kind = scalar_to_vec;
       else
 	kind = vec_construct;
-      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
+      /* The target cost hook has no idea which part of the SLP node
+	 we are costing so avoid passing it down more than once.  Pass
+	 it to the first vec_construct or scalar_to_vec part since for those
+	 the x86 backend tries to account for GPR to XMM register moves.  */
+      record_stmt_cost (cost_vec, 1, kind,
+			(kind != vector_load && !passed) ? node : nullptr,
+			vectype, 0, vect_prologue);
+      if (kind != vector_load)
+	passed = true;
     }
 }
 
-- 
2.35.3

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] tree-optimization/109747 - SLP cost of CTORs
  2023-05-23 15:18 [PATCH] tree-optimization/109747 - SLP cost of CTORs Richard Biener
@ 2023-05-23 16:53 ` Richard Sandiford
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Sandiford @ 2023-05-23 16:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Richard Biener <rguenther@suse.de> writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that.  But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP node into individual pieces this cost gets applied N times
> without a chance for the backend to know it's just dealing with
> a part of the SLP node.  Just looking at a part is also not perfect
> since the GPR to XMM move cost applies only once per distinct
> element so handling the whole SLP node one more correctly reflects
> cost (albeit without considering other external SLP nodes).
>
> The following addresses the issue by passing down the SLP node
> only for one piece and nullptr for the rest.  The x86 backend
> is currently the only one looking at it.
>
> In the future the cost of external elements is something to deal
> with globally but that would require the full SLP tree be available
> to costing.
>
> It's difficult to write a testcase, at the tipping point not
> vectorizing is better so I'll followup with x86 specific adjustments
> and will see to add a testcase later.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard, we talked about this issue two weeks ago and I was looking
> for a solution that would be OK for backporting if the need arises.
> The following is what I could come up with that retains the whole
> SLP-node wide "CSE" of the element move cost.  Is that OK until
> we come up with a better plan for trunk at some point?

Yeah, seems like a neat workaround to me FWIW.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 	PR tree-optimization/109747
> 	* tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
> 	the SLP node only once to the cost hook.
> ---
>  gcc/tree-vect-slp.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..a6f277c5e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
>      }
>    /* ???  We're just tracking whether vectors in a single node are the same.
>       Ideally we'd do something more global.  */
> +  bool passed = false;
>    for (unsigned int start : starts)
>      {
>        vect_cost_for_stmt kind;
> @@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
>  	kind = scalar_to_vec;
>        else
>  	kind = vec_construct;
> -      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
> +      /* The target cost hook has no idea which part of the SLP node
> +	 we are costing so avoid passing it down more than once.  Pass
> +	 it to the first vec_construct or scalar_to_vec part since for those
> +	 the x86 backend tries to account for GPR to XMM register moves.  */
> +      record_stmt_cost (cost_vec, 1, kind,
> +			(kind != vector_load && !passed) ? node : nullptr,
> +			vectype, 0, vect_prologue);
> +      if (kind != vector_load)
> +	passed = true;
>      }
>  }

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-05-23 16:54 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 15:18 [PATCH] tree-optimization/109747 - SLP cost of CTORs Richard Biener
2023-05-23 16:53 ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).