public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing
@ 2022-03-25 12:11 Richard Biener
  2022-03-25 13:25 ` Hongtao Liu
  2022-03-28 13:37 ` Richard Sandiford
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2022-03-25 12:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, hongtao.liu

Since we're now vectorizing by default at -O2 issues like PR101908
become more important where we apply basic-block vectorization to
parts of the function covering loads from function parameters passed
on the stack.  Since we have no good idea how the stack pushing
was performed but we do have a good idea that it was done recent
when at the start of the function a STLF failure is inevitable
if the argument passing code didn't use the same or larger vector
stores and also has them aligned the same way.

Until there's a robust IPA based solution the following implements
target independent heuristics in the vectorizer to retain scalar
loads for loads from parameters likely passed in memory (I use
a BLKmode DECL_MODE check for this rather than firing up
cummulative-args).  I've restricted this also to loads from the
first "block" (that can be less than the first basic block if there's
a call for example), since that covers the testcase.

Note that for the testcase (but not c-ray from the bugreport) there's
a x86 peephole2 that vectorizes things back, so the patch is
not effective there.

Any comments?  I know we're also looking at x86 port specific
mitigations but the issue will hit arm and power/z as well I think.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2022-03-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101908
	* tree-vect-stmts.cc (get_group_load_store_type): Add
	heuristic to limit BB vectorization of function parameters.

	* gcc.dg/vect/bb-slp-pr101908.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++
 gcc/tree-vect-stmts.cc                      | 27 ++++++++++++++++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
new file mode 100644
index 00000000000..b7534a18f0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+struct vec3 {
+  double x, y, z;
+};
+
+struct ray {
+  struct vec3 orig, dir;
+};
+
+void ray_sphere (struct ray ray, double *res)
+{
+  res[0] = ray.orig.y * ray.dir.x;
+  res[1] = ray.orig.z * ray.dir.y;
+}
+
+/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f7449a79d1c..1e37e9678b6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
   /* Stores can't yet have gaps.  */
   gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0);
 
-  if (slp_node)
+  tree parm;
+  if (!loop_vinfo
+      && vls_type == VLS_LOAD
+      /* The access is based on a PARM_DECL.  */
+      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
+      && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true)
+      && TREE_CODE (parm) == PARM_DECL
+      /* Likely passed on the stack.  */
+      && DECL_MODE (parm) == BLKmode
+      /* The access is in the first group.  */
+      && first_dr_info->group == 0)
+    {
+      /* When doing BB vectorizing force early loads from function parameters
+	 passed on the stack and thus stored recently to be done elementwise
+	 to avoid store-to-load forwarding penalties.
+	 Note this will cause vectorization to fail for the load because of
+	 the fear of underestimating the cost of elementwise accesses,
+	 see the end of get_load_store_type.  We are then going to effectively
+	 do the same via handling the loads as external input to the SLP.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Not using vector loads from function parameter "
+			 "for the fear of causing a STLF fail\n");
+      *memory_access_type = VMAT_ELEMENTWISE;
+    }
+  else if (slp_node)
     {
       /* For SLP vectorization we directly vectorize a subchain
 	 without permutation.  */
-- 
2.34.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-03-29  6:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-25 12:11 [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing Richard Biener
2022-03-25 13:25 ` Hongtao Liu
2022-03-25 13:42   ` Richard Biener
2022-03-25 14:09     ` Hongtao Liu
2022-03-28 13:37 ` Richard Sandiford
2022-03-28 13:49   ` Richard Biener
2022-03-28 14:35     ` Richard Sandiford
2022-03-29  6:21       ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).