public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
@ 2021-03-24 13:26 Stam Markianos-Wright
  2021-03-24 13:46 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Stam Markianos-Wright @ 2021-03-24 13:26 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, ook, Richard Earnshaw, richard.sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Hi all,

This patch resolves bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

This is achieved by forcing a re-calculation of *stmt_vectype_out if an
incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with 
an extra introduced max_nunits ceiling.

I am not 100% sure if this is the best way to go about fixing this, 
because this is my first look at the vectorizer and I lack knowledge of 
the wider context, so do let me know if you see a better way to do this!

I have added the previously ICE-ing reproducer as a new test.

This is compiled as "g++ -Ofast -march=armv8.2-a+sve 
-fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for 
GCC10.

(the non-fdisable-tree-fre4 version has gone latent on GCC11)

Bootstrapped and reg-tested on aarch64-linux-gnu.
Also reg-tested on aarch64-none-elf.


gcc/ChangeLog:

         * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
         parameter to core function and add new function overload.
         (vect_get_vector_types_for_stmt): Add re-calculation logic.

gcc/testsuite/ChangeLog:

         * g++.target/aarch64/sve/pr96974.C: New test.

[-- Attachment #2: PR96974.patch --]
[-- Type: text/x-patch, Size: 4624 bytes --]

diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 0000000000000000000000000000000000000000..2f6ebd6ce3dd8626f5e666edba77d2c925739b7d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+    c() {
+      for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+    }
+    int coeffs[10];
+} f;
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c2d1f39fe0f4bbc90ffa079cb6a8fcf87b76b3af..f8d3eac38718e18bf957b85109cccbc03e21c041 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11342,7 +11342,7 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 
 tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
-			     unsigned int group_size)
+			     unsigned int group_size, unsigned int max_nunits)
 {
   /* For BB vectorization, we should always have a group size once we've
      constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
@@ -11375,13 +11375,16 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
 	 fail (in the latter case because GROUP_SIZE is too small
 	 for the target), but it's possible that a target could have
 	 a hole between supported vector types.
+	 There is also the option to artificially pass a max_nunits,
+	 which is smaller than GROUP_SIZE, if the use of GROUP_SIZE
+	 would result in an incompatible mode for the target.
 
 	 If GROUP_SIZE is not a power of 2, this has the effect of
 	 trying the largest power of 2 that fits within the group,
 	 even though the group is not a multiple of that vector size.
 	 The BB vectorizer will then try to carve up the group into
 	 smaller pieces.  */
-      unsigned int nunits = 1 << floor_log2 (group_size);
+      unsigned int nunits = 1 << floor_log2 (max_nunits);
       do
 	{
 	  vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
@@ -11394,6 +11397,14 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
   return vectype;
 }
 
+tree
+get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
+			     unsigned int group_size)
+{
+  return get_vectype_for_scalar_type (vinfo, scalar_type,
+				     group_size, group_size);
+}
+
 /* Return the vector type corresponding to SCALAR_TYPE as supported
    by the target.  NODE, if nonnull, is the SLP tree node that will
    use the returned vector type.  */
@@ -12172,6 +12183,8 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 
   tree vectype;
   tree scalar_type = NULL_TREE;
+  tree scalar_type_orig = NULL_TREE;
+
   if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
     {
       vectype = STMT_VINFO_VECTYPE (stmt_info);
@@ -12210,6 +12223,7 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+      scalar_type_orig = scalar_type;
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
@@ -12249,6 +12263,36 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 	}
     }
 
+  /* In rare cases with different types and sizes we may reach an invalid
+     combination where nunits_vectype has fewer TYPE_VECTOR_SUBPARTS than
+     *stmt_vectype_out.  In that case attempt to re-calculate
+     *stmt_vectype_out with an imposed max taken from nunits_vectype.  */
+  unsigned int max_nunits;
+  if (known_lt (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+	       "Incompatable number of vector subparts between %T and %T\n",
+	       nunits_vectype, *stmt_vectype_out);
+
+      max_nunits = TYPE_VECTOR_SUBPARTS (nunits_vectype).to_constant ();
+      *stmt_vectype_out = get_vectype_for_scalar_type (vinfo, scalar_type_orig,
+						       group_size,
+						       max_nunits);
+
+      if (*stmt_vectype_out)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Re-calculated data-types to : %T, %T\n",
+			     nunits_vectype, *stmt_vectype_out);
+	}
+      else
+	return opt_result::failure_at
+	  (stmt, "Not vectorized: failed to re-calculate data-types.\n");
+    }
+
   gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
 			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-24 13:26 [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974] Stam Markianos-Wright
@ 2021-03-24 13:46 ` Richard Biener
  2021-03-25 13:48   ` Stam Markianos-Wright
  2021-03-26 17:19   ` Richard Sandiford
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2021-03-24 13:46 UTC (permalink / raw)
  To: Stam Markianos-Wright
  Cc: gcc-patches, ook, Richard Earnshaw, richard.sandiford, Kyrylo Tkachov

On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:

> Hi all,
> 
> This patch resolves bug:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
> 
> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
> extra introduced max_nunits ceiling.
> 
> I am not 100% sure if this is the best way to go about fixing this, because
> this is my first look at the vectorizer and I lack knowledge of the wider
> context, so do let me know if you see a better way to do this!
> 
> I have added the previously ICE-ing reproducer as a new test.
> 
> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
> 
> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
> 
> Bootstrapped and reg-tested on aarch64-linux-gnu.
> Also reg-tested on aarch64-none-elf.

I don't think this is going to work well given uses will expect
a vector type that's consistent here.

I think giving up is for the moment the best choice, thus replacing
the assert with vectorization failure.

In the end we shouldn't require those nunits vectypes to be
separately computed - we compute the vector type of the defs
anyway and in case they're invariant the vectorizable_* function
either can deal with the type mix or not anyway.

That said, the goal should be to simplify things here.

Richard.

> 
> gcc/ChangeLog:
> 
>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
>         parameter to core function and add new function overload.
>         (vect_get_vector_types_for_stmt): Add re-calculation logic.
> 
> gcc/testsuite/ChangeLog:
> 
>         * g++.target/aarch64/sve/pr96974.C: New test.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-24 13:46 ` Richard Biener
@ 2021-03-25 13:48   ` Stam Markianos-Wright
  2021-03-25 13:50     ` Richard Biener
  2021-03-26 17:19   ` Richard Sandiford
  1 sibling, 1 reply; 8+ messages in thread
From: Stam Markianos-Wright @ 2021-03-25 13:48 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, ook, Richard Earnshaw, richard.sandiford, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2364 bytes --]

On 24/03/2021 13:46, Richard Biener wrote:
> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
> 
>> Hi all,
>>
>> This patch resolves bug:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
>>
>> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
>> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
>> extra introduced max_nunits ceiling.
>>
>> I am not 100% sure if this is the best way to go about fixing this, because
>> this is my first look at the vectorizer and I lack knowledge of the wider
>> context, so do let me know if you see a better way to do this!
>>
>> I have added the previously ICE-ing reproducer as a new test.
>>
>> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
>> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
>>
>> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
>>
>> Bootstrapped and reg-tested on aarch64-linux-gnu.
>> Also reg-tested on aarch64-none-elf.
> 
> I don't think this is going to work well given uses will expect
> a vector type that's consistent here.
> 
> I think giving up is for the moment the best choice, thus replacing
> the assert with vectorization failure.
> 
> In the end we shouldn't require those nunits vectypes to be
> separately computed - we compute the vector type of the defs
> anyway and in case they're invariant the vectorizable_* function
> either can deal with the type mix or not anyway.
> 

Yea good point! I agree and after all we are very close to releases now ;)

I've attached the patch that just do the graceful vectorization failure 
and add a slightly better test now. Re-tested as previously with no 
issues ofc.

gcc-10.patch is what I'd backport to GCC10 (the only difference between 
that and gcc-11.patch is that one compiles with `-fdisable-tree-fre4` 
and the other without it).

Ok to push this to the GCC11 branch and backport to the GCC10 branch?

Cheers :D
Stam

> That said, the goal should be to simplify things here.
> 
> Richard.
> 
>>
>> gcc/ChangeLog:
>>
>>          * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
>>          parameter to core function and add new function overload.
>>          (vect_get_vector_types_for_stmt): Add re-calculation logic.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * g++.target/aarch64/sve/pr96974.C: New test.
>>
> 


[-- Attachment #2: gcc-11.patch --]
[-- Type: text/x-patch, Size: 1375 bytes --]

diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 00000000000..363241d18df
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+    c() {
+      for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+    }
+    int coeffs[10];
+} f;
+
+/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d791d3a4720..4c01e82ff39 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12148,8 +12148,12 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
     }
 
-  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
-			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
+  if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		   TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+    return opt_result::failure_at (stmt,
+				   "Not vectorized: Incompatible number "
+				   "of vector subparts between %T and %T\n",
+				   nunits_vectype, *stmt_vectype_out);
 
   if (dump_enabled_p ())
     {

[-- Attachment #3: gcc-10.patch --]
[-- Type: text/x-patch, Size: 1338 bytes --]

diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 00000000000..2023c55e3e6
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdump-tree-slp-details" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+    c() {
+      for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+    }
+    int coeffs[10];
+} f;
+
+/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c2d1f39fe0f..6418edb5204 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12249,8 +12249,12 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 	}
     }
 
-  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
-			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
+  if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		   TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+    return opt_result::failure_at (stmt,
+				   "Not vectorized: Incompatible number "
+				   "of vector subparts between %T and %T\n",
+				   nunits_vectype, *stmt_vectype_out);
 
   if (dump_enabled_p ())
     {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-25 13:48   ` Stam Markianos-Wright
@ 2021-03-25 13:50     ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2021-03-25 13:50 UTC (permalink / raw)
  To: Stam Markianos-Wright
  Cc: gcc-patches, ook, Richard Earnshaw, richard.sandiford, Kyrylo Tkachov

On Thu, 25 Mar 2021, Stam Markianos-Wright wrote:

> On 24/03/2021 13:46, Richard Biener wrote:
> > On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
> > 
> >> Hi all,
> >>
> >> This patch resolves bug:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
> >>
> >> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
> >> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
> >> extra introduced max_nunits ceiling.
> >>
> >> I am not 100% sure if this is the best way to go about fixing this, because
> >> this is my first look at the vectorizer and I lack knowledge of the wider
> >> context, so do let me know if you see a better way to do this!
> >>
> >> I have added the previously ICE-ing reproducer as a new test.
> >>
> >> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4"
> >> for
> >> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
> >>
> >> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
> >>
> >> Bootstrapped and reg-tested on aarch64-linux-gnu.
> >> Also reg-tested on aarch64-none-elf.
> > 
> > I don't think this is going to work well given uses will expect
> > a vector type that's consistent here.
> > 
> > I think giving up is for the moment the best choice, thus replacing
> > the assert with vectorization failure.
> > 
> > In the end we shouldn't require those nunits vectypes to be
> > separately computed - we compute the vector type of the defs
> > anyway and in case they're invariant the vectorizable_* function
> > either can deal with the type mix or not anyway.
> > 
> 
> Yea good point! I agree and after all we are very close to releases now ;)
> 
> I've attached the patch that just do the graceful vectorization failure and
> add a slightly better test now. Re-tested as previously with no issues ofc.
> 
> gcc-10.patch is what I'd backport to GCC10 (the only difference between that
> and gcc-11.patch is that one compiles with `-fdisable-tree-fre4` and the other
> without it).
> 
> Ok to push this to the GCC11 branch and backport to the GCC10 branch?

OK.

Thanks,
Richard.

> Cheers :D
> Stam
> 
> > That said, the goal should be to simplify things here.
> > 
> > Richard.
> > 
> >>
> >> gcc/ChangeLog:
> >>
> >>          * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
> >>          parameter to core function and add new function overload.
> >>          (vect_get_vector_types_for_stmt): Add re-calculation logic.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>          * g++.target/aarch64/sve/pr96974.C: New test.
> >>
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-24 13:46 ` Richard Biener
  2021-03-25 13:48   ` Stam Markianos-Wright
@ 2021-03-26 17:19   ` Richard Sandiford
  2021-03-29  9:20     ` Richard Biener
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2021-03-26 17:19 UTC (permalink / raw)
  To: Richard Biener
  Cc: Stam Markianos-Wright, gcc-patches, ook, Richard Earnshaw,
	Kyrylo Tkachov

Richard Biener <rguenther@suse.de> writes:
> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
>
>> Hi all,
>> 
>> This patch resolves bug:
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
>> 
>> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
>> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
>> extra introduced max_nunits ceiling.
>> 
>> I am not 100% sure if this is the best way to go about fixing this, because
>> this is my first look at the vectorizer and I lack knowledge of the wider
>> context, so do let me know if you see a better way to do this!
>> 
>> I have added the previously ICE-ing reproducer as a new test.
>> 
>> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
>> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
>> 
>> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
>> 
>> Bootstrapped and reg-tested on aarch64-linux-gnu.
>> Also reg-tested on aarch64-none-elf.
>
> I don't think this is going to work well given uses will expect
> a vector type that's consistent here.
>
> I think giving up is for the moment the best choice, thus replacing
> the assert with vectorization failure.
>
> In the end we shouldn't require those nunits vectypes to be
> separately computed - we compute the vector type of the defs
> anyway and in case they're invariant the vectorizable_* function
> either can deal with the type mix or not anyway.

I agree this area needs simplification, but I think the direction of
travel should be to make the assert valid.  I agree this is probably
the pragmatic fix for GCC 11 and earlier though.

Also, IMO it's a bug that we use OImode, CImode or XImode for plain
vectors.  We should only need to use them for LD[234] and ST[234] arrays.

Thanks,
Richard

>
> That said, the goal should be to simplify things here.
>
> Richard.
>
>> 
>> gcc/ChangeLog:
>> 
>>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
>>         parameter to core function and add new function overload.
>>         (vect_get_vector_types_for_stmt): Add re-calculation logic.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>         * g++.target/aarch64/sve/pr96974.C: New test.
>> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-26 17:19   ` Richard Sandiford
@ 2021-03-29  9:20     ` Richard Biener
  2021-03-31 10:03       ` Stam Markianos-Wright
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2021-03-29  9:20 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Stam Markianos-Wright, gcc-patches, ook, Richard Earnshaw,
	Kyrylo Tkachov

On Fri, 26 Mar 2021, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
> >
> >> Hi all,
> >> 
> >> This patch resolves bug:
> >> 
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
> >> 
> >> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
> >> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
> >> extra introduced max_nunits ceiling.
> >> 
> >> I am not 100% sure if this is the best way to go about fixing this, because
> >> this is my first look at the vectorizer and I lack knowledge of the wider
> >> context, so do let me know if you see a better way to do this!
> >> 
> >> I have added the previously ICE-ing reproducer as a new test.
> >> 
> >> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
> >> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
> >> 
> >> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
> >> 
> >> Bootstrapped and reg-tested on aarch64-linux-gnu.
> >> Also reg-tested on aarch64-none-elf.
> >
> > I don't think this is going to work well given uses will expect
> > a vector type that's consistent here.
> >
> > I think giving up is for the moment the best choice, thus replacing
> > the assert with vectorization failure.
> >
> > In the end we shouldn't require those nunits vectypes to be
> > separately computed - we compute the vector type of the defs
> > anyway and in case they're invariant the vectorizable_* function
> > either can deal with the type mix or not anyway.
> 
> I agree this area needs simplification, but I think the direction of
> travel should be to make the assert valid.  I agree this is probably
> the pragmatic fix for GCC 11 and earlier though.

The issue is that we compute a vector type for a use that may differ
from what we'd compute for it in the context of its definition (or
in the context of another use).  Any such "local" decision is likely
flawed and I'd rather simplify further doing the only decision on
the definition side - if there's a disconnect between the number
of lanes (and thus altering the VF won't help) then we have to give
up anyway.

Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-29  9:20     ` Richard Biener
@ 2021-03-31 10:03       ` Stam Markianos-Wright
  2021-03-31 10:08         ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Stam Markianos-Wright @ 2021-03-31 10:03 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford
  Cc: gcc-patches, ook, Richard Earnshaw, Kyrylo Tkachov, Christophe Lyon

On 29/03/2021 10:20, Richard Biener wrote:
> On Fri, 26 Mar 2021, Richard Sandiford wrote:
> 
>> Richard Biener <rguenther@suse.de> writes:
>>> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
>>>
>>>> Hi all,
>>>>
>>>> This patch resolves bug:
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
>>>>
>>>> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
>>>> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
>>>> extra introduced max_nunits ceiling.
>>>>
>>>> I am not 100% sure if this is the best way to go about fixing this, because
>>>> this is my first look at the vectorizer and I lack knowledge of the wider
>>>> context, so do let me know if you see a better way to do this!
>>>>
>>>> I have added the previously ICE-ing reproducer as a new test.
>>>>
>>>> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
>>>> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
>>>>
>>>> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
>>>>
>>>> Bootstrapped and reg-tested on aarch64-linux-gnu.
>>>> Also reg-tested on aarch64-none-elf.
>>>
>>> I don't think this is going to work well given uses will expect
>>> a vector type that's consistent here.
>>>
>>> I think giving up is for the moment the best choice, thus replacing
>>> the assert with vectorization failure.
>>>
>>> In the end we shouldn't require those nunits vectypes to be
>>> separately computed - we compute the vector type of the defs
>>> anyway and in case they're invariant the vectorizable_* function
>>> either can deal with the type mix or not anyway.
>>
>> I agree this area needs simplification, but I think the direction of
>> travel should be to make the assert valid.  I agree this is probably
>> the pragmatic fix for GCC 11 and earlier though.
> 
> The issue is that we compute a vector type for a use that may differ
> from what we'd compute for it in the context of its definition (or
> in the context of another use).  Any such "local" decision is likely
> flawed and I'd rather simplify further doing the only decision on
> the definition side - if there's a disconnect between the number
> of lanes (and thus altering the VF won't help) then we have to give
> up anyway.
> 
> Richard.
> 

Thank you both for the further info! Would it be fair to close the 
initial PR regarding the ICE 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and then open a 
second one at a lower priority level to address these further improvements?

Also Christophe has kindly found out that the test FAILs in ILP32, so it 
would be great to get that one in asap, too! 
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567431.html

Cheers,
Stam


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
  2021-03-31 10:03       ` Stam Markianos-Wright
@ 2021-03-31 10:08         ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2021-03-31 10:08 UTC (permalink / raw)
  To: Stam Markianos-Wright
  Cc: Richard Sandiford, gcc-patches, ook, Richard Earnshaw,
	Kyrylo Tkachov, Christophe Lyon

On Wed, 31 Mar 2021, Stam Markianos-Wright wrote:

> On 29/03/2021 10:20, Richard Biener wrote:
> > On Fri, 26 Mar 2021, Richard Sandiford wrote:
> > 
> >> Richard Biener <rguenther@suse.de> writes:
> >>> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> This patch resolves bug:
> >>>>
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
> >>>>
> >>>> This is achieved by forcing a re-calculation of *stmt_vectype_out if an
> >>>> incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
> >>>> extra introduced max_nunits ceiling.
> >>>>
> >>>> I am not 100% sure if this is the best way to go about fixing this,
> >>>> because
> >>>> this is my first look at the vectorizer and I lack knowledge of the wider
> >>>> context, so do let me know if you see a better way to do this!
> >>>>
> >>>> I have added the previously ICE-ing reproducer as a new test.
> >>>>
> >>>> This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4"
> >>>> for
> >>>> GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
> >>>>
> >>>> (the non-fdisable-tree-fre4 version has gone latent on GCC11)
> >>>>
> >>>> Bootstrapped and reg-tested on aarch64-linux-gnu.
> >>>> Also reg-tested on aarch64-none-elf.
> >>>
> >>> I don't think this is going to work well given uses will expect
> >>> a vector type that's consistent here.
> >>>
> >>> I think giving up is for the moment the best choice, thus replacing
> >>> the assert with vectorization failure.
> >>>
> >>> In the end we shouldn't require those nunits vectypes to be
> >>> separately computed - we compute the vector type of the defs
> >>> anyway and in case they're invariant the vectorizable_* function
> >>> either can deal with the type mix or not anyway.
> >>
> >> I agree this area needs simplification, but I think the direction of
> >> travel should be to make the assert valid.  I agree this is probably
> >> the pragmatic fix for GCC 11 and earlier though.
> > 
> > The issue is that we compute a vector type for a use that may differ
> > from what we'd compute for it in the context of its definition (or
> > in the context of another use).  Any such "local" decision is likely
> > flawed and I'd rather simplify further doing the only decision on
> > the definition side - if there's a disconnect between the number
> > of lanes (and thus altering the VF won't help) then we have to give
> > up anyway.
> > 
> > Richard.
> > 
> 
> Thank you both for the further info! Would it be fair to close the initial PR
> regarding the ICE (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and
> then open a second one at a lower priority level to address these further
> improvements?

I have closed the original report.  If there's a testcase where
vectorization is possible and useful but is now rejected because of
the fix then yes, please open a new missed-optimization bugreport.

Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-31 10:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24 13:26 [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974] Stam Markianos-Wright
2021-03-24 13:46 ` Richard Biener
2021-03-25 13:48   ` Stam Markianos-Wright
2021-03-25 13:50     ` Richard Biener
2021-03-26 17:19   ` Richard Sandiford
2021-03-29  9:20     ` Richard Biener
2021-03-31 10:03       ` Stam Markianos-Wright
2021-03-31 10:08         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).