public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] vect/test: Don't check for epilogue loop [PR97075]
@ 2020-09-18  2:37 Kewen.Lin
  2020-09-18 15:46 ` Segher Boessenkool
  2020-09-18 18:17 ` Richard Sandiford
  0 siblings, 2 replies; 13+ messages in thread
From: Kewen.Lin @ 2020-09-18  2:37 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Bill Schmidt, Richard Biener,
	Richard Sandiford, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi,

The commit r11-3230 brings a nice improvement to use full
vectors instead of partial vectors when available.  But
it caused some vector with length test cases to fail on
Power.

The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
exposed one issue that: we call function 
vect_need_peeling_or_partial_vectors_p in function
vect_analyze_loop_2, since it's in analysis phase, for
the epilogue loop, we could get the wrong information like
LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
answer for vect_need_peeling_or_partial_vectors_p.
For the epilogue loop in this failure specific, the niter
that we get is 58 (should be 1), vf is 2.

For epilogue loop with partial vectors, it would use the
same VF as the main loop, so it won't be able to use full
vector, this patch is to exclude epilogue loop for the
check vect_need_peeling_or_partial_vectors_p in
vect_analyze_loop_2.

The failure on gcc.target/powerpc/p9-vec-length-full-6.c
is just a test issue, the 64bit/32bit pairs are able to
use full vector, fixed in the patch accordingly.

Bootstrapped/regtested on powerpc64le-linux-gnu P9.

Is it OK for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-loop.c (vect_analyze_loop_2): Don't check
	vect_need_peeling_or_partial_vectors_p for the epilogue loop.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.

[-- Attachment #2: pr97075.diff --]
[-- Type: text/plain, Size: 1566 bytes --]

diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
index cfae9bbc927..5d2357aabfa 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
@@ -9,8 +9,7 @@
 #include "p9-vec-length-6.h"
 
 /* It can use normal vector load for constant vector load.  */
-/* { dg-final { scan-assembler-not   {\mstxv\M} } } */
-/* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
-/* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
+/* 64bit/32bit pairs won't use partial vectors.  */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ab627fbf029..7273e998a99 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2278,7 +2278,8 @@ start_over:
     {
       /* Don't use partial vectors if we don't need to peel the loop.  */
       if (param_vect_partial_vector_usage == 0
-	  || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
+	  || (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+	      && !vect_need_peeling_or_partial_vectors_p (loop_vinfo)))
 	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
       else if (vect_verify_full_masking (loop_vinfo)
 	       || vect_verify_loop_lens (loop_vinfo))

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-18  2:37 [PATCH] vect/test: Don't check for epilogue loop [PR97075] Kewen.Lin
@ 2020-09-18 15:46 ` Segher Boessenkool
  2020-09-18 18:17 ` Richard Sandiford
  1 sibling, 0 replies; 13+ messages in thread
From: Segher Boessenkool @ 2020-09-18 15:46 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Bill Schmidt, Richard Biener, Richard Sandiford,
	Andrea Corallo

On Fri, Sep 18, 2020 at 10:37:47AM +0800, Kewen.Lin wrote:
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available.  But
> it caused some vector with length test cases to fail on
> Power.
> 
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue that: we call function 
> vect_need_peeling_or_partial_vectors_p in function
> vect_analyze_loop_2, since it's in analysis phase, for
> the epilogue loop, we could get the wrong information like
> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
> answer for vect_need_peeling_or_partial_vectors_p.
> For the epilogue loop in this failure specific, the niter
> that we get is 58 (should be 1), vf is 2.
> 
> For epilogue loop with partial vectors, it would use the
> same VF as the main loop, so it won't be able to use full
> vector, this patch is to exclude epilogue loop for the
> check vect_need_peeling_or_partial_vectors_p in
> vect_analyze_loop_2.
> 
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue, the 64bit/32bit pairs are able to
> use full vector, fixed in the patch accordingly.

> gcc/ChangeLog:
> 
> 	* tree-vect-loop.c (vect_analyze_loop_2): Don't check
> 	vect_need_peeling_or_partial_vectors_p for the epilogue loop.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.

The testcase part of course is okay for trunk, if this is the expected
(and good :-) ) code.Thanks,


Segher

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-18  2:37 [PATCH] vect/test: Don't check for epilogue loop [PR97075] Kewen.Lin
  2020-09-18 15:46 ` Segher Boessenkool
@ 2020-09-18 18:17 ` Richard Sandiford
  2020-09-20 14:03   ` Kewen.Lin
  2020-09-21  7:29   ` Andrea Corallo
  1 sibling, 2 replies; 13+ messages in thread
From: Richard Sandiford @ 2020-09-18 18:17 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener,
	Andrea Corallo

Thanks for looking at this.

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available.  But
> it caused some vector with length test cases to fail on
> Power.
>
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue that: we call function 
> vect_need_peeling_or_partial_vectors_p in function
> vect_analyze_loop_2, since it's in analysis phase, for
> the epilogue loop, we could get the wrong information like
> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
> answer for vect_need_peeling_or_partial_vectors_p.
> For the epilogue loop in this failure specific, the niter
> that we get is 58 (should be 1), vf is 2.
>
> For epilogue loop with partial vectors, it would use the
> same VF as the main loop, so it won't be able to use full
> vector, this patch is to exclude epilogue loop for the
> check vect_need_peeling_or_partial_vectors_p in
> vect_analyze_loop_2.

Hmm.  For better or worse, I think the analysis phase is actually
operating on the assumption that the vector code needs to handle
all scalar iterations, even in the epilogue case.  We actually
rely on that to implement VECT_COMPARE_COSTS (although it wasn't
the original motivation for handling epilogues this way).

Perhaps we should expand the functionality (and name)
of determine_peel_for_niter so that it also computes
LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
that flag once we know the final epilogue scalar iteration count,
just like we recompute LOOP_VINFO_PEELING_FOR_NITER.

As a sanity check, I think it would also be good for the
renamed function to do the following parts of vect_analyze_loop_2:

  /* If epilog loop is required because of data accesses with gaps,
     one additional iteration needs to be peeled.  Check if there is
     enough iterations for vectorization.  */
  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
    {
      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);

      if (known_lt (wi::to_widest (scalar_niters), vf))
	return opt_result::failure_at (vect_location,
				       "loop has no enough iterations to"
				       " support peeling for gaps.\n");
    }

  /* If we're vectorizing an epilogue loop, the vectorized loop either needs
     to be able to handle fewer than VF scalars, or needs to have a lower VF
     than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
				   "Vectorization factor too high for"
				   " epilogue loop.\n");

and then assert that no failure occurs when called for epilogues
from vect_do_peeling.  So the function would be doing three things:

- compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
  in vect_analyze_loop_2

- perform the checks quoted above

- what the function currently does

That's all speculation though -- I haven't tried any of this.

Andrea, how should we handle this?  Is it something you'd have time to
look at?

Thanks,
Richard

>
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue, the 64bit/32bit pairs are able to
> use full vector, fixed in the patch accordingly.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>
> Is it OK for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 	* tree-vect-loop.c (vect_analyze_loop_2): Don't check
> 	vect_need_peeling_or_partial_vectors_p for the epilogue loop.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index cfae9bbc927..5d2357aabfa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -9,8 +9,7 @@
>  #include "p9-vec-length-6.h"
>  
>  /* It can use normal vector load for constant vector load.  */
> -/* { dg-final { scan-assembler-not   {\mstxv\M} } } */
> -/* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
> -/* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
> +/* 64bit/32bit pairs won't use partial vectors.  */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index ab627fbf029..7273e998a99 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2278,7 +2278,8 @@ start_over:
>      {
>        /* Don't use partial vectors if we don't need to peel the loop.  */
>        if (param_vect_partial_vector_usage == 0
> -	  || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> +	  || (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +	      && !vect_need_peeling_or_partial_vectors_p (loop_vinfo)))
>  	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>        else if (vect_verify_full_masking (loop_vinfo)
>  	       || vect_verify_loop_lens (loop_vinfo))

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-18 18:17 ` Richard Sandiford
@ 2020-09-20 14:03   ` Kewen.Lin
  2020-09-21  6:50     ` Richard Sandiford
  2020-09-21  7:29   ` Andrea Corallo
  1 sibling, 1 reply; 13+ messages in thread
From: Kewen.Lin @ 2020-09-20 14:03 UTC (permalink / raw)
  To: richard.sandiford
  Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener,
	Andrea Corallo

Hi Richard,

> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi,
>>
>> The commit r11-3230 brings a nice improvement to use full
>> vectors instead of partial vectors when available.  But
>> it caused some vector with length test cases to fail on
>> Power.
>>
>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>> exposed one issue that: we call function 
>> vect_need_peeling_or_partial_vectors_p in function
>> vect_analyze_loop_2, since it's in analysis phase, for
>> the epilogue loop, we could get the wrong information like
>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
>> answer for vect_need_peeling_or_partial_vectors_p.
>> For the epilogue loop in this failure specific, the niter
>> that we get is 58 (should be 1), vf is 2.
>>
>> For epilogue loop with partial vectors, it would use the
>> same VF as the main loop, so it won't be able to use full
>> vector, this patch is to exclude epilogue loop for the
>> check vect_need_peeling_or_partial_vectors_p in
>> vect_analyze_loop_2.
> 
> Hmm.  For better or worse, I think the analysis phase is actually
> operating on the assumption that the vector code needs to handle
> all scalar iterations, even in the epilogue case.  We actually
> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
> the original motivation for handling epilogues this way).
> 
> Perhaps we should expand the functionality (and name)
> of determine_peel_for_niter so that it also computes
> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
> that flag once we know the final epilogue scalar iteration count,
> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
> 
> As a sanity check, I think it would also be good for the
> renamed function to do the following parts of vect_analyze_loop_2:
> 
>   /* If epilog loop is required because of data accesses with gaps,
>      one additional iteration needs to be peeled.  Check if there is
>      enough iterations for vectorization.  */
>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>     {
>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
> 
>       if (known_lt (wi::to_widest (scalar_niters), vf))
> 	return opt_result::failure_at (vect_location,
> 				       "loop has no enough iterations to"
> 				       " support peeling for gaps.\n");
>     }
> 
>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>      than the main loop.  */
>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> 		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>     return opt_result::failure_at (vect_location,
> 				   "Vectorization factor too high for"
> 				   " epilogue loop.\n");
> 
> and then assert that no failure occurs when called for epilogues
> from vect_do_peeling.  So the function would be doing three things:
> 
> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>   in vect_analyze_loop_2
> 
> - perform the checks quoted above
> 
> - what the function currently does
> 
> That's all speculation though -- I haven't tried any of this.
> 

Thanks for your comments!  I'm curious that are your suggestions mainly
for future extension?  IIUC, currently if the partial vector usage is 2,
we will have no epilogue loops, if the partial vector usage is 1, the
epilogue loop uses the same VF as the one of the main loop, its total
iterations count is less than VF, it has no chance to use full vector.
It looks to exclude epilogue loops is enough for now.

BR,
Kewen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-20 14:03   ` Kewen.Lin
@ 2020-09-21  6:50     ` Richard Sandiford
  2020-09-22  3:23       ` Kewen.Lin
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2020-09-21  6:50 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener,
	Andrea Corallo

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard,
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>> Hi,
>>>
>>> The commit r11-3230 brings a nice improvement to use full
>>> vectors instead of partial vectors when available.  But
>>> it caused some vector with length test cases to fail on
>>> Power.
>>>
>>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>>> exposed one issue that: we call function 
>>> vect_need_peeling_or_partial_vectors_p in function
>>> vect_analyze_loop_2, since it's in analysis phase, for
>>> the epilogue loop, we could get the wrong information like
>>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
>>> answer for vect_need_peeling_or_partial_vectors_p.
>>> For the epilogue loop in this failure specific, the niter
>>> that we get is 58 (should be 1), vf is 2.
>>>
>>> For epilogue loop with partial vectors, it would use the
>>> same VF as the main loop, so it won't be able to use full
>>> vector, this patch is to exclude epilogue loop for the
>>> check vect_need_peeling_or_partial_vectors_p in
>>> vect_analyze_loop_2.
>> 
>> Hmm.  For better or worse, I think the analysis phase is actually
>> operating on the assumption that the vector code needs to handle
>> all scalar iterations, even in the epilogue case.  We actually
>> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
>> the original motivation for handling epilogues this way).
>> 
>> Perhaps we should expand the functionality (and name)
>> of determine_peel_for_niter so that it also computes
>> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
>> that flag once we know the final epilogue scalar iteration count,
>> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
>> 
>> As a sanity check, I think it would also be good for the
>> renamed function to do the following parts of vect_analyze_loop_2:
>> 
>>   /* If epilog loop is required because of data accesses with gaps,
>>      one additional iteration needs to be peeled.  Check if there is
>>      enough iterations for vectorization.  */
>>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>>     {
>>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>> 
>>       if (known_lt (wi::to_widest (scalar_niters), vf))
>> 	return opt_result::failure_at (vect_location,
>> 				       "loop has no enough iterations to"
>> 				       " support peeling for gaps.\n");
>>     }
>> 
>>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>>      than the main loop.  */
>>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>> 		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>>     return opt_result::failure_at (vect_location,
>> 				   "Vectorization factor too high for"
>> 				   " epilogue loop.\n");
>> 
>> and then assert that no failure occurs when called for epilogues
>> from vect_do_peeling.  So the function would be doing three things:
>> 
>> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>>   in vect_analyze_loop_2
>> 
>> - perform the checks quoted above
>> 
>> - what the function currently does
>> 
>> That's all speculation though -- I haven't tried any of this.
>> 
>
> Thanks for your comments!  I'm curious that are your suggestions mainly
> for future extension?  IIUC, currently if the partial vector usage is 2,
> we will have no epilogue loops, if the partial vector usage is 1, the
> epilogue loop uses the same VF as the one of the main loop, its total
> iterations count is less than VF, it has no chance to use full vector.
> It looks to exclude epilogue loops is enough for now.

The problem is that partial-vector-usage==2 does not guarantee that all
loops will be able to use partial vectors.  It's still subject to what
the target and the vectoriser support.  And on AArch64, we compare up
to 6 implementations of a loop side-by-side:

  - up to 4 SVE implementations, for different levels of unpacking
  - up to 2 Advanced SIMD implementations (128-bit and 64-bit)

So even today, we have some loop_vinfos that can use predication and
some loop_vinfos that can't.  We can then end up analysing a loop_vinfo
once, but with two conflicting uses in mind:

  (a) as an epilogue loop for a main loop that can't use predication, or
  (b) as a replacement for that main loop.

If analysis succeeds, we decide based on the cost of the loop_vinfo
whether to go for (b) or (a).

To handle this correctly, we need to make the analysis for (b) as
close as possible to the analysis without (a).

There's also a conceptual objection: it seems wrong to check for
epilogue loops here, but not do the same for the later:

  determine_peel_for_niter (loop_vinfo);

even though the information is equally wrong for (a) in both cases.

So what I suggested above was trying to keep things consistent:

  - during the initial loop_vinfo analysis, all decisions based on the
    number of iterations consider the use of the loop_vinfo as the main
    loop.

  - if we end up using the loop_vinfo as an epilogue loop, all those
    decisions are remade based on the final (now known) number of
    iterations.

Of course, it would be better if the code was structured differently
and so these things could be done more directly.  E.g. it would be
good if:

(1) the analysis phase was split up more, so that it's possible to reuse
    parts of the core analysis and just vary “epilogueness“.

(2) the actual number of epilogue iterations was available during the
    analysis phase (at least if the number is constant).

But the structure of the vectoriser tends to make things more difficult
than they should be.

In a sense, the changes I suggested in the quote above are moving
towards (1), so I think they are forward progress of a kind.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-18 18:17 ` Richard Sandiford
  2020-09-20 14:03   ` Kewen.Lin
@ 2020-09-21  7:29   ` Andrea Corallo
  2020-09-21  8:32     ` Richard Sandiford
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Corallo @ 2020-09-21  7:29 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener,
	richard.sandiford

Richard Sandiford <richard.sandiford@arm.com> writes:
[...]
> Andrea, how should we handle this?  Is it something you'd have time to
> look at?

Hi Richard,

I've not but FWIW your observations here and on today's mail make alot
of sense to me.  We maybe want to install Kewen's fix anyway while we
rework this logic?

  Andrea

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-21  7:29   ` Andrea Corallo
@ 2020-09-21  8:32     ` Richard Sandiford
  2020-09-22 14:34       ` [PATCH] vect: Fix epilogue loop handling of partial vectors Richard Sandiford
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2020-09-21  8:32 UTC (permalink / raw)
  To: Andrea Corallo
  Cc: Kewen.Lin, GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener

Andrea Corallo <andrea.corallo@arm.com> writes:
> Richard Sandiford <richard.sandiford@arm.com> writes:
> [...]
>> Andrea, how should we handle this?  Is it something you'd have time to
>> look at?
>
> Hi Richard,
>
> I've not

OK, NP.  In that case I'll give it a go.

> but FWIW your observations here and on today's mail make alot
> of sense to me.  We maybe want to install Kewen's fix anyway while we
> rework this logic?

I think that would just be trading one problem for another though.

I'll try to have a patch ready tomorrow morning European time.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]
  2020-09-21  6:50     ` Richard Sandiford
@ 2020-09-22  3:23       ` Kewen.Lin
  0 siblings, 0 replies; 13+ messages in thread
From: Kewen.Lin @ 2020-09-22  3:23 UTC (permalink / raw)
  To: richard.sandiford
  Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener,
	Andrea Corallo

Hi Richard,

on 2020/9/21 下午2:50, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi Richard,
>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>> Hi,
>>>>
>>>> The commit r11-3230 brings a nice improvement to use full
>>>> vectors instead of partial vectors when available.  But
>>>> it caused some vector with length test cases to fail on
>>>> Power.
>>>>
>>>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>>>> exposed one issue that: we call function 
>>>> vect_need_peeling_or_partial_vectors_p in function
>>>> vect_analyze_loop_2, since it's in analysis phase, for
>>>> the epilogue loop, we could get the wrong information like
>>>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
>>>> answer for vect_need_peeling_or_partial_vectors_p.
>>>> For the epilogue loop in this failure specific, the niter
>>>> that we get is 58 (should be 1), vf is 2.
>>>>
>>>> For epilogue loop with partial vectors, it would use the
>>>> same VF as the main loop, so it won't be able to use full
>>>> vector, this patch is to exclude epilogue loop for the
>>>> check vect_need_peeling_or_partial_vectors_p in
>>>> vect_analyze_loop_2.
>>>
>>> Hmm.  For better or worse, I think the analysis phase is actually
>>> operating on the assumption that the vector code needs to handle
>>> all scalar iterations, even in the epilogue case.  We actually
>>> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
>>> the original motivation for handling epilogues this way).
>>>
>>> Perhaps we should expand the functionality (and name)
>>> of determine_peel_for_niter so that it also computes
>>> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
>>> that flag once we know the final epilogue scalar iteration count,
>>> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
>>>
>>> As a sanity check, I think it would also be good for the
>>> renamed function to do the following parts of vect_analyze_loop_2:
>>>
>>>   /* If epilog loop is required because of data accesses with gaps,
>>>      one additional iteration needs to be peeled.  Check if there is
>>>      enough iterations for vectorization.  */
>>>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>>>     {
>>>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>>
>>>       if (known_lt (wi::to_widest (scalar_niters), vf))
>>> 	return opt_result::failure_at (vect_location,
>>> 				       "loop has no enough iterations to"
>>> 				       " support peeling for gaps.\n");
>>>     }
>>>
>>>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>>>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>>>      than the main loop.  */
>>>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>>>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>>> 		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>>>     return opt_result::failure_at (vect_location,
>>> 				   "Vectorization factor too high for"
>>> 				   " epilogue loop.\n");
>>>
>>> and then assert that no failure occurs when called for epilogues
>>> from vect_do_peeling.  So the function would be doing three things:
>>>
>>> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>>>   in vect_analyze_loop_2
>>>
>>> - perform the checks quoted above
>>>
>>> - what the function currently does
>>>
>>> That's all speculation though -- I haven't tried any of this.
>>>
>>
>> Thanks for your comments!  I'm curious that are your suggestions mainly
>> for future extension?  IIUC, currently if the partial vector usage is 2,
>> we will have no epilogue loops, if the partial vector usage is 1, the
>> epilogue loop uses the same VF as the one of the main loop, its total
>> iterations count is less than VF, it has no chance to use full vector.
>> It looks to exclude epilogue loops is enough for now.
> 
> The problem is that partial-vector-usage==2 does not guarantee that all
> loops will be able to use partial vectors.  It's still subject to what
> the target and the vectoriser support.  And on AArch64, we compare up
> to 6 implementations of a loop side-by-side:
> 
>   - up to 4 SVE implementations, for different levels of unpacking
>   - up to 2 Advanced SIMD implementations (128-bit and 64-bit)
> 
> So even today, we have some loop_vinfos that can use predication and
> some loop_vinfos that can't.  We can then end up analysing a loop_vinfo
> once, but with two conflicting uses in mind:
> 
>   (a) as an epilogue loop for a main loop that can't use predication, or
>   (b) as a replacement for that main loop.
> 
> If analysis succeeds, we decide based on the cost of the loop_vinfo
> whether to go for (b) or (a).
> 
> To handle this correctly, we need to make the analysis for (b) as
> close as possible to the analysis without (a).
> 
> There's also a conceptual objection: it seems wrong to check for
> epilogue loops here, but not do the same for the later:
> 

Yeah, good point!

>   determine_peel_for_niter (loop_vinfo);
> 
> even though the information is equally wrong for (a) in both cases.
> 
> So what I suggested above was trying to keep things consistent:
> 
>   - during the initial loop_vinfo analysis, all decisions based on the
>     number of iterations consider the use of the loop_vinfo as the main
>     loop.
> 
>   - if we end up using the loop_vinfo as an epilogue loop, all those
>     decisions are remade based on the final (now known) number of
>     iterations.
> 

Got it, to keep them consistent makes more sense, thanks for your
detailed explanation and your time on the rework!

BR,
Kewen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] vect: Fix epilogue loop handling of partial vectors
  2020-09-21  8:32     ` Richard Sandiford
@ 2020-09-22 14:34       ` Richard Sandiford
  2020-09-23  3:07         ` Kewen.Lin
  2020-09-23  7:58         ` Richard Biener
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Sandiford @ 2020-09-22 14:34 UTC (permalink / raw)
  To: Andrea Corallo
  Cc: Kewen.Lin, GCC Patches, Segher Boessenkool, Bill Schmidt, Richard Biener

Richard Sandiford <richard.sandiford@arm.com> writes:
> I'll try to have a patch ready tomorrow morning European time.

Well, I totally failed to hit that deadline.  When testing on Power,
I saw a couple of extra failures, but I now think they're improvements
rather than regressions.  See the point about single-iteration
epilogues below.

---

This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.
As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.

This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling.  Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.

The patch takes a similar approach for the decision about whether
to use partial vectors.  As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and partial-
vector decisions based on the assumption that the loop_vinfo will
be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.

This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.

I hope the patch makes the (mostly preexisting) structure a bit
more obvious.  It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.

Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.

Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.

Tested on aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
powerpc64le-linux-gnu.  OK to install?

Richard


gcc/
	* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function.
	* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
	Reject using vector epilogue loops for single iterations.  Install
	the constant number of epilogue loop iterations in the associated
	loop_vinfo.  Rely on vect_determine_partial_vectors_and_peeling
	to do the main part of the test.
	(vect_do_peeling): Use vect_update_epilogue_niters to handle
	epilogue loops with a known number of iterations.  Skip recomputing
	the number of iterations later in that case.  Otherwise, use
	vect_determine_partial_vectors_and_peeling to decide whether the
	epilogue loop needs to use partial vectors or peeling.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
	default can_use_partial_vectors_p to false if partial-vector-usage=0.
	(determine_peel_for_niter): Remove in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function,
	split out from...
	(vect_analyze_loop_2): ...here.  Reflect the vect_verify_full_masking
	and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
	rather than USING_PARTIAL_VECTORS_P.

gcc/testsuite/
	* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
	single-iteration epilogues of the 64-bit loops to be vectorized.
	* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
---
 .../gcc.target/powerpc/p9-vec-length-epil-1.c |   4 +-
 .../gcc.target/powerpc/p9-vec-length-epil-7.c |   2 +-
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |   4 +-
 gcc/tree-vect-loop-manip.c                    |  83 +++++---
 gcc/tree-vect-loop.c                          | 196 ++++++++++++------
 gcc/tree-vectorizer.h                         |   3 +-
 6 files changed, 192 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9dffc5570e5..b7fa6bc8d2f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
 extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
 bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
 /* Used in tree-vect-loop-manip.c */
-extern void determine_peel_for_niter (loop_vec_info);
+extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info,
+							      bool);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
 extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
 				  enum tree_code);
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 47cfa6f4061..7cf00e6eed4 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
     rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
 }
 
+/* EPILOGUE_VINFO is an epilogue loop that we now know would need to
+   iterate exactly CONST_NITERS times.  Make a final decision about
+   whether the epilogue loop should be used, returning true if so.  */
+
+static bool
+vect_update_epilogue_niters (loop_vec_info epilogue_vinfo,
+			     unsigned HOST_WIDE_INT const_niters)
+{
+  /* Avoid wrap-around when computing const_niters - 1.  Also reject
+     using an epilogue loop for a single scalar iteration, even if
+     we could in principle implement that using partial vectors.  */
+  unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
+  if (const_niters <= gap_niters + 1)
+    return false;
+
+  /* Install the number of iterations.  */
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo));
+  tree niters_tree = build_int_cst (niters_type, const_niters);
+  tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1);
+
+  LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree;
+  LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree;
+
+  /* Decide what to do if the number of epilogue iterations is not
+     a multiple of the epilogue loop's vectorization factor.  */
+  return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true);
+}
+
 /* Function vect_do_peeling.
 
    Input:
@@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   int estimated_vf;
   int prolog_peeling = 0;
   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
+  bool vect_epilogues_updated_niters = false;
   /* We currently do not support prolog peeling if the target alignment is not
      known at compile time.  'vect_gen_prolog_loop_niters' depends on the
      target alignment being constant.  */
@@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (vect_epilogues
       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && prolog_peeling >= 0
-      && known_eq (vf, lowest_vf)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
+      && known_eq (vf, lowest_vf))
     {
       unsigned HOST_WIDE_INT eiters
 	= (LOOP_VINFO_INT_NITERS (loop_vinfo)
@@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       eiters
 	= eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
 
-      unsigned int ratio;
-      unsigned int epilogue_gaps
-	= LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
-      while (!(constant_multiple_p
-	       (GET_MODE_SIZE (loop_vinfo->vector_mode),
-		GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio)
-	       && eiters >= lowest_vf / ratio + epilogue_gaps))
+      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
 	{
 	  delete epilogue_vinfo;
 	  epilogue_vinfo = NULL;
@@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
 	  loop_vinfo->epilogue_vinfos.ordered_remove (0);
-	  epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
 	}
+      vect_epilogues_updated_niters = true;
     }
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
@@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 skip_e edge.  */
       if (skip_vector)
 	{
-	  gcc_assert (update_e != NULL && skip_e != NULL);
+	  gcc_assert (update_e != NULL
+		      && skip_e != NULL
+		      && !vect_epilogues_updated_niters);
 	  gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)),
 					   update_e->dest);
 	  tree new_ssa = make_ssa_name (TREE_TYPE (niters));
@@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  niters = PHI_RESULT (new_phi);
 	}
 
-      /* Subtract the number of iterations performed by the vectorized loop
-	 from the number of total iterations.  */
-      tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
-					  before_loop_niters,
-					  niters);
-
-      LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
-      LOOP_VINFO_NITERSM1 (epilogue_vinfo)
-	= fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
-		       epilogue_niters,
-		       build_one_cst (TREE_TYPE (epilogue_niters)));
-
       /* Set ADVANCE to the number of iterations performed by the previous
 	 loop and its prologue.  */
       *advance = niters;
 
-      /* Redo the peeling for niter analysis as the NITERs and alignment
-	 may have been updated to take the main loop into account.  */
-      determine_peel_for_niter (epilogue_vinfo);
+      if (!vect_epilogues_updated_niters)
+	{
+	  /* Subtract the number of iterations performed by the vectorized loop
+	     from the number of total iterations.  */
+	  tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
+					      before_loop_niters,
+					      niters);
+
+	  LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
+	  LOOP_VINFO_NITERSM1 (epilogue_vinfo)
+	    = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
+			   epilogue_niters,
+			   build_one_cst (TREE_TYPE (epilogue_niters)));
+
+	  /* Decide what to do if the number of epilogue iterations is not
+	     a multiple of the epilogue loop's vectorization factor.
+	     We should have rejected the loop during the analysis phase
+	     if this fails.  */
+	  if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
+							   true))
+	    gcc_unreachable ();
+	}
     }
 
   adjust_vec.release ();
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b1a6e1508c7..bb9d87fd200 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -814,7 +814,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     vec_outside_cost (0),
     vec_inside_cost (0),
     vectorizable (false),
-    can_use_partial_vectors_p (true),
+    can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
     epil_using_partial_vectors_p (false),
     peeling_for_gaps (false),
@@ -2003,22 +2003,123 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
     }
 }
 
+/* Determine if operating on full vectors for LOOP_VINFO might leave
+   some scalar iterations still to do.  If so, decide how we should
+   handle those scalar iterations.  The possibilities are:
 
-/* Decides whether we need to create an epilogue loop to handle
-   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  */
+   (1) Make LOOP_VINFO operate on partial vectors instead of full vectors.
+       In this case:
 
-void
-determine_peel_for_niter (loop_vec_info loop_vinfo)
+	 LOOP_VINFO_USING_PARTIAL_VECTORS_P == true
+	 LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
+	 LOOP_VINFO_PEELING_FOR_NITER == false
+
+   (2) Make LOOP_VINFO operate on full vectors and use an epilogue loop
+       to handle the remaining scalar iterations.  In this case:
+
+	 LOOP_VINFO_USING_PARTIAL_VECTORS_P == false
+	 LOOP_VINFO_PEELING_FOR_NITER == true
+
+       There are two choices:
+
+       (2a) Consider vectorizing the epilogue loop at the same VF as the
+	    main loop, but using partial vectors instead of full vectors.
+	    In this case:
+
+	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == true
+
+       (2b) Consider vectorizing the epilogue loop at lower VFs only.
+	    In this case:
+
+	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
+
+   When FOR_EPILOGUE_P is true, make this determination based on the
+   assumption that LOOP_VINFO is an epilogue loop, otherwise make it
+   based on the assumption that LOOP_VINFO is the main loop.  The caller
+   has made sure that the number of iterations is set appropriately for
+   this value of FOR_EPILOGUE_P.  */
+
+opt_result
+vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
+					    bool for_epilogue_p)
 {
-  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
+  /* Determine whether there would be any scalar iterations left over.  */
+  bool need_peeling_or_partial_vectors_p
+    = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
 
-  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-    /* The main loop handles all iterations.  */
-    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
-  else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo))
-    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
-}
+  /* Decide whether to vectorize the loop with partial vectors.  */
+  LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
+  LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && need_peeling_or_partial_vectors_p)
+    {
+      /* For partial-vector-usage=1, try to push the handling of partial
+	 vectors to the epilogue, with the main loop continuing to operate
+	 on full vectors.
+
+	 ??? We could then end up failing to use partial vectors if we
+	 decide to peel iterations into a prologue, and if the main loop
+	 then ends up processing fewer than VF iterations.  */
+      if (param_vect_partial_vector_usage == 1
+	  && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+	  && !vect_known_niters_smaller_than_vf (loop_vinfo))
+	LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
+      else
+	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
+    }
+
+  if (dump_enabled_p ())
+    {
+      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "operating on partial vectors%s.\n",
+			 for_epilogue_p ? " for epilogue loop" : "");
+      else
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "operating only on full vectors%s.\n",
+			 for_epilogue_p ? " for epilogue loop" : "");
+    }
 
+  if (for_epilogue_p)
+    {
+      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      gcc_assert (orig_loop_vinfo);
+      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+	gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+			      LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)));
+    }
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+    {
+      /* Check that the loop processes at least one full vector.  */
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      tree scalar_niters = LOOP_VINFO_NITERS (loop_vinfo);
+      if (known_lt (wi::to_widest (scalar_niters), vf))
+	return opt_result::failure_at (vect_location,
+				       "loop does not have enough iterations"
+				       " to support vectorization.\n");
+
+      /* If we need to peel an extra epilogue iteration to handle data
+	 accesses with gaps, check that there are enough scalar iterations
+	 available.
+
+	 The check above is redundant with this one when peeling for gaps,
+	 but the distinction is useful for diagnostics.  */
+      tree scalar_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+	  && known_lt (wi::to_widest (scalar_nitersm1), vf))
+	return opt_result::failure_at (vect_location,
+				       "loop does not have enough iterations"
+				       " to support peeling for gaps.\n");
+    }
+
+  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+    = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+       && need_peeling_or_partial_vectors_p);
+
+  return opt_result::success ();
+}
 
 /* Function vect_analyze_loop_2.
 
@@ -2272,72 +2373,32 @@ start_over:
       LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
     }
 
-  /* Decide whether to vectorize a loop with partial vectors for
-     this vectorization factor.  */
-  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-    {
-      /* Don't use partial vectors if we don't need to peel the loop.  */
-      if (param_vect_partial_vector_usage == 0
-	  || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
-	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-      else if (vect_verify_full_masking (loop_vinfo)
-	       || vect_verify_loop_lens (loop_vinfo))
-	{
-	  /* The epilogue and other known niters less than VF
-	    cases can still use vector access with length fully.  */
-	  if (param_vect_partial_vector_usage == 1
-	      && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-	      && !vect_known_niters_smaller_than_vf (loop_vinfo))
-	    {
-	      LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-	      LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-	    }
-	  else
-	    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-	}
-      else
-	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-    }
-  else
-    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-
-  if (dump_enabled_p ())
-    {
-      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "operating on partial vectors.\n");
-      else
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "operating only on full vectors.\n");
-    }
-
-  /* If epilog loop is required because of data accesses with gaps,
-     one additional iteration needs to be peeled.  Check if there is
-     enough iterations for vectorization.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-    {
-      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
-
-      if (known_lt (wi::to_widest (scalar_niters), vf))
-	return opt_result::failure_at (vect_location,
-				       "loop has no enough iterations to"
-				       " support peeling for gaps.\n");
-    }
+  /* If we still have the option of using partial vectors,
+     check whether we can generate the necessary loop controls.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && !vect_verify_full_masking (loop_vinfo)
+      && !vect_verify_loop_lens (loop_vinfo))
+    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
 
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
      to be able to handle fewer than VF scalars, or needs to have a lower VF
      than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
 		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
     return opt_result::failure_at (vect_location,
 				   "Vectorization factor too high for"
 				   " epilogue loop.\n");
 
+  /* Decide whether this loop_vinfo should use partial vectors or peeling,
+     assuming that the loop will be used as a main loop.  We will redo
+     this analysis later if we instead decide to use the loop as an
+     epilogue loop.  */
+  ok = vect_determine_partial_vectors_and_peeling (loop_vinfo, false);
+  if (!ok)
+    return ok;
+
   /* Check the costings of the loop make vectorizing worthwhile.  */
   res = vect_analyze_loop_costing (loop_vinfo);
   if (res < 0)
@@ -2350,7 +2411,6 @@ start_over:
     return opt_result::failure_at (vect_location,
 				   "Loop costings not worthwhile.\n");
 
-  determine_peel_for_niter (loop_vinfo);
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
       || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
index ebb2f45c917..d248f091b52 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
@@ -10,6 +10,6 @@
 
 /* { dg-final { scan-assembler-times {\mlxvx?\M} 20 } } */
 /* { dg-final { scan-assembler-times {\mstxvx?\M} 10 } } */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 14 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
index 9d403287923..a27ee347ca1 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
@@ -8,4 +8,4 @@
 
 #include "p9-vec-length-7.h"
 
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
index 6b54a29efaa..961df0d5646 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
@@ -8,5 +8,5 @@
 
 #include "p9-vec-length-8.h"
 
-/* { dg-final { scan-assembler-times {\mlxvl\M} 30 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect: Fix epilogue loop handling of partial vectors
  2020-09-22 14:34       ` [PATCH] vect: Fix epilogue loop handling of partial vectors Richard Sandiford
@ 2020-09-23  3:07         ` Kewen.Lin
  2020-09-23 11:33           ` Richard Sandiford
  2020-09-23  7:58         ` Richard Biener
  1 sibling, 1 reply; 13+ messages in thread
From: Kewen.Lin @ 2020-09-23  3:07 UTC (permalink / raw)
  To: richard.sandiford
  Cc: Andrea Corallo, GCC Patches, Segher Boessenkool, Bill Schmidt,
	Richard Biener

Hi Richard,

on 2020/9/22 下午10:34, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@arm.com> writes:
>> I'll try to have a patch ready tomorrow morning European time.
> 
> Well, I totally failed to hit that deadline.  When testing on Power,
> I saw a couple of extra failures, but I now think they're improvements

Thanks for the testing!

> rather than regressions.  See the point about single-iteration
> epilogues below.
> 
> ---
> 
> This patch fixes the fallout that Kewen reported on Power after
> the recent change to avoid unnecessary use of partial vectors.
> As Kewen said, the problem is that vect_analyze_loop_2 doesn't
> know how many epilogue iterations there will be, and so it
> cannot make a final decision about whether the number of
> iterations forces an epilogue loop to use partial vectors.
> 
> This is similar to the current situation for peeling: we don't know
> during initial analysis whether an epilogue loop will itself require
> peeling.  Instead we decide that during vect_do_peeling, where the
> final number of epilogue loop iterations is known.
> 
> The patch takes a similar approach for the decision about whether
> to use partial vectors.  As the comments in the patch say, the
> idea is that vect_analyze_loop_2 should make peeling and partial-
> vector decisions based on the assumption that the loop_vinfo will
> be used as the main loop, while vect_do_peeling should make them
> in the knowledge that the loop_vinfo will be used as an epilogue loop.
> 
> This allows the same analysis to be used for both cases, which we
> rely on for implementing VECT_COMPARE_COSTS; see the big comment
> in vect_analyze_loop for details.
> 
> I hope the patch makes the (mostly preexisting) structure a bit
> more obvious.  It isn't what anyone would design from scratch,
> but that's the nature of working with a mature vector framework.
> 
> Arranging things this way means that vect_verify_full_masking
> and vect_verify_loop_lens now become part of the “can” rather
> than “will” test for partial vectors.
> 
> Also, while splitting out the logic that handles epilogues with
> constant iterations, I added a check to make sure that we don't
> try to use partial vectors to vectorise a single-scalar loop.
> This required some changes to the Power tests.

Thanks for the great rework!  Using partial vector should be more
costly than just being with one single scalar iteration, I think
this is an improvement.  But I'm not sure whether someone would
argue that in the context of no-vect-cost-model, it's expected to
go with vectorized code, cost-modeling check can punt this when
it takes effects.  Personally I'm fine on this anyway.

BR,
Kewen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect: Fix epilogue loop handling of partial vectors
  2020-09-22 14:34       ` [PATCH] vect: Fix epilogue loop handling of partial vectors Richard Sandiford
  2020-09-23  3:07         ` Kewen.Lin
@ 2020-09-23  7:58         ` Richard Biener
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Biener @ 2020-09-23  7:58 UTC (permalink / raw)
  To: Andrea Corallo, Kewen.Lin, GCC Patches, Segher Boessenkool,
	Bill Schmidt, Richard Biener, Richard Sandiford

On Tue, Sep 22, 2020 at 4:34 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > I'll try to have a patch ready tomorrow morning European time.
>
> Well, I totally failed to hit that deadline.  When testing on Power,
> I saw a couple of extra failures, but I now think they're improvements
> rather than regressions.  See the point about single-iteration
> epilogues below.
>
> ---
>
> This patch fixes the fallout that Kewen reported on Power after
> the recent change to avoid unnecessary use of partial vectors.
> As Kewen said, the problem is that vect_analyze_loop_2 doesn't
> know how many epilogue iterations there will be, and so it
> cannot make a final decision about whether the number of
> iterations forces an epilogue loop to use partial vectors.
>
> This is similar to the current situation for peeling: we don't know
> during initial analysis whether an epilogue loop will itself require
> peeling.  Instead we decide that during vect_do_peeling, where the
> final number of epilogue loop iterations is known.
>
> The patch takes a similar approach for the decision about whether
> to use partial vectors.  As the comments in the patch say, the
> idea is that vect_analyze_loop_2 should make peeling and partial-
> vector decisions based on the assumption that the loop_vinfo will
> be used as the main loop, while vect_do_peeling should make them
> in the knowledge that the loop_vinfo will be used as an epilogue loop.
>
> This allows the same analysis to be used for both cases, which we
> rely on for implementing VECT_COMPARE_COSTS; see the big comment
> in vect_analyze_loop for details.
>
> I hope the patch makes the (mostly preexisting) structure a bit
> more obvious.  It isn't what anyone would design from scratch,
> but that's the nature of working with a mature vector framework.

Indeed :/

> Arranging things this way means that vect_verify_full_masking
> and vect_verify_loop_lens now become part of the “can” rather
> than “will” test for partial vectors.
>
> Also, while splitting out the logic that handles epilogues with
> constant iterations, I added a check to make sure that we don't
> try to use partial vectors to vectorise a single-scalar loop.
> This required some changes to the Power tests.
>
> Tested on aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu.  OK to install?

Thanks for the nice work.

OK.
Richard.

> Richard
>
>
> gcc/
>         * tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
>         (vect_determine_partial_vectors_and_peeling): ...this new function.
>         * tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
>         Reject using vector epilogue loops for single iterations.  Install
>         the constant number of epilogue loop iterations in the associated
>         loop_vinfo.  Rely on vect_determine_partial_vectors_and_peeling
>         to do the main part of the test.
>         (vect_do_peeling): Use vect_update_epilogue_niters to handle
>         epilogue loops with a known number of iterations.  Skip recomputing
>         the number of iterations later in that case.  Otherwise, use
>         vect_determine_partial_vectors_and_peeling to decide whether the
>         epilogue loop needs to use partial vectors or peeling.
>         * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
>         default can_use_partial_vectors_p to false if partial-vector-usage=0.
>         (determine_peel_for_niter): Remove in favor of...
>         (vect_determine_partial_vectors_and_peeling): ...this new function,
>         split out from...
>         (vect_analyze_loop_2): ...here.  Reflect the vect_verify_full_masking
>         and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
>         rather than USING_PARTIAL_VECTORS_P.
>
> gcc/testsuite/
>         * gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
>         single-iteration epilogues of the 64-bit loops to be vectorized.
>         * gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
>         * gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
> ---
>  .../gcc.target/powerpc/p9-vec-length-epil-1.c |   4 +-
>  .../gcc.target/powerpc/p9-vec-length-epil-7.c |   2 +-
>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |   4 +-
>  gcc/tree-vect-loop-manip.c                    |  83 +++++---
>  gcc/tree-vect-loop.c                          | 196 ++++++++++++------
>  gcc/tree-vectorizer.h                         |   3 +-
>  6 files changed, 192 insertions(+), 100 deletions(-)
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 9dffc5570e5..b7fa6bc8d2f 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *,
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.c */
> -extern void determine_peel_for_niter (loop_vec_info);
> +extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info,
> +                                                             bool);
>  /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
>  extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
>                                   enum tree_code);
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 47cfa6f4061..7cf00e6eed4 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
>      rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
>  }
>
> +/* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> +   iterate exactly CONST_NITERS times.  Make a final decision about
> +   whether the epilogue loop should be used, returning true if so.  */
> +
> +static bool
> +vect_update_epilogue_niters (loop_vec_info epilogue_vinfo,
> +                            unsigned HOST_WIDE_INT const_niters)
> +{
> +  /* Avoid wrap-around when computing const_niters - 1.  Also reject
> +     using an epilogue loop for a single scalar iteration, even if
> +     we could in principle implement that using partial vectors.  */
> +  unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
> +  if (const_niters <= gap_niters + 1)
> +    return false;
> +
> +  /* Install the number of iterations.  */
> +  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo));
> +  tree niters_tree = build_int_cst (niters_type, const_niters);
> +  tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1);
> +
> +  LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree;
> +  LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree;
> +
> +  /* Decide what to do if the number of epilogue iterations is not
> +     a multiple of the epilogue loop's vectorization factor.  */
> +  return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true);
> +}
> +
>  /* Function vect_do_peeling.
>
>     Input:
> @@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    int estimated_vf;
>    int prolog_peeling = 0;
>    bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> +  bool vect_epilogues_updated_niters = false;
>    /* We currently do not support prolog peeling if the target alignment is not
>       known at compile time.  'vect_gen_prolog_loop_niters' depends on the
>       target alignment being constant.  */
> @@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    if (vect_epilogues
>        && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && prolog_peeling >= 0
> -      && known_eq (vf, lowest_vf)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
> +      && known_eq (vf, lowest_vf))
>      {
>        unsigned HOST_WIDE_INT eiters
>         = (LOOP_VINFO_INT_NITERS (loop_vinfo)
> @@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        eiters
>         = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
>
> -      unsigned int ratio;
> -      unsigned int epilogue_gaps
> -       = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
> -      while (!(constant_multiple_p
> -              (GET_MODE_SIZE (loop_vinfo->vector_mode),
> -               GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio)
> -              && eiters >= lowest_vf / ratio + epilogue_gaps))
> +      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
>         {
>           delete epilogue_vinfo;
>           epilogue_vinfo = NULL;
> @@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>             }
>           epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
>           loop_vinfo->epilogue_vinfos.ordered_remove (0);
> -         epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
>         }
> +      vect_epilogues_updated_niters = true;
>      }
>    /* Prolog loop may be skipped.  */
>    bool skip_prolog = (prolog_peeling != 0);
> @@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>          skip_e edge.  */
>        if (skip_vector)
>         {
> -         gcc_assert (update_e != NULL && skip_e != NULL);
> +         gcc_assert (update_e != NULL
> +                     && skip_e != NULL
> +                     && !vect_epilogues_updated_niters);
>           gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)),
>                                            update_e->dest);
>           tree new_ssa = make_ssa_name (TREE_TYPE (niters));
> @@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>           niters = PHI_RESULT (new_phi);
>         }
>
> -      /* Subtract the number of iterations performed by the vectorized loop
> -        from the number of total iterations.  */
> -      tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
> -                                         before_loop_niters,
> -                                         niters);
> -
> -      LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
> -      LOOP_VINFO_NITERSM1 (epilogue_vinfo)
> -       = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
> -                      epilogue_niters,
> -                      build_one_cst (TREE_TYPE (epilogue_niters)));
> -
>        /* Set ADVANCE to the number of iterations performed by the previous
>          loop and its prologue.  */
>        *advance = niters;
>
> -      /* Redo the peeling for niter analysis as the NITERs and alignment
> -        may have been updated to take the main loop into account.  */
> -      determine_peel_for_niter (epilogue_vinfo);
> +      if (!vect_epilogues_updated_niters)
> +       {
> +         /* Subtract the number of iterations performed by the vectorized loop
> +            from the number of total iterations.  */
> +         tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
> +                                             before_loop_niters,
> +                                             niters);
> +
> +         LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
> +         LOOP_VINFO_NITERSM1 (epilogue_vinfo)
> +           = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
> +                          epilogue_niters,
> +                          build_one_cst (TREE_TYPE (epilogue_niters)));
> +
> +         /* Decide what to do if the number of epilogue iterations is not
> +            a multiple of the epilogue loop's vectorization factor.
> +            We should have rejected the loop during the analysis phase
> +            if this fails.  */
> +         if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
> +                                                          true))
> +           gcc_unreachable ();
> +       }
>      }
>
>    adjust_vec.release ();
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index b1a6e1508c7..bb9d87fd200 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -814,7 +814,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      vec_outside_cost (0),
>      vec_inside_cost (0),
>      vectorizable (false),
> -    can_use_partial_vectors_p (true),
> +    can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>      using_partial_vectors_p (false),
>      epil_using_partial_vectors_p (false),
>      peeling_for_gaps (false),
> @@ -2003,22 +2003,123 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
>      }
>  }
>
> +/* Determine if operating on full vectors for LOOP_VINFO might leave
> +   some scalar iterations still to do.  If so, decide how we should
> +   handle those scalar iterations.  The possibilities are:
>
> -/* Decides whether we need to create an epilogue loop to handle
> -   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  */
> +   (1) Make LOOP_VINFO operate on partial vectors instead of full vectors.
> +       In this case:
>
> -void
> -determine_peel_for_niter (loop_vec_info loop_vinfo)
> +        LOOP_VINFO_USING_PARTIAL_VECTORS_P == true
> +        LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
> +        LOOP_VINFO_PEELING_FOR_NITER == false
> +
> +   (2) Make LOOP_VINFO operate on full vectors and use an epilogue loop
> +       to handle the remaining scalar iterations.  In this case:
> +
> +        LOOP_VINFO_USING_PARTIAL_VECTORS_P == false
> +        LOOP_VINFO_PEELING_FOR_NITER == true
> +
> +       There are two choices:
> +
> +       (2a) Consider vectorizing the epilogue loop at the same VF as the
> +           main loop, but using partial vectors instead of full vectors.
> +           In this case:
> +
> +             LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == true
> +
> +       (2b) Consider vectorizing the epilogue loop at lower VFs only.
> +           In this case:
> +
> +             LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
> +
> +   When FOR_EPILOGUE_P is true, make this determination based on the
> +   assumption that LOOP_VINFO is an epilogue loop, otherwise make it
> +   based on the assumption that LOOP_VINFO is the main loop.  The caller
> +   has made sure that the number of iterations is set appropriately for
> +   this value of FOR_EPILOGUE_P.  */
> +
> +opt_result
> +vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
> +                                           bool for_epilogue_p)
>  {
> -  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
> +  /* Determine whether there would be any scalar iterations left over.  */
> +  bool need_peeling_or_partial_vectors_p
> +    = vect_need_peeling_or_partial_vectors_p (loop_vinfo);
>
> -  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -    /* The main loop handles all iterations.  */
> -    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
> -  else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> -    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> -}
> +  /* Decide whether to vectorize the loop with partial vectors.  */
> +  LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +  LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +      && need_peeling_or_partial_vectors_p)
> +    {
> +      /* For partial-vector-usage=1, try to push the handling of partial
> +        vectors to the epilogue, with the main loop continuing to operate
> +        on full vectors.
> +
> +        ??? We could then end up failing to use partial vectors if we
> +        decide to peel iterations into a prologue, and if the main loop
> +        then ends up processing fewer than VF iterations.  */
> +      if (param_vect_partial_vector_usage == 1
> +         && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +         && !vect_known_niters_smaller_than_vf (loop_vinfo))
> +       LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> +      else
> +       LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "operating on partial vectors%s.\n",
> +                        for_epilogue_p ? " for epilogue loop" : "");
> +      else
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "operating only on full vectors%s.\n",
> +                        for_epilogue_p ? " for epilogue loop" : "");
> +    }
>
> +  if (for_epilogue_p)
> +    {
> +      loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      gcc_assert (orig_loop_vinfo);
> +      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +       gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                             LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)));
> +    }
> +
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> +    {
> +      /* Check that the loop processes at least one full vector.  */
> +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +      tree scalar_niters = LOOP_VINFO_NITERS (loop_vinfo);
> +      if (known_lt (wi::to_widest (scalar_niters), vf))
> +       return opt_result::failure_at (vect_location,
> +                                      "loop does not have enough iterations"
> +                                      " to support vectorization.\n");
> +
> +      /* If we need to peel an extra epilogue iteration to handle data
> +        accesses with gaps, check that there are enough scalar iterations
> +        available.
> +
> +        The check above is redundant with this one when peeling for gaps,
> +        but the distinction is useful for diagnostics.  */
> +      tree scalar_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +         && known_lt (wi::to_widest (scalar_nitersm1), vf))
> +       return opt_result::failure_at (vect_location,
> +                                      "loop does not have enough iterations"
> +                                      " to support peeling for gaps.\n");
> +    }
> +
> +  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +    = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +       && need_peeling_or_partial_vectors_p);
> +
> +  return opt_result::success ();
> +}
>
>  /* Function vect_analyze_loop_2.
>
> @@ -2272,72 +2373,32 @@ start_over:
>        LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>      }
>
> -  /* Decide whether to vectorize a loop with partial vectors for
> -     this vectorization factor.  */
> -  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> -    {
> -      /* Don't use partial vectors if we don't need to peel the loop.  */
> -      if (param_vect_partial_vector_usage == 0
> -         || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> -       LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -      else if (vect_verify_full_masking (loop_vinfo)
> -              || vect_verify_loop_lens (loop_vinfo))
> -       {
> -         /* The epilogue and other known niters less than VF
> -           cases can still use vector access with length fully.  */
> -         if (param_vect_partial_vector_usage == 1
> -             && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> -             && !vect_known_niters_smaller_than_vf (loop_vinfo))
> -           {
> -             LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -             LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -           }
> -         else
> -           LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -       }
> -      else
> -       LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -    }
> -  else
> -    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -
> -  if (dump_enabled_p ())
> -    {
> -      if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -       dump_printf_loc (MSG_NOTE, vect_location,
> -                        "operating on partial vectors.\n");
> -      else
> -       dump_printf_loc (MSG_NOTE, vect_location,
> -                        "operating only on full vectors.\n");
> -    }
> -
> -  /* If epilog loop is required because of data accesses with gaps,
> -     one additional iteration needs to be peeled.  Check if there is
> -     enough iterations for vectorization.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> -    {
> -      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
> -
> -      if (known_lt (wi::to_widest (scalar_niters), vf))
> -       return opt_result::failure_at (vect_location,
> -                                      "loop has no enough iterations to"
> -                                      " support peeling for gaps.\n");
> -    }
> +  /* If we still have the option of using partial vectors,
> +     check whether we can generate the necessary loop controls.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +      && !vect_verify_full_masking (loop_vinfo)
> +      && !vect_verify_loop_lens (loop_vinfo))
> +    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>
>    /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>       to be able to handle fewer than VF scalars, or needs to have a lower VF
>       than the main loop.  */
>    if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> -      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
>        && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>                    LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>      return opt_result::failure_at (vect_location,
>                                    "Vectorization factor too high for"
>                                    " epilogue loop.\n");
>
> +  /* Decide whether this loop_vinfo should use partial vectors or peeling,
> +     assuming that the loop will be used as a main loop.  We will redo
> +     this analysis later if we instead decide to use the loop as an
> +     epilogue loop.  */
> +  ok = vect_determine_partial_vectors_and_peeling (loop_vinfo, false);
> +  if (!ok)
> +    return ok;
> +
>    /* Check the costings of the loop make vectorizing worthwhile.  */
>    res = vect_analyze_loop_costing (loop_vinfo);
>    if (res < 0)
> @@ -2350,7 +2411,6 @@ start_over:
>      return opt_result::failure_at (vect_location,
>                                    "Loop costings not worthwhile.\n");
>
> -  determine_peel_for_niter (loop_vinfo);
>    /* If an epilogue loop is required make sure we can create one.  */
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>        || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> index ebb2f45c917..d248f091b52 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c
> @@ -10,6 +10,6 @@
>
>  /* { dg-final { scan-assembler-times {\mlxvx?\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mstxvx?\M} 10 } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 14 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> index 9d403287923..a27ee347ca1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> @@ -8,4 +8,4 @@
>
>  #include "p9-vec-length-7.h"
>
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> index 6b54a29efaa..961df0d5646 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> @@ -8,5 +8,5 @@
>
>  #include "p9-vec-length-8.h"
>
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 30 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect: Fix epilogue loop handling of partial vectors
  2020-09-23  3:07         ` Kewen.Lin
@ 2020-09-23 11:33           ` Richard Sandiford
  2020-09-24  2:35             ` Kewen.Lin
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2020-09-23 11:33 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Andrea Corallo, GCC Patches, Segher Boessenkool, Bill Schmidt,
	Richard Biener

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> on 2020/9/22 下午10:34, Richard Sandiford wrote:
>> Also, while splitting out the logic that handles epilogues with
>> constant iterations, I added a check to make sure that we don't
>> try to use partial vectors to vectorise a single-scalar loop.
>> This required some changes to the Power tests.
>
> Thanks for the great rework!  Using partial vector should be more
> costly than just being with one single scalar iteration, I think
> this is an improvement.  But I'm not sure whether someone would
> argue that in the context of no-vect-cost-model, it's expected to
> go with vectorized code, cost-modeling check can punt this when
> it takes effects.  Personally I'm fine on this anyway.

Yeah, if we costed the epilogue loop against its final iteration
count, we'd hopefully reject it then.  But the same problem applies
to costing as to other things: we do the costing based on the number
of iterations of the main loop, rather than on the number of iterations
of the epilogue.

FWIW, we don't try to vectorize single-iteration main loops either
(although of course they shouldn't exist in the first place :-)).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vect: Fix epilogue loop handling of partial vectors
  2020-09-23 11:33           ` Richard Sandiford
@ 2020-09-24  2:35             ` Kewen.Lin
  0 siblings, 0 replies; 13+ messages in thread
From: Kewen.Lin @ 2020-09-24  2:35 UTC (permalink / raw)
  To: richard.sandiford
  Cc: Andrea Corallo, GCC Patches, Segher Boessenkool, Bill Schmidt,
	Richard Biener

on 2020/9/23 下午7:33, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> on 2020/9/22 下午10:34, Richard Sandiford wrote:
>>> Also, while splitting out the logic that handles epilogues with
>>> constant iterations, I added a check to make sure that we don't
>>> try to use partial vectors to vectorise a single-scalar loop.
>>> This required some changes to the Power tests.
>>
>> Thanks for the great rework!  Using partial vector should be more
>> costly than just being with one single scalar iteration, I think
>> this is an improvement.  But I'm not sure whether someone would
>> argue that in the context of no-vect-cost-model, it's expected to
>> go with vectorized code, cost-modeling check can punt this when
>> it takes effects.  Personally I'm fine on this anyway.
> 
> Yeah, if we costed the epilogue loop against its final iteration
> count, we'd hopefully reject it then.  But the same problem applies
> to costing as to other things: we do the costing based on the number
> of iterations of the main loop, rather than on the number of iterations
> of the epilogue.

Indeed!  The iteration count estimation for epilogue loop is inexact
in most cases (VF-1 can only be same as the actual value for V2DI/V2DF).

> 
> FWIW, we don't try to vectorize single-iteration main loops either
> (although of course they shouldn't exist in the first place :-)).
> 

So true.  :-)

BR,
Kewen

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-24  2:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18  2:37 [PATCH] vect/test: Don't check for epilogue loop [PR97075] Kewen.Lin
2020-09-18 15:46 ` Segher Boessenkool
2020-09-18 18:17 ` Richard Sandiford
2020-09-20 14:03   ` Kewen.Lin
2020-09-21  6:50     ` Richard Sandiford
2020-09-22  3:23       ` Kewen.Lin
2020-09-21  7:29   ` Andrea Corallo
2020-09-21  8:32     ` Richard Sandiford
2020-09-22 14:34       ` [PATCH] vect: Fix epilogue loop handling of partial vectors Richard Sandiford
2020-09-23  3:07         ` Kewen.Lin
2020-09-23 11:33           ` Richard Sandiford
2020-09-24  2:35             ` Kewen.Lin
2020-09-23  7:58         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).