adjust vectorization expectations for ppc costmodel 76b

public inbox for gcc-patches@gcc.gnu.org

* adjust vectorization expectations for ppc costmodel 76b
@ 2021-03-10  9:12 Alexandre Oliva
  2024-04-22  9:28 ` [PATCH] " Alexandre Oliva
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandre Oliva @ 2021-03-10  9:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: Rainer Orth, Mike Stump, Segher Boessenkool, David Edelsohn


This test expects vectorization at power8+ because strict alignment is
not required for vectors.  For power7, vectorization is not to take
place because it's not deemed profitable: 12 iterations would be
required to make it so.

For power6 and below, however, the test's 10 iterations are enough to
make vectorization profitable, and the test doesn't expect this.
Assuming the decision is indeed appropriate, I'm adjusting the
expectations.

This was regstrapped on x86_64-linux-gnu, tested with a cross to a
ppc64-vxworks7r2, and I'm now also regstrapping on ppc64-linux-gnu just
to be sure.  Ok to install?


for  gcc/testsuite/ChangeLog

	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
	expectations for cpus below power7.
---
 .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index 5da4343198c10..937985012286c 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -45,9 +45,10 @@ int main (void)
   return 0;
 }
 
-/* Peeling to align the store is used. Overhead of peeling is too high.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { vector_alignment_reachable && {! vect_no_align} } } } } */
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
+/* Peeling to align the store is used. Overhead of peeling is too high
+   for power7, but acceptable for earlier architectures.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_hw_misalign} } } } } } */
 
 /* Versioning to align the store is used. Overhead of versioning is not too high.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || {! vector_alignment_reachable} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } } } } */
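
The two "vectorized 1 loops" selectors in the patch above can be read as
boolean expressions.  A minimal C sketch, assuming each effective-target
keyword is modelled as a plain flag (the function names are hypothetical
and not part of the testsuite), checks that the selector expecting one
vectorized loop is the exact negation of the selector expecting none, so
exactly one of the two expectations applies to any target.  The
"vectorization not profitable" selector uses vect_hw_misalign and is not
modelled here.

```c
#include <stdbool.h>

/* Hypothetical model of the patched selectors; each parameter stands
   for one DejaGnu effective-target keyword.  */

/* has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} }  */
static bool
expects_no_vectorization (bool has_arch_pwr7,
                          bool vector_alignment_reachable,
                          bool vect_no_align)
{
  return has_arch_pwr7 && (vector_alignment_reachable && !vect_no_align);
}

/* vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7} }  */
static bool
expects_vectorization (bool has_arch_pwr7,
                       bool vector_alignment_reachable,
                       bool vect_no_align)
{
  return vect_no_align || (!vector_alignment_reachable || !has_arch_pwr7);
}

/* Return true if, for all 8 flag combinations, exactly one of the two
   expectations holds.  */
static bool
selectors_are_complementary (void)
{
  for (int bits = 0; bits < 8; bits++)
    {
      bool pwr7 = (bits & 1) != 0;
      bool reachable = (bits & 2) != 0;
      bool no_align = (bits & 4) != 0;
      if (expects_no_vectorization (pwr7, reachable, no_align)
          == expects_vectorization (pwr7, reachable, no_align))
        return false;
    }
  return true;
}
```

In this model, a target without has_arch_pwr7 always lands in the
"vectorized 1 loops" 1 group, whatever the alignment keywords say.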

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


* [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2021-03-10  9:12 adjust vectorization expectations for ppc costmodel 76b Alexandre Oliva
@ 2024-04-22  9:28 ` Alexandre Oliva
  2024-04-24  8:24   ` Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandre Oliva @ 2024-04-22  9:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool, Kewen Lin

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html


This test expects vectorization at power8+ because strict alignment is
not required for vectors.  For power7, vectorization is not to take
place because it's not deemed profitable: 12 iterations would be
required to make it so.

For power6 and below, however, the test's 10 iterations are enough to
make vectorization profitable, and the test doesn't expect this.
Assuming the decision is indeed appropriate, I'm adjusting the
expectations.


for  gcc/testsuite/ChangeLog

	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
	expectations for cpus below power7.
---
 .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index cbbfbb24658f8..0dab2c08acdb4 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -46,9 +46,10 @@ int main (void)
   return 0;
 }
 
-/* Peeling to align the store is used. Overhead of peeling is too high.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { vector_alignment_reachable && {! vect_no_align} } } } } */
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
+/* Peeling to align the store is used. Overhead of peeling is too high
+   for power7, but acceptable for earlier architectures.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_hw_misalign} } } } } } */
 
 /* Versioning to align the store is used. Overhead of versioning is not too high.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || {! vector_alignment_reachable} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } } } } */


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


* Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2024-04-22  9:28 ` [PATCH] " Alexandre Oliva
@ 2024-04-24  8:24   ` Kewen.Lin
  2024-04-28  8:14     ` Alexandre Oliva
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2024-04-24  8:24 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool,
	Kewen Lin, gcc-patches

Hi,

on 2024/4/22 17:28, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html
> 
> 
> This test expects vectorization at power8+ because strict alignment is
> not required for vectors.  For power7, vectorization is not to take
> place because it's not deemed profitable: 12 iterations would be
> required to make it so.
> 
> For power6 and below, however, the test's 10 iterations are enough to
> make vectorization profitable, and the test doesn't expect this.
> Assuming the decision is indeed appropriate, I'm adjusting the
> expectations.

For the record, the cost difference between power6 and power7 lies in
the cost of vec_perm:

* p6 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 1 in body

vs.

* p7 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 3 in body

This in turn causes the difference in the minimum number of iterations
required for profitability.
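
To illustrate how a higher body cost raises the break-even point, here
is a minimal C sketch of a simplified profitability model.  It is not
GCC's actual formula: the function name and the inequality are
assumptions made for illustration only.  The body costs 3 (2 + 1) and
5 (2 + 3) come from the dumps above, whereas the scalar cost, the
vectorization factor and the outside cost used in the note below are
made-up values, so the results are not expected to reproduce the real
10 vs. 12 thresholds.

```c
/* Simplified break-even model, not GCC's actual computation.
   Vectorization is taken to be profitable for n iterations when
     scalar_cost * n > outside_cost + body_cost * n / vf
   Returns the smallest such n, or -1 if no n satisfies it.
   All arguments are assumed to be non-negative, with vf > 0.  */
static int
min_profitable_iters (int scalar_cost, int vf, int outside_cost,
                      int body_cost)
{
  /* Cost saved by one vector iteration compared with vf scalar ones.  */
  int gain_per_vector_iter = scalar_cost * vf - body_cost;

  if (gain_per_vector_iter <= 0)
    return -1;

  /* n > outside_cost * vf / gain, so take the floor and add one.  */
  return (outside_cost * vf) / gain_per_vector_iter + 1;
}
```

With the made-up inputs scalar_cost 2, vf 4 and outside_cost 6, the
sketch returns 5 for body cost 3 and 9 for body cost 5: the same loop
needs more iterations to pay off when vec_perm is more expensive.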

> 
> 
> for  gcc/testsuite/ChangeLog
> 
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
> 	expectations for cpus below power7.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..0dab2c08acdb4 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -46,9 +46,10 @@ int main (void)
>    return 0;
>  }
>  
> -/* Peeling to align the store is used. Overhead of peeling is too high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { vector_alignment_reachable && {! vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
> +/* Peeling to align the store is used. Overhead of peeling is too high
> +   for power7, but acceptable for earlier architectures.  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { has_arch_pwr7 && { vector_alignment_reachable && {! vect_hw_misalign} } } } } } */
>  
>  /* Versioning to align the store is used. Overhead of versioning is not too high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || {! vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } } } } */

In the !has_arch_pwr7 case it still adopts peeling, but as the comment (one line
above) shows, the original intention of this test is to expect peeling not to be
profitable, so that case is not expected to be handled here.  Can we just tweak
the loop bound instead, such as:

-#define N 14
+#define N 13
 #define OFF 4 

?  That would make this loop unprofitable to vectorize for !vect_no_align with
peeling (on both pwr7 and pwr6) and keep the expectations consistent.
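
A minimal C sketch of the effect of the bound on the trip count,
assuming the tested loop runs from OFF (inclusive) up to N (exclusive);
that shape is an assumption that matches the 10 iterations mentioned
earlier for N 14 and OFF 4, and count_iterations is a hypothetical
helper, not code from the test:

```c
/* Hypothetical loop shape: runs from off (inclusive) to n (exclusive)
   and returns how many times the body executes.  */
static int
count_iterations (int n, int off)
{
  int count = 0;

  for (int i = off; i < n; i++)
    count++;

  return count;
}
```

Under that assumption, N 14 gives 10 iterations and N 13 gives 9.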

BR,
Kewen

> 
> 



* Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2024-04-24  8:24   ` Kewen.Lin
@ 2024-04-28  8:14     ` Alexandre Oliva
  2024-04-28  9:31       ` Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandre Oliva @ 2024-04-28  8:14 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool,
	Kewen Lin, gcc-patches

On Apr 24, 2024, "Kewen.Lin" <linkw@linux.ibm.com> wrote:

> In the !has_arch_pwr7 case it still adopts peeling, but as the comment (one line
> above) shows, the original intention of this test is to expect peeling not to be
> profitable, so that case is not expected to be handled here.  Can we just tweak
> the loop bound instead, such as:

> -#define N 14
> +#define N 13
>  #define OFF 4 

> ?  That would make this loop unprofitable to vectorize for !vect_no_align with
> peeling (on both pwr7 and pwr6) and keep the expectations consistent.

Like this?  I didn't feel I could claim authorship of this one-liner
just because I turned it into a patch and tested it, so I took the
liberty of turning your own words above into the commit message.  So
far, tested on ppc64le-linux-gnu (ppc9).  Testing with vxworks targets
now.  Would you like to tweak the commit message to your liking?
Otherwise, is this ok to install?

Thanks,


adjust iteration count for ppc costmodel 76b

From: Kewen Lin <linkw@linux.ibm.com>

The original intention of this test case is to check that peeling is
deemed not profitable.  Tweak the loop bound so that this loop is not
profitable to vectorize for !vect_no_align with peeling (on both pwr7
and pwr6), keeping the expectations consistent.


for  gcc/testsuite/ChangeLog

	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c (N): Tweak.
---
 .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index cbbfbb24658f8..e48b0ab759e75 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -6,7 +6,7 @@
 
 /* On Power7 without misalign vector support, this case is to check it's not
    profitable to perform vectorization by peeling to align the store.  */
-#define N 14
+#define N 13
 #define OFF 4
 
 /* Check handling of accesses for which the "initial condition" -


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


* Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2024-04-28  8:14     ` Alexandre Oliva
@ 2024-04-28  9:31       ` Kewen.Lin
  2024-04-29  6:28         ` Alexandre Oliva
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2024-04-28  9:31 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool,
	Kewen Lin, gcc-patches

Hi,

on 2024/4/28 16:14, Alexandre Oliva wrote:
> On Apr 24, 2024, "Kewen.Lin" <linkw@linux.ibm.com> wrote:
> 
>> In the !has_arch_pwr7 case it still adopts peeling, but as the comment (one line
>> above) shows, the original intention of this test is to expect peeling not to be
>> profitable, so that case is not expected to be handled here.  Can we just tweak
>> the loop bound instead, such as:
> 
>> -#define N 14
>> +#define N 13
>>  #define OFF 4 
> 
>> ?  That would make this loop unprofitable to vectorize for !vect_no_align with
>> peeling (on both pwr7 and pwr6) and keep the expectations consistent.
> 
> Like this?  I didn't feel I could claim authorship of this one-liner
> just because I turned it into a patch and tested it, so I took the
> liberty of turning your own words above into the commit message.  So

Feel free to do so!

> far, tested on ppc64le-linux-gnu (ppc9).  Testing with vxworks targets
> now.  Would you like to tweak the commit message to your liking?

OK, tweaked as below.

> Otherwise, is this ok to install?
> 
> Thanks,
> 
> 
> adjust iteration count for ppc costmodel 76b

Nit: Maybe add a prefix "testsuite: ".

> 
> From: Kewen Lin <linkw@linux.ibm.com>

Thanks, you can just drop this.  :)

> 
> The original intention of this test case is to check that peeling is
> deemed not profitable.  Tweak the loop bound so that this loop is not
> profitable to vectorize for !vect_no_align with peeling (on both pwr7
> and pwr6), keeping the expectations consistent.

For hardware that doesn't support unaligned vector memory access,
test case costmodel-vect-76b.c expects the cost model to decide that
peeling is not profitable, judging by the commit history, the test
case comments and the way the result is checked.

For now, the existing loop bound 14 works well for Power7, but not for
targets on which the cost of the vec_perm operation differs from
Power7, such as Power6 (3 on Power7 vs. 1 on Power6).  This difference
in turn changes the minimum iteration count for profitability (12 on
Power7 vs. 10 on Power6) and causes the failure.  To keep the original
test point, this patch tweaks the loop bound to ensure that the loop
is not profitable to vectorize for !vect_no_align with peeling.

OK for trunk (assuming the testing runs well on p6/p7 too), thanks!

BR,
Kewen

> 
> 
> for  gcc/testsuite/ChangeLog
> 
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c (N): Tweak.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..e48b0ab759e75 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -6,7 +6,7 @@
>  
>  /* On Power7 without misalign vector support, this case is to check it's not
>     profitable to perform vectorization by peeling to align the store.  */
> -#define N 14
> +#define N 13
>  #define OFF 4
>  
>  /* Check handling of accesses for which the "initial condition" -
> 
> 



* Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2024-04-28  9:31       ` Kewen.Lin
@ 2024-04-29  6:28         ` Alexandre Oliva
  2024-04-29  8:56           ` Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandre Oliva @ 2024-04-29  6:28 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool,
	Kewen Lin, gcc-patches

On Apr 28, 2024, "Kewen.Lin" <linkw@linux.ibm.com> wrote:

> Nit: Maybe add a prefix "testsuite: ".

ACK

>> 
>> From: Kewen Lin <linkw@linux.ibm.com>

> Thanks, you can just drop this.  :)

I've turned it into Co-Authored-By, since you insist.

But unfortunately with the patch it still fails when testing for
-mcpu=power7 on ppc64le-linux-gnu: it does vectorize the loop with 13
iterations.  We need 16 iterations, as in an earlier version of this
test, for it to pass for -mcpu=power7, but then it doesn't pass for
-mcpu=power6.

It looks like we're going to have to adjust the expectations.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


* Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b
  2024-04-29  6:28         ` Alexandre Oliva
@ 2024-04-29  8:56           ` Kewen.Lin
  0 siblings, 0 replies; 7+ messages in thread
From: Kewen.Lin @ 2024-04-29  8:56 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Rainer Orth, Mike Stump, David Edelsohn, Segher Boessenkool,
	Kewen Lin, gcc-patches

on 2024/4/29 14:28, Alexandre Oliva wrote:
> On Apr 28, 2024, "Kewen.Lin" <linkw@linux.ibm.com> wrote:
> 
>> Nit: Maybe add a prefix "testsuite: ".
> 
> ACK
> 
>>>
>>> From: Kewen Lin <linkw@linux.ibm.com>
> 
>> Thanks, you can just drop this.  :)
> 
> I've turned it into Co-Authored-By, since you insist.
> 
> But unfortunately with the patch it still fails when testing for
> -mcpu=power7 on ppc64le-linux-gnu: it does vectorize the loop with 13
> iterations.  We need 16 iterations, as in an earlier version of this
> test, for it to pass for -mcpu=power7, but then it doesn't pass for
> -mcpu=power6.
> 
> It looks like we're going to have to adjust the expectations.
> 

I had a look at the failure: it's because "vect_no_align" unexpectedly
evaluates to true.

  "selector_expression: ` vect_no_align || {! vector_alignment_reachable} ' 1"

Currently, for powerpc* this effective target checks
check_p8vector_hw_available.  ppc64le-linux-gnu has at least Power8
support (that is, the testing machine can run p8vector code), so it
concludes that vect_no_align is true.

proc check_effective_target_vect_no_align { } {
    return [check_cached_effective_target_indexed vect_no_align {
      expr { [istarget mipsisa64*-*-*]
	     || [istarget mips-sde-elf]
	     || [istarget sparc*-*-*]
	     || [istarget ia64-*-*]
	     || [check_effective_target_arm_vect_no_misalign]
	     || ([istarget powerpc*-*-*] && [check_p8vector_hw_available])

I'll fix this in PR113535, which was filed previously for revisiting the
powerpc-specific checks in these vect* effective targets.  If the testing
just goes with the native cpu type, this issue does not show up.  I think
you can still push the patch, as the testing just exposes another issue.
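
A minimal C sketch of the reasoning above, modelling only the powerpc
clause of the effective-target check (the other targets in the list
are left out, and both function names are hypothetical).  It shows
that the outcome depends on what the testing machine can run, not on
the -mcpu value under test:

```c
#include <stdbool.h>

/* Hypothetical model of the powerpc clause only: the result depends on
   the machine running the tests, not on the -mcpu value being tested.  */
static bool
vect_no_align_powerpc (bool is_powerpc_target, bool p8vector_hw_available)
{
  return is_powerpc_target && p8vector_hw_available;
}

/* Selector from the log above:
   vect_no_align || {! vector_alignment_reachable}  */
static bool
selector_expects_vectorization (bool vect_no_align,
                                bool vector_alignment_reachable)
{
  return vect_no_align || !vector_alignment_reachable;
}
```

On a machine that can run p8vector code, the selector in this model is
true even when vector alignment is reachable, which corresponds to the
selector_expression value of 1 quoted above.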

BR,
Kewen

