[Bug target/109971] New: [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905

* [Bug target/109971] New: [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: seurer at gcc dot gnu.org @ 2023-05-25 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

            Bug ID: 109971
           Summary: [14 regression] Several powerpc64 vector test cases
                    fail after r14-1242-gf574e2dfae7905
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55158
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55158&action=edit
diff of assembler output between r14-1241 and r14-1242

g:f574e2dfae79055f16d0c63cc12df24815d8ead6, r14-1242-gf574e2dfae7905

make -k check-gcc RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/p9-vec-length-full-1.c"
FAIL: gcc.target/powerpc/p9-vec-length-full-1.c scan-assembler-times \\mlxvl\\M 20
FAIL: gcc.target/powerpc/p9-vec-length-full-1.c scan-assembler-times \\mstxvl\\M 10
# of expected passes            5
# of unexpected failures        2

There are a few others, too:

FAIL: gcc.target/powerpc/p9-vec-length-full-2.c scan-assembler-times \\\\mlxvl\\\\M 20
FAIL: gcc.target/powerpc/p9-vec-length-full-2.c scan-assembler-times \\\\mstxvl\\\\M 10
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-times \\\\mlxvl\\\\M 10
FAIL: gcc.target/powerpc/p9-vec-length-full-6.c scan-assembler-times \\\\mstxvl\\\\M 10
FAIL: gcc.target/powerpc/p9-vec-length-full-7.c scan-assembler-times \\\\mstxvl\\\\M 12

There are hundreds of assembler code differences caused by the commit.  Diff
file attached.

commit f574e2dfae79055f16d0c63cc12df24815d8ead6 (HEAD, refs/bisect/bad)
Author: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Date:   Thu May 25 22:42:35 2023 +0800

    VECT: Add decrement IV iteration loop control by variable amount support

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: juzhe.zhong at rivai dot ai @ 2023-05-25 22:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
It seems this condition:

+  /* If we're vectorizing a loop that uses length "controls" and
+     can iterate more than once, we apply decrementing IV approach
+     in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+      && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+      && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+          && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+                       LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+    LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;

did not disable decrement IV on PowerPC.
Sorry for causing this issue; I only tested on x86.
Should I add a target hook for decrement IV?
I am waiting for Richard's comments.
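
For readers following the condition above, here is a minimal C sketch of the
same predicate. The struct loop_model and the function uses_decrementing_iv
are hypothetical stand-ins for the loop_vec_info accessors, not GCC code. The
point is only that nothing in the test is target-specific, so any target with
length controls and a zero load/store bias takes the decrementing IV path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info fields the condition reads. */
struct loop_model {
    bool can_use_partial_vectors; /* LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P */
    int  num_length_controls;     /* > 0 models !LOOP_VINFO_LENS (...).is_empty () */
    int  load_store_bias;         /* LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS */
    bool niters_known;            /* LOOP_VINFO_NITERS_KNOWN_P */
    long niters;                  /* LOOP_VINFO_INT_NITERS */
    long vect_factor;             /* LOOP_VINFO_VECT_FACTOR */
};

/* Mirrors the quoted condition: length-controlled loop, zero bias, and not
   known to finish within a single vector iteration. */
static bool uses_decrementing_iv(const struct loop_model *m)
{
    assert(m->vect_factor > 0);
    bool single_iteration = m->niters_known && m->niters <= m->vect_factor;
    return m->can_use_partial_vectors
           && m->num_length_controls > 0
           && m->load_store_bias == 0
           && !single_iteration;
}
```

With a model like { true, 3, 0, true, 127, 16 } the predicate is true, and it
turns false as soon as the bias is nonzero or the known iteration count fits
in one vector.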

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: linkw at gcc dot gnu.org @ 2023-05-26  1:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
                 CC|                              |linkw at gcc dot gnu.org

--- Comment #3 from Kewen Lin <linkw at gcc dot gnu.org> ---
I'll take a look first.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: juzhe.zhong at rivai dot ai @ 2023-05-26  1:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Kewen Lin from comment #3)
> I'll take a look first.

Thanks a lot. I am sorry for causing you this trouble.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: linkw at gcc dot gnu.org @ 2023-05-26  5:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-05-26

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #4)
> (In reply to Kewen Lin from comment #3)
> > I'll take a look first.
> 
> Thanks a lot. I am sorry for causing such issue to you.

Never mind! Some failures can't even be caught by normal testing, or aren't
caused by the culprit patch itself but are merely exposed by it.

Going by your comment #c1, it seems that you want to disable this on Power (and
s390) for now? (It is apparently disabled for s390, since that target always
has LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS set to 1.)

After some checking, I found that:
 1) for the failures on p9-vec-length-full-{1,2,6}.c, the root cause is that the
main loop becomes neater and the RTL pass bbro is now able to duplicate it, so
the expected counts of vector-with-length instructions change accordingly. I
think these are test issues.

With decrement IV, the optimized IR actually becomes better; it also aligns
with our discussion here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615629.html (Thanks for
the improvement!)

For example on full-1.c int8_t type:

  <bb 3> [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
  # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
  # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
  # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
  # loop_len_16 = PHI <_34(5), 16(2)>
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 + 16;
  _32 = MIN_EXPR <ivtmp_30, 127>;
  _33 = 127 - _32;
  _34 = MIN_EXPR <_33, 16>;
  if (ivtmp_30 <= 126)
    goto <bb 5>; [85.71%]
  else
    goto <bb 4>; [14.29%]

vs.

  <bb 3> [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
  # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
  # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
  # ivtmp_29 = PHI <ivtmp_30(5), 127(2)>
  loop_len_16 = MIN_EXPR <ivtmp_29, 16>;
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 - loop_len_16;
  if (ivtmp_30 != 0)
    goto <bb 5>; [85.71%]
  else
    goto <bb 4>; [14.29%]

2) for the failure on p9-vec-length-full-7.c ({u,}int8_t), the IR difference
keeps cunroll from unrolling the loop further, so the optimized dumps differ:

  <bb 4> [local count: 18146240]:
  MEM <vector(16) signed char> [(signed char *)&x_int8_t + 16B] = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 };
  MEM <vector(16) signed char> [(signed char *)&x_int8_t + 32B] = { 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46 };
  .LEN_STORE (&MEM <int8_t[64]> [(void *)&x_int8_t + 48B], 128B, 11, { 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62 }, 0); [tail call]
  return;

vs.

  <bb 5> [local count: 72584963]:
  # vect_vec_iv_.6_50 = PHI <_51(5), { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 }(4)>
  # ivtmp_57 = PHI <ivtmp_58(5), 43(4)>
  # ivtmp.12_11 = PHI <ivtmp.12_22(5), ivtmp.12_26(4)>
  loop_len_55 = MIN_EXPR <ivtmp_57, 16>;
  _51 = vect_vec_iv_.6_50 + { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16 };
  _5 = (void *) ivtmp.12_11;
  _14 = &MEM <vector(16) signed char> [(signed char *)_5];
  .LEN_STORE (_14, 128B, loop_len_55, vect_vec_iv_.6_50, 0);
  ivtmp_58 = ivtmp_57 - loop_len_55;
  ivtmp.12_22 = ivtmp.12_11 + 16;
  if (ivtmp_58 != 0)
    goto <bb 5>; [75.00%]
  else
    goto <bb 6>; [25.00%]

This exposes an inefficiency at -O2; it seems we can teach cunroll more about
this kind of new sequence.

If you meant to disable decrement IV on Power (but it is now unexpectedly
enabled), then we can probably just keep it enabled. On Power we mainly use
--param=vect-partial-vector-usage=1, which shouldn't be affected. With
--param=vect-partial-vector-usage=2, it does generate a better code sequence in
most cases, and we can gradually improve the remaining worse ones.
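
The two loop-control shapes in the full-1.c dumps above can be modeled in
plain C as a rough sketch. The function names are made up for illustration,
and only the IV and length arithmetic is modeled (N = 127 and VF = 16 are read
off the dumps); the loads, adds and stores are left out.

```c
#include <assert.h>

enum { N = 127, VF = 16 };  /* element count and vector length from the dumps */

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* First dump (before r14-1242): incrementing IV, with the next length
   recomputed from two MIN_EXPRs and a subtraction. Returns the trip count. */
static int lens_incrementing(unsigned *lens)
{
    unsigned iv = 0, len = VF;          /* PHI initial values 0 and 16 */
    int k = 0;
    do {
        lens[k++] = len;                /* length used by .LEN_LOAD/.LEN_STORE */
        iv += VF;
        len = min_u(N - min_u(iv, N), VF);
    } while (iv <= N - 1);
    return k;
}

/* Second dump (r14-1242): decrementing IV, one MIN_EXPR per iteration. */
static int lens_decrementing(unsigned *lens)
{
    unsigned iv = N;                    /* PHI initial value 127 */
    int k = 0;
    do {
        unsigned len = min_u(iv, VF);
        lens[k++] = len;
        iv -= len;
    } while (iv != 0);
    return k;
}
```

Both functions fill lens with the same sequence of per-iteration lengths, so
under this model the decrementing form does the same work with less
loop-control arithmetic.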

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: juzhe.zhong at rivai dot ai @ 2023-05-26  5:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #6 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> With decrement IV, the optimized IR actually becomes better, it also aligns
> with our discussion here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615629.html (Thanks for
> the improvement!)

Oh, I see. It seems that my patch is beneficial to PowerPC overall?
That's what I hoped for, since I want my patch to help not only RVV but also
other targets.

> It exposes something inefficient at -O2, it seems we can teach cunroll further
> about this kind of new sequence.

Ok, we can optimize it for decrement IV in the future.

> If you meant to disable decrement IV on Power (but now actually enable it
> unexpectedly), then probably we can just keep it (not disabling), for Power we
> mainly adopt --param=vect-partial-vector-usage=1, it shouldn't be affected, for
> --param=vect-partial-vector-usage=2, it does generate better code sequence for
> most cases and we can improve the remaining worse one gradually.

I did not mean to disable decrement IV on Power. Actually, I really hope Power
can reuse the flow that I built for RVV. That makes the work more meaningful.
If it works for Power and improves Power codegen in most cases, I'd
like to see Power enable it by default. Then we can optimize it together.

Thanks a lot for your information.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: rguenth at gcc dot gnu.org @ 2023-05-26  6:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
   Target Milestone|---                         |14.0

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #5)
> For example on full-1.c int8_t type:
> 
>   <bb 3> [local count: 75161909]:
>   # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
>   # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
>   # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
>   # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
>   # loop_len_16 = PHI <_34(5), 16(2)>
>   vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
>   vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
>   vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
>   vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
>   vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
>   vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
>   .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
>   vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
>   vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
>   vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
>   ivtmp_30 = ivtmp_29 + 16;
>   _32 = MIN_EXPR <ivtmp_30, 127>;
>   _33 = 127 - _32;
>   _34 = MIN_EXPR <_33, 16>;
>   if (ivtmp_30 <= 126)

With this exit condition niter analysis can work.

>     goto <bb 5>; [85.71%]
>   else
>     goto <bb 4>; [14.29%]
> 
> vs.
> 
>   <bb 3> [local count: 75161909]:
>   # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
>   # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
>   # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
>   # ivtmp_29 = PHI <ivtmp_30(5), 127(2)>
>   loop_len_16 = MIN_EXPR <ivtmp_29, 16>;
>   vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
>   vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
>   vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
>   vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
>   vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
>   vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
>   .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
>   vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
>   vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
>   vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
>   ivtmp_30 = ivtmp_29 - loop_len_16;
>   if (ivtmp_30 != 0)

While here it will fail because ivtmp_30 isn't affine - it doesn't
decrement by an invariant amount but instead by MIN <ivtmp_29, 16>.

Note this will not only pessimize niter analysis but all analyses relying
on SCEV (for uses of this IV!).

The decrement is essentially saturating to zero so we might be able to
special-case this in niter analysis - but still I don't see how to
generally handle this in SCEV.  If we know that niter will fit into
a signed IV we could rewrite the exit test to ivtmp_30 > 0 and decrement
by constant 16.  Alternatively one can test the pre-decrement value,
in the above case

  if (ivtmp_29 >= 16)

which isn't ideal for IV coalescing later but it also allows

  ivtmp_30 = ivtmp_29 - 16;

here.
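
As a rough illustration of the alternatives described above, the sketch below
models the saturating decrement next to the two constant-step variants. All
function names are hypothetical, VF = 16 is taken from the dump, and the
comparison is only checked here for the 127-element case; it is not a claim
about what GCC finally emits.

```c
#include <assert.h>

enum { VF = 16 };  /* vector length from the dump */

/* r14-1242 form: the step is MIN (remain, VF), so the IV is not affine. */
static int trips_saturating(unsigned long n, unsigned long *lens)
{
    unsigned long remain = n;
    int k = 0;
    assert(n > 0);
    do {
        unsigned long len = remain < VF ? remain : VF;
        lens[k++] = len;
        remain -= len;           /* variable step in the last iteration */
    } while (remain != 0);
    return k;
}

/* Alternative 1: signed IV, constant step, exit test "new value > 0". */
static int trips_signed(long n, unsigned long *lens)
{
    long remain = n;
    int k = 0;
    assert(n > 0);
    do {
        lens[k++] = (unsigned long)(remain < VF ? remain : VF);
        remain -= VF;            /* affine: always steps by VF */
    } while (remain > 0);
    return k;
}

/* Alternative 2: constant step, exit test on the pre-decrement value. */
static int trips_predecrement(unsigned long n, unsigned long *lens)
{
    unsigned long remain = n;
    unsigned long old;
    int k = 0;
    assert(n > 0);
    do {
        old = remain;
        lens[k++] = old < VF ? old : VF;
        remain -= VF;            /* may wrap on the final pass; not read again */
    } while (old >= VF);
    return k;
}
```

For n = 127 all three produce the same lengths, but only the last two step
the IV by a loop-invariant amount, which is the property the comment above
asks for.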

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: linkw at gcc dot gnu.org @ 2023-05-31  2:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|testsuite-fail              |missed-optimization
           Assignee|linkw at gcc dot gnu.org    |juzhe.zhong at rivai dot ai

--- Comment #8 from Kewen Lin <linkw at gcc dot gnu.org> ---
I ran a SPEC2017 int/fp evaluation on Power10 at -Ofast with an extra explicit
--param=vect-partial-vector-usage=2 (the default is 1 on Power), comparing
baseline r14-1241 against r14-1242. The results showed speedups for
500.perlbench_r (1.12%), 525.x264_r (1.96%), 544.nab_r (1.91%) and
549.fotonik3d_r (1.25%), but a 5.01% degradation for 510.parest_r.

I just tested Juzhe's newly proposed fix, which makes the loop-closing IV
analyzable by SCEV. It fixes the 510.parest_r degradation and the missed
optimization in cunroll (see #c5), and the test failures are gone as well. A
SPEC2017 re-evaluation with that fix is ongoing; I expect it won't degrade
anything.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: juzhe.zhong at rivai dot ai @ 2023-05-31  2:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Kewen Lin from comment #8)
> I did SPEC2017 int/fp evaluation on Power10 at Ofast and an extra explicit
> --param=vect-partial-vector-usage=2 (the default is 1 on Power), baseline
> r14-1241 vs. new r14-1242, the results showed that it can offer some
> speedups for 500.perlbench_r 1.12%, 525.x264_r 1.96%, 544.nab_r 1.91%,
> 549.fotonik3d_r 1.25%, but it degraded 510.parest_r by 5.01%.
> 
> I just tested Juzhe's new proposed fix which makes the loop closing iv
> SCEV-ed, it can fix the degradation of 510.parest_r, also the miss
> optimization on cunroll (in #c5), the test failures are gone as well. One
> SPEC2017 re-evaluation with that fix is ongoing, I'd expect it won't degrade
> anything.

Thanks so much. You mean you are trying this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html ?

I believe it can improve even more for IBM's target.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: linkw at gcc dot gnu.org @ 2023-05-31  3:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #10 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #9)
> (In reply to Kewen Lin from comment #8)
> > I did SPEC2017 int/fp evaluation on Power10 at Ofast and an extra explicit
> > --param=vect-partial-vector-usage=2 (the default is 1 on Power), baseline
> > r14-1241 vs. new r14-1242, the results showed that it can offer some
> > speedups for 500.perlbench_r 1.12%, 525.x264_r 1.96%, 544.nab_r 1.91%,
> > 549.fotonik3d_r 1.25%, but it degraded 510.parest_r by 5.01%.
> > 
> > I just tested Juzhe's new proposed fix which makes the loop closing iv
> > SCEV-ed, it can fix the degradation of 510.parest_r, also the miss
> > optimization on cunroll (in #c5), the test failures are gone as well. One
> > SPEC2017 re-evaluation with that fix is ongoing, I'd expect it won't degrade
> > anything.
> 
> Thanks so much. You mean you are trying this patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html ?

Yes. It means that Richi's concern (that not only niter analysis but all
analyses relying on SCEV are pessimized) does account for the exposed
degradation and failures. Thanks for looking into it.

> 
> I believe it can improve even more for IBM's target.

I hope so. I'll post the new SPEC2017 results once the run finishes.

By the way, the SPEC2017 run with --param=vect-partial-vector-usage=2 here is
mainly to verify expectations for the decrement IV change. The normal SPEC2017
runs still use --param=vect-partial-vector-usage=1, which isn't affected by
this change and which generally beats the former because of the cost of
setting up the length.

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: cvs-commit at gcc dot gnu.org @ 2023-06-02 11:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:bffc52838e393a775e13dc48162669b0f43ebe09

commit r14-1493-gbffc52838e393a775e13dc48162669b0f43ebe09
Author: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Jun 1 12:36:17 2023 +0800

    VECT: Change flow of decrement IV

    Follow Richi's suggestion, I change current decrement IV flow from:

    do {
       remain -= MIN (vf, remain);
    } while (remain != 0);

    into:

    do {
       old_remain = remain;
       len = MIN (vf, remain);
       remain -= vf;
    } while (old_remain >= vf);

    to enhance SCEV.

    Include fixes from kewen.

    This patch will need to wait for Kewen's test feedback.

    Testing on X86 is on-going

    Co-Authored by: Kewen Lin  <linkw@linux.ibm.com>

      PR tree-optimization/109971

    gcc/ChangeLog:

            * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change decrement IV flow.
            (vect_set_loop_condition_partial_vectors): Ditto.
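
To make the "enhance SCEV" remark concrete, here is a small sketch that takes
the new flow exactly as written in the commit message above and compares a
direct simulation with the closed form that a constant step allows. Both
function names are hypothetical, and the code committed to GCC may differ in
detail. Because remain drops by vf on every pass, pass k sees n - k * vf, so
the trip count can be predicted without running the loop.

```c
#include <assert.h>

/* Simulate the new flow from the commit message and count its passes. */
static unsigned long simulate_trips(unsigned long n, unsigned long vf)
{
    unsigned long remain = n, old_remain, trips = 0;
    assert(vf > 0);
    do {
        old_remain = remain;
        /* len = MIN (vf, remain) would feed the loads and stores here. */
        remain -= vf;            /* constant step; may wrap on the last pass */
        trips++;
    } while (old_remain >= vf);
    return trips;
}

/* Closed form: the loop continues after pass k while n - k * vf >= vf. */
static unsigned long predicted_trips(unsigned long n, unsigned long vf)
{
    assert(vf > 0);
    return n / vf + 1;
}
```

For n = 127 and vf = 16 both functions give 8. Taken literally, this form runs
one extra pass with len == 0 when n is an exact multiple of vf (9 passes for
n = 128); the code actually committed may handle that case differently.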

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: pinskia at gcc dot gnu.org @ 2023-07-15  6:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Has this been fixed?

* [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
From: linkw at gcc dot gnu.org @ 2023-07-17  2:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #12)
> Has this been fixed?

Yes, those failures should be fixed now. Thanks for the reminder.
