From: "linkw at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
Date: Fri, 26 May 2023 05:20:00 +0000
Message-ID: <bug-109971-4-Xifn0JM7Vz@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109971-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-05-26

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #4)
> (In reply to Kewen Lin from comment #3)
> > I'll take a look first.
> 
> Thanks a lot. I am sorry for causing this issue for you.

Never mind! Some failures can't even be caught by normal testing, and some
aren't caused by the culprit patch itself but are merely exposed by it.

From your comment #c2, it seems that you want to disable this on Power (and
s390) for now? (It's apparently disabled for s390 already, since s390 always
has LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS -1.)

After some checking, I found that:
 1) For the failures on p9-vec-length-full-{1,2,6}.c, the root cause is that
the main loop becomes compact enough that the RTL pass bbro is able to
duplicate it, so the expected counts of vector-with-length instructions change
accordingly. I think these are test issues.

With the decrement IV, the optimized IR actually becomes better; it also
aligns with our discussion here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615629.html (Thanks for
the improvement!)

For example, for the int8_t case in full-1.c:

  <bb 3> [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
  # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
  # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
  # ivtmp_29 = PHI <ivtmp_30(5), 0(2)>
  # loop_len_16 = PHI <_34(5), 16(2)>
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 + 16;
  _32 = MIN_EXPR <ivtmp_30, 127>;
  _33 = 127 - _32;
  _34 = MIN_EXPR <_33, 16>;
  if (ivtmp_30 <= 126)
    goto <bb 5>; [85.71%]
  else
    goto <bb 4>; [14.29%]

vs.

  <bb 3> [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI <vectp_a_int8_t.4_17(5), &a_int8_t(2)>
  # vectp_b_int8_t.8_8 = PHI <vectp_b_int8_t.8_7(5), &b_int8_t(2)>
  # vectp_c_int8_t.14_26 = PHI <vectp_c_int8_t.14_27(5), &c_int8_t(2)>
  # ivtmp_29 = PHI <ivtmp_30(5), 127(2)>
  loop_len_16 = MIN_EXPR <ivtmp_29, 16>;
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR<vector(16) unsigned char>(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR<vector(16) signed char>(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 - loop_len_16;
  if (ivtmp_30 != 0)
    goto <bb 5>; [85.71%]
  else
    goto <bb 4>; [14.29%]
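
To make the control-flow difference concrete, here is a scalar C sketch of the
two loop-control schemes. It is my reconstruction from the dumps above (MIN,
VF, and the body() helper are names of my own choosing), not the vectorizer's
actual output:

  #define MIN(A, B) ((A) < (B) ? (A) : (B))
  #define N 127 /* loop bound, from the dumps above */
  #define VF 16 /* vectorization factor for int8_t */

  static void
  body (unsigned start, unsigned len)
  {
    /* Stands for the LEN_LOAD/add/LEN_STORE sequence over
       [start, start + len).  */
    (void) start; (void) len;
  }

  /* Increment IV (before r14-1242): count the elements already processed
     and recompute the length for the *next* iteration at the bottom.  */
  void
  incr_iv (void)
  {
    unsigned i = 0, len = VF;
    do
      {
        body (i, len);
        i += VF;
        len = MIN (N - MIN (i, N), VF); /* _32, _33, _34 in the first dump */
      }
    while (i <= N - 1);
  }

  /* Decrement IV (after r14-1242): count the elements still to process;
     the length computation and the exit test both fall out of one IV.  */
  void
  decr_iv (void)
  {
    unsigned rem = N;
    do
      {
        unsigned len = MIN (rem, VF);
        body (N - rem, len);
        rem -= len;
      }
    while (rem != 0);
  }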

2) For the failure on p9-vec-length-full-7.c ({u,}int8_t), the IR difference
keeps cunroll from unrolling the loop further, so the optimized dumps differ:

  <bb 4> [local count: 18146240]:
  MEM <vector(16) signed char> [(signed char *)&x_int8_t + 16B] = { 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 };
  MEM <vector(16) signed char> [(signed char *)&x_int8_t + 32B] = { 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46 };
  .LEN_STORE (&MEM <int8_t[64]> [(void *)&x_int8_t + 48B], 128B, 11, { 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62 }, 0); [tail call]
  return;

vs.

  <bb 5> [local count: 72584963]:
  # vect_vec_iv_.6_50 = PHI <_51(5), { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30 }(4)>
  # ivtmp_57 = PHI <ivtmp_58(5), 43(4)>
  # ivtmp.12_11 = PHI <ivtmp.12_22(5), ivtmp.12_26(4)>
  loop_len_55 = MIN_EXPR <ivtmp_57, 16>;
  _51 = vect_vec_iv_.6_50 + { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16 };
  _5 = (void *) ivtmp.12_11;
  _14 = &MEM <vector(16) signed char> [(signed char *)_5];
  .LEN_STORE (_14, 128B, loop_len_55, vect_vec_iv_.6_50, 0);
  ivtmp_58 = ivtmp_57 - loop_len_55;
  ivtmp.12_22 = ivtmp.12_11 + 16;
  if (ivtmp_58 != 0)
    goto <bb 5>; [75.00%]
  else
    goto <bb 6>; [25.00%]

This exposes an inefficiency at -O2; it seems we could teach cunroll about
this new kind of sequence.
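
For reference, the remaining loop has a compile-time-constant trip count:
starting from 43 remaining elements with VF 16 it runs exactly three times
(43 -> 27 -> 11 -> 0), which is what cunroll would need to recognize to
recover the three straight-line stores of the first dump. A C sketch of its
shape, reconstructed from the IR (the names and the len_store helper are
mine):

  static signed char x_int8_t[64];

  /* Stands for .LEN_STORE of the vector IV {first, first+1, ...}.  */
  static void
  len_store (signed char *p, unsigned len, signed char first)
  {
    for (unsigned i = 0; i < len; i++)
      p[i] = first + (signed char) i;
  }

  void
  tail (void)
  {
    unsigned rem = 43;              /* ivtmp_57's initial value */
    signed char *p = &x_int8_t[16];
    signed char v = 15;             /* first lane of vect_vec_iv_.6_50 */
    do
      {
        unsigned len = rem < 16 ? rem : 16;
        len_store (p, len, v);
        v += 16;
        p += 16;
        rem -= len;
      }
    while (rem != 0);               /* exactly three iterations */
  }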

If you meant to disable the decrement IV on Power (but actually enabled it
unexpectedly), then we can probably just keep it enabled (not disable it). On
Power we mainly adopt --param=vect-partial-vector-usage=1, which shouldn't be
affected; with --param=vect-partial-vector-usage=2 it does generate a better
code sequence for most cases, and we can improve the remaining worse ones
gradually.
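
(For anyone reproducing this, the two modes can be compared with something
like

  gcc -O2 -mcpu=power9 --param=vect-partial-vector-usage=2 \
      gcc.target/powerpc/p9-vec-length-full-1.c

an illustrative command line of mine; the actual tests set their own
dg-options. Value 1 allows partial vectors only where they remove the need to
iterate, while 2 allows them in all loops.)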

Thread overview: 14+ messages
2023-05-25 22:05 [Bug target/109971] New: " seurer at gcc dot gnu.org
2023-05-25 22:32 ` [Bug target/109971] " juzhe.zhong at rivai dot ai
2023-05-25 22:32 ` juzhe.zhong at rivai dot ai
2023-05-26  1:15 ` linkw at gcc dot gnu.org
2023-05-26  1:51 ` juzhe.zhong at rivai dot ai
2023-05-26  5:20 ` linkw at gcc dot gnu.org [this message]
2023-05-26  5:28 ` juzhe.zhong at rivai dot ai
2023-05-26  6:22 ` rguenth at gcc dot gnu.org
2023-05-31  2:50 ` linkw at gcc dot gnu.org
2023-05-31  2:54 ` juzhe.zhong at rivai dot ai
2023-05-31  3:11 ` linkw at gcc dot gnu.org
2023-06-02 11:50 ` cvs-commit at gcc dot gnu.org
2023-07-15  6:01 ` pinskia at gcc dot gnu.org
2023-07-17  2:24 ` linkw at gcc dot gnu.org
