From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 7B8233858D3C; Fri, 26 May 2023 05:20:01 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7B8233858D3C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1685078401;
	bh=gKkbcsGitF8F+7YcPLFXgFDvDnsDfb5s9XpKVNPgn/Y=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=L35zNNUByszoihC/1cUTqhchVx6WHIkg4llUJmi31huWhxSNWcDUH7TwoLwFJvNYT
	 e3YmQsQD05Edz87xx6xpynkNJIXsmvyvuFLMnYjRP7G54kY4rCpUOOHqBUQOeg4hTk
	 TYGTDJMi7zdtEJfyoJFYW0tE+WJH8hNW4osRbNKg=
From: "linkw at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109971] [14 regression] Several powerpc64 vector test cases fail after r14-1242-gf574e2dfae7905
Date: Fri, 26 May 2023 05:20:00 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linkw at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: everconfirmed bug_status cf_reconfirmed_on
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-05-26

--- Comment #5 from Kewen Lin ---
(In reply to JuzheZhong from comment #4)
> (In reply to Kewen Lin from comment #3)
> > I'll take a look first.
>
> Thanks a lot. I am sorry for causing such issue to you.

Never mind!
Some failures can't even be caught by normal testing, or are not caused by
the culprit patch itself but merely exposed by it.

Per your comment #c2, it seems that you want to disable this on Power (and
s390) for now? (It is apparently disabled for s390, since s390 always has
LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS 1.)

After some checking, I found that:

1) For the failures on p9-vec-length-full-{1,2,6}.c, the root cause is that
the main loop becomes neater, so the RTL pass bbro is able to duplicate it,
and the expected counts of vector-with-length instructions change
accordingly. I think these are test issues.

With decrement IV, the optimized IR actually becomes better, which also
aligns with our discussion here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615629.html
(Thanks for the improvement!)

For example, on full-1.c with the int8_t type:

  [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI
  # vectp_b_int8_t.8_8 = PHI
  # vectp_c_int8_t.14_26 = PHI
  # ivtmp_29 = PHI
  # loop_len_16 = PHI <_34(5), 16(2)>
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 + 16;
  _32 = MIN_EXPR ;
  _33 = 127 - _32;
  _34 = MIN_EXPR <_33, 16>;
  if (ivtmp_30 <= 126)
    goto ; [85.71%]
  else
    goto ; [14.29%]

vs.

  [local count: 75161909]:
  # vectp_a_int8_t.4_18 = PHI
  # vectp_b_int8_t.8_8 = PHI
  # vectp_c_int8_t.14_26 = PHI
  # ivtmp_29 = PHI
  loop_len_16 = MIN_EXPR ;
  vect__1.6_13 = .LEN_LOAD (vectp_a_int8_t.4_18, 8B, loop_len_16, 0);
  vect__2.7_12 = VIEW_CONVERT_EXPR(vect__1.6_13);
  vect__3.10_22 = .LEN_LOAD (vectp_b_int8_t.8_8, 8B, loop_len_16, 0);
  vect__4.11_23 = VIEW_CONVERT_EXPR(vect__3.10_22);
  vect__5.12_24 = vect__2.7_12 + vect__4.11_23;
  vect__6.13_25 = VIEW_CONVERT_EXPR(vect__5.12_24);
  .LEN_STORE (vectp_c_int8_t.14_26, 8B, loop_len_16, vect__6.13_25, 0);
  vectp_a_int8_t.4_17 = vectp_a_int8_t.4_18 + 16;
  vectp_b_int8_t.8_7 = vectp_b_int8_t.8_8 + 16;
  vectp_c_int8_t.14_27 = vectp_c_int8_t.14_26 + 16;
  ivtmp_30 = ivtmp_29 - loop_len_16;
  if (ivtmp_30 != 0)
    goto ; [85.71%]
  else
    goto ; [14.29%]

2) For the failure on p9-vec-length-full-7.c ({u,}int8_t), the IR difference
causes cunroll not to unroll the loop further, so the optimized dumps
differ:

  [local count: 18146240]:
  MEM [(signed char *)&x_int8_t + 16B] = { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 };
  MEM [(signed char *)&x_int8_t + 32B] = { 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46 };
  .LEN_STORE (&MEM [(void *)&x_int8_t + 48B], 128B, 11, { 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62 }, 0); [tail call]
  return;

vs.

  [local count: 72584963]:
  # vect_vec_iv_.6_50 = PHI <_51(5), { 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 }(4)>
  # ivtmp_57 = PHI
  # ivtmp.12_11 = PHI
  loop_len_55 = MIN_EXPR ;
  _51 = vect_vec_iv_.6_50 + { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16 };
  _5 = (void *) ivtmp.12_11;
  _14 = &MEM [(signed char *)_5];
  .LEN_STORE (_14, 128B, loop_len_55, vect_vec_iv_.6_50, 0);
  ivtmp_58 = ivtmp_57 - loop_len_55;
  ivtmp.12_22 = ivtmp.12_11 + 16;
  if (ivtmp_58 != 0)
    goto ; [75.00%]
  else
    goto ; [25.00%]

This exposes an inefficiency at -O2; it seems we can teach cunroll more
about this new kind of sequence.

If you meant to disable decrement IV on Power (but it is now unexpectedly
enabled), then we can probably just keep it enabled. On Power we mainly use
--param=vect-partial-vector-usage=1, which shouldn't be affected. For
--param=vect-partial-vector-usage=2, it does generate better code sequences
in most cases, and we can improve the remaining worse cases gradually.
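
As a rough illustration of the two length computations compared in point 1), the sketch below models both loop-control schemes in plain C and records the length each iteration would pass to .LEN_LOAD/.LEN_STORE. It is not GCC code. The function names and the `lens`/`cap` parameters are hypothetical, and the operands of the MIN_EXPR statements that are missing from the dumps are assumed here to be the IV and the bound (127 in the first dump, 16 in the second). Only the constants 16, 127 and 126 come from the dumps themselves.

```c
#include <stddef.h>

#define VF 16u /* bytes handled per vector iteration, as in the dumps */

/* Incrementing-IV scheme (first dump): the IV counts up by VF and the next
   length is derived with two MIN operations plus a subtraction.
   ASSUMPTION: the first MIN clamps the IV to n; its operands are not
   visible in the dump.  Returns the iteration count and stores up to cap
   per-iteration lengths in lens[].  */
static size_t lens_increment_iv(unsigned n, unsigned *lens, size_t cap)
{
    size_t iters = 0;
    unsigned iv = 0;
    unsigned len = n < VF ? n : VF; /* length of the first iteration */

    if (n == 0)
        return 0;
    do {
        if (iters < cap)
            lens[iters] = len;
        iters++;
        iv += VF;                        /* ivtmp_30 = ivtmp_29 + 16 */
        unsigned done = iv < n ? iv : n; /* _32 = MIN (assumed operands) */
        unsigned rem = n - done;         /* _33 = 127 - _32 */
        len = rem < VF ? rem : VF;       /* _34 = MIN_EXPR <_33, 16> */
    } while (iv <= n - 1);               /* if (ivtmp_30 <= 126) */
    return iters;
}

/* Decrementing-IV scheme (second dump): the IV holds the remaining element
   count, so one MIN and one subtraction per iteration are enough.
   ASSUMPTION: the MIN takes the remaining count and VF.  */
static size_t lens_decrement_iv(unsigned n, unsigned *lens, size_t cap)
{
    size_t iters = 0;
    unsigned remaining = n;

    if (n == 0)
        return 0;
    do {
        unsigned len = remaining < VF ? remaining : VF; /* loop_len = MIN */
        if (iters < cap)
            lens[iters] = len;
        iters++;
        remaining -= len;     /* ivtmp_30 = ivtmp_29 - loop_len_16 */
    } while (remaining != 0); /* if (ivtmp_30 != 0) */
    return iters;
}
```

Under these assumptions, calling either function with n = 127 gives seven iterations of length 16 followed by one of length 15, so the schemes process the same elements and differ only in how much scalar work the loop control needs.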