From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/100849] New: Poor placement of vector IVs
Date: Tue, 01 Jun 2021 08:11:54 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100849

            Bug ID: 100849
           Summary: Poor placement of vector IVs
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Vector IV increments are usually placed at the beginning of a loop body.
This means that both the old and new IV values are live at the same time,
forcing a move.
E.g.:

int x[100], y[100];
void f1 (void) {
  for (int i = 0; i < 100; ++i)
    x[i] = (i & 11) == 2 ? y[i] : 1;
}

produces:

  [local count: 268435400]:
  # vect_vec_iv_.7_47 = PHI <_48(3), { 4, 5, 6, 7 }(2)>
  # ivtmp.21_21 = PHI
  _48 = vect_vec_iv_.7_47 + { 4, 4, 4, 4 };
  vect__1.8_50 = vect_vec_iv_.7_47 & { 11, 11, 11, 11 };
  vect_iftmp.11_54 = MEM [(int *)&y + 16B + ivtmp.21_21 * 1];
  vect_iftmp.12_58 = .VCOND (vect__1.8_50, { 2, 2, 2, 2 }, vect_iftmp.11_54, { 1, 1, 1, 1 }, 113);
  MEM [(int *)&x + 16B + ivtmp.21_21 * 1] = vect_iftmp.12_58;
  ivtmp.21_16 = ivtmp.21_21 + 16;
  if (ivtmp.21_16 != 384)
    goto ; [96.00%]
  else
    goto ; [4.00%]

It might be better to place the vector IV at the same place as the
original scalar increment (or at the end of the loop body?).

The AArch64 Advanced SIMD code is:

.L2:
        mov     v0.16b, v1.16b
        add     x2, x4, x0
        add     v1.4s, v1.4s, v6.4s
        add     x1, x3, x0
        add     x0, x0, 16
        ldr     q3, [x2, 16]
        and     v0.16b, v0.16b, v5.16b
        cmeq    v0.4s, v0.4s, v4.4s
        bsl     v0.16b, v3.16b, v2.16b
        str     q0, [x1, 16]
        cmp     x0, 384
        bne     .L2
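
For illustration only, the two placements can be written out by hand with
GCC's generic vector extensions. This is a rough source-level sketch, not
what the vectorizer emits: the names step, f1_inc_first, f1_inc_last and
placements_agree are made up here, the IV starts at { 0, 1, 2, 3 } rather
than following the structure of the dump above, and the compiler is free to
reschedule either form, so neither is guaranteed to produce or avoid the
extra mov. It only shows where the old and new IV values overlap.

```c
#include <string.h>

/* Four 32-bit lanes, matching the .4s operands in the report.  */
typedef int v4si __attribute__ ((vector_size (16)));

int x[100], y[100];

static const v4si four = { 4, 4, 4, 4 };

/* Scalar reference: the loop from the report.  */
void f1 (void)
{
  for (int i = 0; i < 100; ++i)
    x[i] = (i & 11) == 2 ? y[i] : 1;
}

/* One vector step (hypothetical helper):
   out[i..i+3] = ((iv & 11) == 2) ? y[i..i+3] : 1.  */
static void step (int *out, int i, v4si iv)
{
  const v4si eleven = { 11, 11, 11, 11 };
  const v4si two = { 2, 2, 2, 2 };
  const v4si one = { 1, 1, 1, 1 };
  v4si vy, mask, res;

  memcpy (&vy, &y[i], sizeof vy);
  mask = (iv & eleven) == two;        /* -1 in lanes where the test holds */
  res = (mask & vy) | (~mask & one);  /* bitwise select */
  memcpy (&out[i], &res, sizeof res);
}

/* Increment first: NEXT is created while IV is still needed by the
   rest of the body, so both values are live at the same time.  */
void f1_inc_first (int *out)
{
  v4si iv = { 0, 1, 2, 3 };
  for (int i = 0; i < 100; i += 4)
    {
      v4si next = iv + four;
      step (out, i, iv);
      iv = next;
    }
}

/* Increment last: the final use of the old value comes before the
   update, so the two live ranges do not overlap.  */
void f1_inc_last (int *out)
{
  v4si iv = { 0, 1, 2, 3 };
  for (int i = 0; i < 100; i += 4)
    {
      step (out, i, iv);
      iv = iv + four;
    }
}

/* Returns 1 if both hand-vectorized forms match the scalar loop.  */
int placements_agree (void)
{
  static int a[100], b[100];

  f1 ();
  f1_inc_first (a);
  f1_inc_last (b);
  return memcmp (x, a, sizeof x) == 0 && memcmp (x, b, sizeof x) == 0;
}
```

Both forms compute the same result as f1; the only difference is whether
the increment comes before or after the last use of the old IV value.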