From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 7072039CC950; Tue,  1 Jun 2021 08:11:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7072039CC950
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/100849] New: Poor placement of vector IVs
Date: Tue, 01 Jun 2021 08:11:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rsandifo at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 keywords bug_severity priority component assigned_to reporter
 target_milestone
Message-ID: <bug-100849-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Jun 2021 08:11:54 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100849

            Bug ID: 100849
           Summary: Poor placement of vector IVs
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Vector IV increments are usually placed at the beginning of a loop body.
This means that both the old and new IV values are live at the same time,
which forces a vector register move inside the loop (the "mov v0.16b, v1.16b"
in the AArch64 code below).

E.g.:

int x[100], y[100];

void f1 (void)
{
  for (int i = 0; i < 100; ++i)
    x[i] = (i & 11) == 2 ? y[i] : 1;
}

produces:

  <bb 3> [local count: 268435400]:
  # vect_vec_iv_.7_47 = PHI <_48(3), { 4, 5, 6, 7 }(2)>
  # ivtmp.21_21 = PHI <ivtmp.21_16(3), 0(2)>
  _48 = vect_vec_iv_.7_47 + { 4, 4, 4, 4 };
  vect__1.8_50 = vect_vec_iv_.7_47 & { 11, 11, 11, 11 };
  vect_iftmp.11_54 = MEM <vector(4) int> [(int *)&y + 16B + ivtmp.21_21 * 1];
  vect_iftmp.12_58 = .VCOND (vect__1.8_50, { 2, 2, 2, 2 }, vect_iftmp.11_54, { 1, 1, 1, 1 }, 113);
  MEM <vector(4) int> [(int *)&x + 16B + ivtmp.21_21 * 1] = vect_iftmp.12_58;
  ivtmp.21_16 = ivtmp.21_21 + 16;
  if (ivtmp.21_16 != 384)
    goto <bb 3>; [96.00%]
  else
    goto <bb 4>; [4.00%]

It might be better to place the vector IV increment at the same place as
the original scalar increment (or at the end of the loop body?).

The AArch64 Advanced SIMD code is:

.L2:
        mov     v0.16b, v1.16b
        add     x2, x4, x0
        add     v1.4s, v1.4s, v6.4s
        add     x1, x3, x0
        add     x0, x0, 16
        ldr     q3, [x2, 16]
        and     v0.16b, v0.16b, v5.16b
        cmeq    v0.4s, v0.4s, v4.4s
        bsl     v0.16b, v3.16b, v2.16b
        str     q0, [x1, 16]
        cmp     x0, 384
        bne     .L2
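
The two increment placements discussed above can be sketched at source level
roughly as follows. This is only an illustration under stated assumptions, not
the vectorizer's actual output: it uses the GCC/Clang vector_size extension,
processes all 100 elements in the vector loop (the GIMPLE above starts at
element 4), and the names f1_scalar, f1_early, f1_late, select_lanes and
f1_variants_agree are hypothetical. A compiler is free to reorder these
statements, so the statement order here shows the intent rather than
guaranteeing any particular code generation.

```c
#include <assert.h>
#include <string.h>

#define N 100

/* GCC/Clang vector extension: four 32-bit ints, like "vector(4) int".  */
typedef int v4si __attribute__ ((vector_size (16)));

int x[N], y[N];

/* Scalar reference: the loop from the report, writing to OUT.  */
void f1_scalar (int *out)
{
  for (int i = 0; i < N; ++i)
    out[i] = (i & 11) == 2 ? y[i] : 1;
}

/* Per-lane select.  A vector comparison yields -1 (all ones) in true
   lanes and 0 in false lanes, so it can be used as a bit mask.  */
static v4si select_lanes (v4si iv, v4si yv)
{
  const v4si eleven = { 11, 11, 11, 11 };
  const v4si two = { 2, 2, 2, 2 };
  const v4si one = { 1, 1, 1, 1 };
  v4si mask = (iv & eleven) == two;
  return (yv & mask) | (one & ~mask);
}

/* Increment written first, as in the GIMPLE above: the old IV is still
   used after the new value exists, so both are live at once.  */
void f1_early (int *out)
{
  assert (N % 4 == 0);
  const v4si step = { 4, 4, 4, 4 };
  v4si iv = { 0, 1, 2, 3 };
  for (int i = 0; i < N; i += 4)
    {
      v4si next = iv + step;   /* old and new IV both live from here */
      v4si yv, res;
      memcpy (&yv, &y[i], sizeof yv);
      res = select_lanes (iv, yv);
      memcpy (&out[i], &res, sizeof res);
      iv = next;
    }
}

/* Increment written last: the old IV has no uses after the increment,
   so the update could happen in place.  */
void f1_late (int *out)
{
  assert (N % 4 == 0);
  const v4si step = { 4, 4, 4, 4 };
  v4si iv = { 0, 1, 2, 3 };
  for (int i = 0; i < N; i += 4)
    {
      v4si yv, res;
      memcpy (&yv, &y[i], sizeof yv);
      res = select_lanes (iv, yv);
      memcpy (&out[i], &res, sizeof res);
      iv += step;              /* last use of the old value is above */
    }
}

/* Returns 1 if both vector variants match the scalar loop.  */
int f1_variants_agree (void)
{
  int ref[N], early[N], late[N];
  for (int i = 0; i < N; ++i)
    y[i] = 1000 + i;
  f1_scalar (ref);
  f1_early (early);
  f1_late (late);
  return memcmp (ref, early, sizeof ref) == 0
         && memcmp (ref, late, sizeof ref) == 0;
}
```

Both variants compute the same values; they differ only in where the IV
update sits relative to its uses, which is the placement question raised in
this report.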