From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized
 by clang and not by gcc
Date: Tue, 18 Oct 2022 10:37:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-99395-4-MNJVDRwrhr@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99395-4@http.gcc.gnu.org/bugzilla/>
References: <bug-99395-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixing the CSE in the testcase by doing

double a[1024];
void foo ()
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

gets us

t.c:4:21: note:   Detected interleaving load a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving store a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving load of size 2
t.c:4:21: note:         _2 = a[i_15];
t.c:4:21: note:         tem_10 = a[_1];
t.c:4:21: note:   Detected single element interleaving a[_4] step 16
t.c:4:21: note:   Detected interleaving store of size 2
t.c:4:21: note:         a[i_15] = _3;
t.c:4:21: note:         a[_1] = _6;

in the loop vectorizer, where dependence analysis then fails, and
with the SLP pass (no predcom):

t.c:10:1: note:   Detected interleaving load a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load a[i_15] and a[_4]
t.c:10:1: note:   Detected interleaving store a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load of size 3
t.c:10:1: note:         _2 = a[i_15];
t.c:10:1: note:         tem_10 = a[_1];
t.c:10:1: note:         _5 = a[_4];
t.c:10:1: note:   Detected interleaving store of size 2
t.c:10:1: note:         a[i_15] = _3;
t.c:10:1: note:         a[_1] = _6;

which then runs into gap vectorization issues in how we'd vectorize the
three-element load.

The dependence analysis works by checking the validity of the
vectorized load/store placement and the implied motion of the
scalar load/store statements.  The missed optimization here is that
we do not find the alternate placement that would be correct.  But I
think the way we form groups would need to be revisited first.
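
To make the placement argument concrete, here is a minimal hand-written
sketch (not what GCC emits, and not necessarily the placement the
vectorizer would pick).  It processes two scalar iterations per step and
hoists every load above every store, then checks the result against the
scalar loop.  The names foo_scalar, foo_loads_first and placements_agree
are hypothetical, and the array is passed as a parameter only so both
versions can run on identical input.

```c
#include <string.h>

#define N 1024

/* Scalar reference: the loop from the comment above, operating on a
   caller-provided array instead of the global (an assumption made so
   that two copies can be compared).  */
static void foo_scalar(double *a)
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

/* Hypothetical alternate placement: two scalar iterations per step,
   with all five loads done before any of the four stores.  This is
   valid because iteration i only reads a[i+2]; it never writes it, so
   the next iteration still sees the original value.  */
static void foo_loads_first(double *a)
{
  int i = 0;
  for (; i + 2 < 1022; i += 4)
    {
      double x0 = a[i], x1 = a[i+1], x2 = a[i+2], x3 = a[i+3], x4 = a[i+4];
      a[i]   = x1 * x0;   /* iteration i:   a[i]   = tem * a[i]     */
      a[i+1] = x2 * x1;   /* iteration i:   a[i+1] = a[i+2] * tem   */
      a[i+2] = x3 * x2;   /* iteration i+2: a[i+2] = tem' * a[i+2]  */
      a[i+3] = x4 * x3;   /* iteration i+2: a[i+3] = a[i+4] * tem'  */
    }
  /* Epilogue: the scalar loop runs 511 times, so one iteration is left.  */
  for (; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

/* Returns 1 if both versions produce bit-identical arrays.  */
static int placements_agree(void)
{
  double ref[N], alt[N];
  for (int i = 0; i < N; i++)
    ref[i] = alt[i] = 1.0 + (i % 7) * 0.25;
  foo_scalar(ref);
  foo_loads_first(alt);
  return memcmp(ref, alt, sizeof ref) == 0;
}
```

The only cross-iteration dependence is the read of a[i+2] in one
iteration followed by its write in the next, so any placement that keeps
the loads ahead of the stores preserves the scalar semantics.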