From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id CA306385DC00; Tue, 30 Jan 2024 12:26:43 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CA306385DC00
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1706617603;
	bh=bsf3i5sqTws7twJ9mR4UbPfFiTY7TDyJ1uWLFdfnYbg=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=cqygpkbaoWolHONy30E3gh7HskAbBabPr/hvtqSj3b9+jwqq+qG0nd5QBf8b+TAnO
	 6Mhadptc6G0JZDSt7E2CcphfnE4t3kmUwlsvAlGGQ5VGO/pTWnV4jRrEk2wW4nDR5/
	 KFpsxZlPzeR6heJbLzDvf44veG9zLS69pVDFDNxI=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized
 by clang and not by gcc
Date: Tue, 30 Jan 2024 12:26:41 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-99395-4-oOU7d70xZW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99395-4@http.gcc.gnu.org/bugzilla/>
References: <bug-99395-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #8)
> Hi, Richard.
>
> Now I have found the time to look at GCC vectorization optimization.
>
> I found this case:
>
>   _2 = a[_1];
>   ...
>   a[i_16] = _4;
>   ...
>   _7 = a[_1];    ---> This load should be eliminated and _2 re-used.
>
> Am I right?
>
> Could you tell me which pass should do this CSE optimization?
>
> Thanks.

In principle that's the job of value-numbering.  It doesn't do this today
because of the compile-time cost of doing full data-ref analysis.
Implementing it is as "easy" as hooking that analysis up in
vn_reference_lookup_3, as part of the early work therein, so that more
defs get disambiguated.
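
For illustration, the quoted GIMPLE corresponds roughly to the C below.
This is a hand-written reduction, assuming the s116 loop body has the
shape "a[i] = a[i + 1] * a[i]; a[i + 1] = a[i + 2] * a[i + 1];".  The
function names are made up for this sketch; it only shows what the
CSE would do at the source level, not how value-numbering would do it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction: a[i + 1] is loaded, a[i] is stored, and
   a[i + 1] is loaded a second time.  The store to a[i] cannot
   overlap a[i + 1], so the second load is redundant.  */
void step_with_reload(float *a, size_t i)
{
    assert(a != NULL);
    float t = a[i + 1];              /* _2 = a[_1];                 */
    a[i] = t * a[i];                 /* a[i_16] = _4;               */
    a[i + 1] = a[i + 2] * a[i + 1];  /* _7 = a[_1]; (second load)   */
}

/* The same step with the CSE applied by hand: the second load of
   a[i + 1] is replaced by the value already held in t.  */
void step_with_reuse(float *a, size_t i)
{
    assert(a != NULL);
    float t = a[i + 1];
    a[i] = t * a[i];
    a[i + 1] = a[i + 2] * t;
}
```

Both functions leave the array in the same state.  The point is that
proving a[i] and a[i + 1] do not overlap needs the kind of index
reasoning that data-ref analysis provides.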

If we chose to refrain from valueizing any of the SSA uses, we could
cache both the data references and the dependence resolution.

One could also think of doing very simple recognition of these
single-index expressions and/or integrating this with other cases.
IIRC some cases also warrant SCEV processing / niter analysis,
for example to figure out that

 for (int i = 0; i < 128; ++i)
   a[i] = 1;
 return a[5];

returns 1.
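
A toy model of the reasoning needed for that last example, assuming
the loop has already been summarized as "for (i = lo; i < hi; i += step)
a[i] = value;".  The struct and function below are hypothetical and are
not GCC internals; in GCC the bounds and step would have to come from
SCEV / niter analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of "for (i = lo; i < hi; i += step) a[i] = value;".  */
struct store_loop {
    long lo, hi, step;
    int value;
};

/* Return true and set *out if a load of a[k] after the loop is known
   to read the value stored by the loop, i.e. if k is one of the
   indices the loop visits.  */
bool load_known_after_loop(const struct store_loop *l, long k, int *out)
{
    assert(l != NULL && out != NULL);
    if (l->step <= 0 || k < l->lo || k >= l->hi)
        return false;            /* index outside the iteration space */
    if ((k - l->lo) % l->step != 0)
        return false;            /* index skipped by the stride */
    *out = l->value;
    return true;
}
```

With lo = 0, hi = 128, step = 1 and value = 1, a query for k = 5
succeeds and yields 1, matching the example above.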