From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
Date: Tue, 30 Jan 2024 12:26:41 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #9 from Richard Biener ---
(In reply to JuzheZhong from comment #8)
> Hi, Richard.
>
> Now, I find the time to GCC vectorization optimization.
>
> I find this case:
>
> _2 = a[_1];
> ...
> a[i_16] = _4;
> ,,,
> _7 = a[_1]; ---> This load should be eliminated and re-use _2.
>
> Am I right ?
>
> Could you guide me which pass should do this CSE optimization ?
>
> Thanks.

In principle it's value-numbering.
The reason it doesn't do this is the compile-time cost of doing full data-ref analysis. In principle it's as "easy" as hooking that up into vn_reference_lookup_3 as part of the early work therein to disambiguate more defs. Iff we chose to refrain from valueizing any of the SSA uses we could cache both the data references and the dependence resolution.

One could also think of doing very simple recognition of these single index expressions and / or integrating this with other cases. IIRC there are some warranting SCEV processing / niter analysis as well, for example to figure that

  for (int i = 0; i < 128; ++i)
    a[i] = 1;
  return a[5];

returns 1.
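
To make the two cases concrete, here is a small C sketch. It is only an illustration under stated assumptions: the function names (step_reload, step_reuse, fill_and_read) are hypothetical, and the two-statement kernel is a reduced stand-in in the style of s116, not the benchmark source. step_reuse shows by hand the load reuse asked about in comment #8; it does not show what value-numbering currently does.

```c
#include <stddef.h>

#define N 128

/* As written in the source: a[i + 1] is loaded, a[i] is stored,
   then a[i + 1] is loaded again (the _2 / _7 pair in the GIMPLE). */
void step_reload(float *a, size_t i)
{
    a[i] = a[i + 1] * a[i];
    a[i + 1] = a[i + 2] * a[i + 1];
}

/* Hand-applied CSE: the store to a[i] cannot clobber a[i + 1],
   because both accesses go through the same base and the indices
   differ by the constant 1. Proving that is the data-ref
   disambiguation step; once proven, the first load can be reused. */
void step_reuse(float *a, size_t i)
{
    float next = a[i + 1];        /* _2 = a[_1]                    */
    a[i] = next * a[i];           /* a[i_16] = _4                  */
    a[i + 1] = a[i + 2] * next;   /* reuses _2 instead of a reload */
}

/* The second case from the reply: knowing that the loop writes
   a[0] .. a[127] is enough to conclude that a[5] is 1. */
int fill_and_read(void)
{
    int a[N];
    for (int i = 0; i < N; ++i)
        a[i] = 1;
    return a[5];
}
```

Both step functions leave the array in the same state for any valid i, which is what makes the reuse legal; fill_and_read needs iteration-count reasoning rather than a comparison of two index expressions.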