From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
Date: Tue, 18 Oct 2022 10:37:07 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #5 from Richard Biener ---
Fixing the CSE in the testcase by doing

double a[1024];
void foo ()
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

gets us

t.c:4:21: note:   Detected interleaving load a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving store a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving load of size 2
t.c:4:21: note:         _2 = a[i_15];
t.c:4:21: note:         tem_10 = a[_1];
t.c:4:21: note:   Detected single element interleaving a[_4] step 16
t.c:4:21: note:   Detected interleaving store of size 2
t.c:4:21: note:         a[i_15] = _3;
t.c:4:21: note:         a[_1] = _6;

in the loop pass, where dependence analysis then fails, and with the SLP pass
(no predcom):

t.c:10:1: note:   Detected interleaving load a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load a[i_15] and a[_4]
t.c:10:1: note:   Detected interleaving store a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load of size 3
t.c:10:1: note:         _2 = a[i_15];
t.c:10:1: note:         tem_10 = a[_1];
t.c:10:1: note:         _5 = a[_4];
t.c:10:1: note:   Detected interleaving store of size 2
t.c:10:1: note:         a[i_15] = _3;
t.c:10:1: note:         a[_1] = _6;

which then runs into gap vectorization issues in how we'd vectorize the
three-element load.

The dependence analysis is done by analyzing the validity of the vectorized
load/store placement and the implied motion of the scalar load/store
statements. The missed optimization here is the alternate placement that
would be correct. But I think the way we form groups would need to be
revisited first here.
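
To make the "alternate placement" point concrete, here is a minimal hand-written sketch, not what GCC emits and not a description of its dependence analysis. It assumes a block of two scalar iterations in which all five loads are issued before any of the four stores, and checks that this ordering gives the same result as the scalar loop. The names foo_scalar, foo_blocked, placements_agree, N and BOUND are made up for illustration, and the array is passed as a parameter instead of being a global.

```c
#include <string.h>

#define N     1024   /* array size from the testcase */
#define BOUND 1022   /* loop bound from the testcase */

/* Scalar reference: the loop from the comment, on a caller-supplied array. */
static void foo_scalar(double *a)
{
    for (int i = 0; i < BOUND; i += 2) {
        double tem = a[i + 1];
        a[i] = tem * a[i];
        a[i + 1] = a[i + 2] * tem;
    }
}

/* Hypothetical blocked variant: two scalar iterations per step, with every
   load of the block placed before any store of the block. */
static void foo_blocked(double *a)
{
    int i = 0;
    for (; i + 4 <= BOUND; i += 4) {
        /* Loads first: a[i] .. a[i+4]. a[i+2] is read here before the
           second scalar iteration of the block overwrites it. */
        double x0 = a[i], x1 = a[i + 1], x2 = a[i + 2];
        double x3 = a[i + 3], x4 = a[i + 4];
        /* Stores last: a[i] .. a[i+3]. */
        a[i]     = x1 * x0;
        a[i + 1] = x2 * x1;
        a[i + 2] = x3 * x2;
        a[i + 3] = x4 * x3;
    }
    /* Scalar epilogue for the leftover iteration(s). */
    for (; i < BOUND; i += 2) {
        double tem = a[i + 1];
        a[i] = tem * a[i];
        a[i + 1] = a[i + 2] * tem;
    }
}

/* Returns 1 if both versions produce bit-identical arrays. */
static int placements_agree(void)
{
    static double x[N], y[N];
    for (int k = 0; k < N; k++)
        x[k] = y[k] = 1.0 + 0.001 * k;
    foo_scalar(x);
    foo_blocked(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The only cross-iteration hazard in the scalar loop is that iteration i reads a[i+2] and iteration i+2 later writes it, so any placement that keeps that read ahead of that write preserves the result. Calling placements_agree() from a test driver is one way to confirm this for the blocking factor chosen above.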