From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 4EA3C385735A; Tue, 14 Jun 2022 10:22:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4EA3C385735A
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/105968] GCC vectorizes but reports that it did not vectorize
Date: Tue, 14 Jun 2022 10:22:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 13.0
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: WONTFIX
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: bug_status resolution
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Tue, 14 Jun 2022 10:22:07 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105968

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #2 from Richard Biener ---
> ./cc1 -quiet t.c -O3 -mavx2 -fopt-info
t.c:11:25: optimized: loops interchanged in loop nest
> ./cc1 -quiet t.c -O2 -mavx2 -fopt-info
t.c:14:19: optimized: loop vectorized using 32 byte vectors

so we interchange the loop to

  for (i = 0; i < N; ++i)
    for (times = 0; times < NTIMES; times++)
      r[i] = (a[i] + b[i]) * c[i];

which is indeed good for memory locality (now, we should then
eliminate the inner loop completely but we have no such facility - only
unrolling and DSE/DCE would do this but nothing on the high-level loop form).

"Benchmark" issue.  The outer loop should have a memory clobber.

Oh, and we should in theory be able to vectorize the outer loop if N is
a multiple of the vector element count.  But:

t.c:11:25: note:   === vect_analyze_data_ref_accesses ===
t.c:11:25: note:   zero step in inner loop of nest
t.c:11:25: missed:   not vectorized: complicated access pattern.
t.c:15:14: missed:   not vectorized: complicated access pattern.
t.c:11:25: missed:  bad data access.

so we don't handle this exact issue (maybe the offending check can simply
be elided - assuming dependence checking handles zero steps correctly).

Putting

  __asm__ volatile ("" : : : "memory");

at the end of the outer loop vectorizes with -O3 as well (but doesn't
interchange).

Not a bug I think unless you want to make it a bug about not vectorizing
the outer loop after interchange.
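
For reference, the memory-clobber suggestion could look roughly like the sketch below. The original t.c is not quoted in this comment, so everything except the loop body and the asm statement is an assumption: the values of N and NTIMES, the float element type, the static arrays, and the helper names init_arrays and run_kernel are made up for illustration.

```c
#include <stddef.h>

/* Assumed sizes and types: the original t.c is not shown in this comment. */
#define N 1024
#define NTIMES 1000

static float a[N], b[N], c[N], r[N];

/* Hypothetical helper: fill the inputs with small integers so that the
   results are exactly representable in float. */
void init_arrays(void)
{
    for (size_t i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 1.0f;
        c[i] = 2.0f;
        r[i] = 0.0f;
    }
}

/* Hypothetical benchmark kernel: the same computation repeated NTIMES
   times, as in the loop nest discussed above (before interchange). */
void run_kernel(void)
{
    for (int times = 0; times < NTIMES; times++) {
        for (size_t i = 0; i < N; ++i)
            r[i] = (a[i] + b[i]) * c[i];

        /* Empty GCC extended asm with a "memory" clobber: the compiler
           must assume memory may be read or written here, so each outer
           iteration has to leave r[] fully written. */
        __asm__ volatile ("" : : : "memory");
    }
}
```

With the clobber in place, each outer iteration has an observable effect on memory, which is what a timing loop of this kind usually wants. Per the comment above, GCC then vectorizes the inner loop at -O3 as well and does not interchange the nest; checking with -fopt-info on the actual test case is the way to confirm that for a given compiler version.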