From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112331] Fail vectorization after loop interchange
Date: Thu, 02 Nov 2023 10:00:48 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112331

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|middle-end: Fail            |Fail vectorization after
                   |vectorization               |loop interchange
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener ---
Well, the "issue" is that we are performing loop interchange on this benchmark loop and the vectorizer doesn't like the zero-step in the then-innermost loop.
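To make the zero-step point concrete, here is a hand-written sketch of what interchanging such a nest produces. It assumes the reported loop is the TSVC s111 kernel quoted in this comment minus the dummy() call; the names LEN, ITERATIONS, s111_orig, s111_interchanged and interchange_preserves_result are hypothetical, and the interchanged form is written by hand rather than taken from GCC's output. Once the nl loop is innermost, none of a[i], a[i - 1] or b[i] depends on nl, so every access in that loop has a step of zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes; TSVC uses its own LEN_1D and iterations. */
enum { LEN = 16, ITERATIONS = 3 };

/* Assumed shape of the reported loop: s111 without the dummy() call. */
static void s111_orig(float *a, const float *b, int len, int iterations)
{
    assert(len > 0 && iterations >= 0);
    for (int nl = 0; nl < 2 * iterations; nl++)
        for (int i = 1; i < len; i += 2)
            a[i] = a[i - 1] + b[i];
}

/* Hand-interchanged nest: nl is now the innermost loop and the
   addresses a[i], a[i - 1] and b[i] do not change with nl,
   i.e. every memory access in the inner loop has step 0. */
static void s111_interchanged(float *a, const float *b, int len, int iterations)
{
    assert(len > 0 && iterations >= 0);
    for (int i = 1; i < len; i += 2)
        for (int nl = 0; nl < 2 * iterations; nl++)
            a[i] = a[i - 1] + b[i];
}

/* Runs both nests on identical inputs and reports whether the results
   are bit-for-bit identical (the interchange is legal because odd
   elements are only ever computed from even, never-written ones). */
static bool interchange_preserves_result(void)
{
    float a1[LEN], a2[LEN], b[LEN];
    for (int k = 0; k < LEN; k++) {
        a1[k] = a2[k] = (float)k;
        b[k] = 0.5f * (float)k;
    }
    s111_orig(a1, b, LEN, ITERATIONS);
    s111_interchanged(a2, b, LEN, ITERATIONS);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Calling interchange_preserves_result() should return true; the point is only that the transformation is valid, while the resulting zero-step inner loop is what the vectorizer rejects.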
It's not a practical example; nobody would write such an outer loop in practice. There's a missed optimization in that we fail to elide the then-inner loop.

The solution is to insert a use of 'a' after the inner loop, like TSVC benchmarks usually have:

real_t s111(struct args_t * func_args)
{
//    linear dependence testing
//    no dependence - vectorizable

    initialise_arrays(__func__);

    for (int nl = 0; nl < 2*iterations; nl++) {
        for (int i = 1; i < LEN_1D; i += 2) {
            a[i] = a[i - 1] + b[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    return calc_checksum(__func__);
}

Then it just works(TM).

WONTFIX (in the vectorizer). In "theory" the interchanged loop could be vectorized by outer loop vectorization. But as said, IMHO it's a waste of time to cheat on badly written benchmarks.
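The elision of the then-inner loop mentioned in this comment can be illustrated by hand. Because the nest reads only even elements of a and writes only odd ones, a single sweep over i already reaches the final state, so repeating it (in either loop order) changes nothing. The sketch below is a hypothetical manual transformation, not a claim about what GCC does or should do; s111_orig, s111_elided and elision_preserves_result are illustrative names, and the loop shape is again assumed to be s111 without the dummy() call. The guard on iterations keeps the zero-iteration case equivalent, since the original nest then does no work at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { LEN = 16 }; /* hypothetical array length */

/* Assumed shape of the reported loop: s111 without the dummy() call. */
static void s111_orig(float *a, const float *b, int len, int iterations)
{
    assert(len > 0);
    for (int nl = 0; nl < 2 * iterations; nl++)
        for (int i = 1; i < len; i += 2)
            a[i] = a[i - 1] + b[i];
}

/* Hand-elided form: the repeated sweep is idempotent, so it runs once.
   What remains is a single stride-2 loop with no loop-carried
   dependence (reads a[even], writes a[odd]). */
static void s111_elided(float *a, const float *b, int len, int iterations)
{
    assert(len > 0);
    if (iterations > 0)
        for (int i = 1; i < len; i += 2)
            a[i] = a[i - 1] + b[i];
}

/* Compares both versions on identical inputs for a given trip count. */
static bool elision_preserves_result(int iterations)
{
    float a1[LEN], a2[LEN], b[LEN];
    for (int k = 0; k < LEN; k++) {
        a1[k] = a2[k] = (float)k;
        b[k] = 1.0f + (float)k;
    }
    s111_orig(a1, b, LEN, iterations);
    s111_elided(a2, b, LEN, iterations);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

elision_preserves_result() should hold for 0, 1 or any larger trip count. The remaining single loop is the kind of loop the vectorizer is expected to handle, which is why inserting the use of 'a' (as TSVC does with dummy()) sidesteps the problem.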