From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 92A4B3858D38; Fri, 8 Mar 2024 12:23:59 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 92A4B3858D38
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709900639;
	bh=qlw3kzByyRWnSP+QjY2E2c191F/cRzOiMF4P6HFCUrA=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=FQ0RCh0rdfDTw1qbPSqhrH7tcvCsT2Paa/UXMsrIjkQs/2hheL5NhbmgQDiBtaJv9
	 79tKklGJ6keSdvJ3Euu5txuEqkcG2Fw6R8AV38Q1QROp++4Ou1BybS5PFmZBU46wt9
	 UOIv43GEdooyNdGn7CJQK38CzHTO/sJCHeMJC+mU=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114269] [14 Regression] Multiple 3-27% exec
 time regressions of 434.zeusmp since r14-9193-ga0b1798042d033
Date: Fri, 08 Mar 2024 12:23:58 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114269

--- Comment #4 from Richard Biener ---
The following is a C testcase for a case where ranges will not help:

void foo (int *a, long js, long je, long is, long ie, long ks, long ke,
          long xi, long xj)
{
  for (long j = js; j < je; ++j)
    for (long i = is; i < ie; ++i)
      for (long k = ks; k < ke; ++k)
        a[i + j*xi + k*xi*xj] = 5;
}

SCEV analysis result before/after shows issues.
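For reference, here is a sketch of the re-ordered variant with the
unit-stride i loop innermost (the name foo_reordered is hypothetical, not
part of the benchmark); in the flattened index i + j*xi + k*xi*xj the i
term has stride 1, so with this order the store is contiguous:

```c
/* Hypothetical re-ordering of the testcase above: the i loop, which has
   stride 1 in the flattened index i + j*xi + k*xi*xj, is moved innermost
   so the store becomes contiguous instead of strided/gathered.  */
void
foo_reordered (int *a, long js, long je, long is, long ie,
               long ks, long ke, long xi, long xj)
{
  for (long j = js; j < je; ++j)
    for (long k = ks; k < ke; ++k)
      for (long i = is; i < ie; ++i)
        a[i + j*xi + k*xi*xj] = 5;
}
```

Both orders store to the same set of elements; only the access pattern per
innermost iteration differs.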
Re-ordering the loops so that the fast increment is innermost makes no
difference for vectorization, though.  In the order above we now require an
(emulated) gather, which with SSE didn't work out; previously we used
strided stores.

The reason seems to be that when analyzing k*xi*xj the first multiply yields

  (long int) {(unsigned long) ks_21(D) * (unsigned long) xi_24(D), +,
              (unsigned long) xi_24(D)}_3

but when we then ask to fold the multiply by xj we fail, as we run into

tree
chrec_fold_multiply (tree type, tree op0, tree op1)
{
...
    CASE_CONVERT:
      if (tree_contains_chrecs (op0, NULL))
        return chrec_dont_know;
      /* FALLTHRU */

This case is somewhat odd, as all other unhandled cases simply run into
fold_build2.  That possibly means we'd never build other ops with CHREC
operands.  The check was added for PR42326.

I think we can handle sign-conversions from unsigned just fine;
chrec_fold_plus already does such a thing (but it misses one case).  Doing
this restores things to some extent.  I'm testing this as an intermediate
step before considering reversion of the change.
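The folding rule at stake is just distributivity of a loop-invariant
multiplier over an affine chrec, c * {b, +, s} = {c*b, +, c*s}, and it
stays valid across the sign-conversion because the arithmetic is done in
the unsigned type with wraparound semantics.  A quick illustrative sanity
check (this helper models a chrec's value per iteration; it is not GCC's
actual representation):

```c
/* Value of the affine chrec {b, +, s} at iteration n, evaluated in
   unsigned long so that overflow wraps, as in the SCEV dump above.
   Illustrative model only, not GCC internals.  */
static unsigned long
chrec_at (unsigned long b, unsigned long s, unsigned long n)
{
  return b + n * s;
}
```

Multiplying the chrec's base and step by an invariant c then agrees with
multiplying its value at every iteration, even when the operands came from
a negative signed value such as ks.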