From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 92A4B3858D38; Fri, 8 Mar 2024 12:23:59 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 92A4B3858D38
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709900639;
	bh=qlw3kzByyRWnSP+QjY2E2c191F/cRzOiMF4P6HFCUrA=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=FQ0RCh0rdfDTw1qbPSqhrH7tcvCsT2Paa/UXMsrIjkQs/2hheL5NhbmgQDiBtaJv9
	 79tKklGJ6keSdvJ3Euu5txuEqkcG2Fw6R8AV38Q1QROp++4Ou1BybS5PFmZBU46wt9
	 UOIv43GEdooyNdGn7CJQK38CzHTO/sJCHeMJC+mU=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114269] [14 Regression] Multiple 3-27% exec
 time regressions of 434.zeusmp since r14-9193-ga0b1798042d033
Date: Fri, 08 Mar 2024 12:23:58 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114269

--- Comment #4 from Richard Biener ---
The following is a C testcase for a case where ranges will not help:

void foo (int *a, long js, long je, long is, long ie, long ks, long ke,
          long xi, long xj)
{
  for (long j = js; j < je; ++j)
    for (long i = is; i < ie; ++i)
      for (long k = ks; k < ke; ++k)
        a[i + j*xi + k*xi*xj] = 5;
}

SCEV analysis result before/after shows issues.
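For reference, here is a sketch of the re-ordered variant with the
unit-stride i loop innermost (the name foo_reordered is hypothetical, not
part of the benchmark); in the flattened index i + j*xi + k*xi*xj the i
term has stride 1, so with this order the store is contiguous:

```c
/* Hypothetical re-ordering of the testcase above: the i loop, which has
   stride 1 in the flattened index i + j*xi + k*xi*xj, is moved innermost
   so the store becomes contiguous instead of strided/gathered.  */
void
foo_reordered (int *a, long js, long je, long is, long ie,
               long ks, long ke, long xi, long xj)
{
  for (long j = js; j < je; ++j)
    for (long k = ks; k < ke; ++k)
      for (long i = is; i < ie; ++i)
        a[i + j*xi + k*xi*xj] = 5;
}
```

Both orders store to the same set of elements; only the access pattern per
innermost iteration differs.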
Re-ordering the loops so that the fast increment is innermost makes no
difference for vectorization, though.  In the order above we now require an
(emulated) gather, which with SSE didn't work out; previously we used
strided stores.

The reason seems to be that when analyzing k*xi*xj the first multiply yields

  (long int) {(unsigned long) ks_21(D) * (unsigned long) xi_24(D), +,
              (unsigned long) xi_24(D)}_3

but when we then ask to fold the multiply by xj we fail, as we run into

tree
chrec_fold_multiply (tree type, tree op0, tree op1)
{
...
    CASE_CONVERT:
      if (tree_contains_chrecs (op0, NULL))
        return chrec_dont_know;
      /* FALLTHRU */

This case is somewhat odd, as all other unhandled cases simply run into
fold_build2.  That possibly means we'd never build other ops with CHREC
operands.  The check was added for PR42326.

I think we can handle sign-conversions from unsigned just fine;
chrec_fold_plus already does such a thing (but it misses one case).  Doing
this restores things to some extent.  I'm testing this as an intermediate
step before considering reversion of the change.
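The folding rule at stake is just distributivity of a loop-invariant
multiplier over an affine chrec, c * {b, +, s} = {c*b, +, c*s}, and it
stays valid across the sign-conversion because the arithmetic is done in
the unsigned type with wraparound semantics.  A quick illustrative sanity
check (this helper models a chrec's value per iteration; it is not GCC's
actual representation):

```c
/* Value of the affine chrec {b, +, s} at iteration n, evaluated in
   unsigned long so that overflow wraps, as in the SCEV dump above.
   Illustrative model only, not GCC internals.  */
static unsigned long
chrec_at (unsigned long b, unsigned long s, unsigned long n)
{
  return b + n * s;
}
```

Multiplying the chrec's base and step by an invariant c then agrees with
multiplying its value at every iteration, even when the operands came from
a negative signed value such as ks.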