From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id CBD253858C83; Thu,  1 Dec 2022 15:19:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CBD253858C83
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1669907969;
	bh=BlK3st/ZQK++7FdF+N/ltuzsQuEmCkHK0aNE6GFDZKA=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=ooq27Cm8PbRTl9TnoEKrY5NIE0/gxejveHLiKTdIGtOuxevx57CHchRhNiyaR2QuY
	 aUjiwPhY/KI8nuZbtt8dMu6dKpGkWUvT5oHQqNHqNT5k5ZwA2FaIlzguQhAssv7gqN
	 5YiqBMoFuBHxJ3VdoscaeH69o8QB5PAZRWYSO1Vk=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107946] [13 Regression] 507.cactuBSSN_r
 regresses by ~9% on znver3 with PGO since r13-3875-g9e11ceef165bc0
Date: Thu, 01 Dec 2022 15:19:29 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: target_milestone
Message-ID: <bug-107946-4-BIvWZO5GDK@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107946-4@http.gcc.gnu.org/bugzilla/>
References: <bug-107946-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107946

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Nope, it wasn't supposed to speed up the benchmark, but it does indeed
(with -Ofast) cause the hot loop kernels to be unswitched.
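
For readers unfamiliar with the transformation, here is a minimal sketch of
what unswitching a loop nest means. It is not taken from cactuBSSN; the
function names and the `use_scale` flag are hypothetical and only illustrate
a loop-invariant condition being hoisted out of the whole nest so that each
branch gets its own specialised copy of the loops.

```c
#include <stddef.h>

/* Original form (hypothetical kernel): the test on `use_scale` does not
   change inside the nest, yet it is evaluated in every inner iteration. */
double kernel_plain(const double *a, size_t rows, size_t cols,
                    int use_scale, double scale)
{
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++) {
            if (use_scale)
                sum += a[i * cols + j] * scale;
            else
                sum += a[i * cols + j];
        }
    return sum;
}

/* Hand-unswitched form: the invariant test is hoisted out of the outer
   loop, leaving two copies of the nest with branch-free bodies.  This is
   roughly the shape the compiler may produce, written out by hand. */
double kernel_unswitched(const double *a, size_t rows, size_t cols,
                         int use_scale, double scale)
{
    double sum = 0.0;
    if (use_scale) {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j] * scale;
    } else {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j];
    }
    return sum;
}
```

Both functions compute the same value; the trade-off is fewer branches in
the hot loop against duplicated code, which is why the effect on runtime can
go either way depending on the target and the profile.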

Btw, do we know whether the train and ref data line up in these loops?

Btw, with -Ofast on znver2 I didn't observe any change when benchmarking
this.

I'm trying to reproduce.

OK, so with -O2 -flto -march=znver2 and FDO I get a runtime of 173s while
adding -fno-unswitch-loops gets me 188s.  There's currently no knob to
specifically disable outer loop unswitching so I have to instead patch
that up.  With -O2 -flto -funswitch-loops (w/o FDO) I get 178s.  I'm going
to add a --param to allow easier reproduction.