From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id E88FE38449C8; Fri,  3 May 2024 21:20:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E88FE38449C8
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1714771210;
	bh=LmuHoB+L8xEtParBfhEj2QoD449ooo9OF5yy7JPjI7k=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=pqRAoYSe5nEyK5B+otagu0U3JgfxE6lWh1jO8nlouog1EY5b/cy+SHdnonRcARohb
	 9oyVVXzAmEJmBKCLCmPIxnbknaMIcL0USQaMflabHBuVaud4wkuSp1L79TIdf67Z88
	 R6Y4ryArRE1Q6A27RdflTAuoNrnmwSq262QGCob4=
From: "pinskia at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114860] [14/15 regression] [aarch64] 511.povray
 regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since
 r14-10014-ga2f4be3dae04fa
Date: Fri, 03 May 2024 21:20:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114860-4-W7q9N3btrW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114860-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114860-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to prathamesh3492 from comment #4)
> To check for any
> possible icache misses I used L1I_CACHE_REFILL counter, and turns out that
> there are 64% more L1 icache misses for above adrp instruction with
> a2f4be3dae0 compared to 82d6d385f97, which may (partially) explain the
> performance difference ? Although perf stat shows there are around 7% more
> L1 icache misses for whole program run with 82d6d385f97 compared to
> a2f4be3dae0.

This sounds like either a code alignment issue or a branch misprediction
issue.

bad alignment: 4aeae4
good alignment: 4aec44

The good alignment case is (almost) at the start of an icache line, while the
bad alignment case is in the middle of one. (I am assuming 64-byte cache
lines, which I think is correct.)
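
As a quick sanity check on those two addresses, here is a minimal sketch that
computes the offset of each one within its cache line. It assumes 64-byte
lines as above; the helper name line_offset is only for illustration and is
not taken from any tool.

```python
CACHE_LINE = 64  # assumed icache line size in bytes (not verified for this core)

def line_offset(addr, line_size=CACHE_LINE):
    """Return the byte offset of addr within its cache line."""
    return addr % line_size

# Addresses taken from the two builds discussed above.
for label, addr in (("bad", 0x4aeae4), ("good", 0x4aec44)):
    print(f"{label}: {addr:#x} -> offset {line_offset(addr)} of {CACHE_LINE}")
```

With these inputs the bad case lands at offset 36 and the good case at offset
4, which matches "middle of the line" versus "almost at the start".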

It may be worth looking at mispredicted branches too. The branch leading to
this code might be mispredicted more often because its address now interferes
with that of another branch.

In both cases the regression might really just have been bad luck; alignment
differences and/or address differences can come down to that.