From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id E88FE38449C8; Fri, 3 May 2024 21:20:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E88FE38449C8
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1714771210;
	bh=LmuHoB+L8xEtParBfhEj2QoD449ooo9OF5yy7JPjI7k=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=pqRAoYSe5nEyK5B+otagu0U3JgfxE6lWh1jO8nlouog1EY5b/cy+SHdnonRcARohb
	 9oyVVXzAmEJmBKCLCmPIxnbknaMIcL0USQaMflabHBuVaud4wkuSp1L79TIdf67Z88
	 R6Y4ryArRE1Q6A27RdflTAuoNrnmwSq262QGCob4=
From: "pinskia at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa
Date: Fri, 03 May 2024 21:20:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860

--- Comment #5 from Andrew Pinski ---
(In reply to prathamesh3492 from comment #4)
> To check for any
> possible icache misses I used L1I_CACHE_REFILL counter, and turns out that
> there are 64% more L1 icache misses for above adrp instruction with
> a2f4be3dae0 compared to 82d6d385f97, which may (partially) explain the
> performance difference ?
> Although perf stat shows there are around 7% more
> L1 icache misses for whole program run with 82d6d385f97 compared to
> a2f4be3dae0.

This makes it sound like there is either a code alignment issue or a branch
misprediction issue going on.

bad alignment:  4aeae4
good alignment: 4aec44

The good alignment case is (almost) at the start of an icache line, while the
bad alignment case is in the middle. (I am assuming 64-byte cache lines, which
I think is correct.)

Maybe look at mispredicted branches too. It might be that the branch leading to
this code is being mispredicted more because the address of the branch is now
interfering with another branch.

It might really just have been bad luck that caused this regression in both
cases; alignment differences and/or address differences can be bad luck.
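
To make the alignment observation above concrete, here is a minimal sketch
that computes where each of the two addresses falls within its cache line. It
relies on the same 64-byte line size assumed above, which has not been checked
against the actual core; the helper names are made up for illustration.

```python
# Assumption: 64-byte icache lines, as guessed in the comment above.
CACHE_LINE_BYTES = 64


def cache_line_base(addr, line_size=CACHE_LINE_BYTES):
    """Return the address of the first byte of the cache line holding addr."""
    return addr - (addr % line_size)


def cache_line_offset(addr, line_size=CACHE_LINE_BYTES):
    """Return the byte offset of addr within its cache line."""
    return addr % line_size


if __name__ == "__main__":
    # The two addresses quoted in the comment above.
    for label, addr in (("bad alignment", 0x4AEAE4), ("good alignment", 0x4AEC44)):
        print(
            f"{label}: {addr:#x} "
            f"line base {cache_line_base(addr):#x} "
            f"offset {cache_line_offset(addr)} of {CACHE_LINE_BYTES}"
        )
```

Under that assumption, 4aeae4 sits 36 bytes into its line and 4aec44 sits 4
bytes into its line, which matches the "middle" versus "(almost) start"
description.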