From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 24 Jul 2020 06:53:47 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Andrea Corallo
Cc: Andrew Pinski, Richard Earnshaw, nd, GCC Patches
Subject: Re: [PATCH 2/2] Aarch64: Add branch diluter pass
Message-ID: <20200724115347.GM32057@gate.crashing.org>
References: <20200722164322.GA32057@gate.crashing.org> <20200723224715.GK32057@gate.crashing.org>
List-Id: Gcc-patches mailing list

Hi!
On Fri, Jul 24, 2020 at 09:01:33AM +0200, Andrea Corallo wrote:
> Segher Boessenkool writes:
> >> Correct, it's a sliding window only because the real load address is
> >> not known to the compiler and the algorithm is conservative.  I
> >> believe we could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each
> >> function to (at least) the granule size; then we should be able to
> >> insert 'nop aligned labels' precisely.
> >
> > Yeah, we have similar issues on Power...  Our "granule" (fetch group
> > size, in our terminology) is typically 32, but we align functions to
> > just 16.  This is causing some problems, but aligning to bigger
> > boundaries isn't a very happy alternative either.  WIP...
>
> Interesting, I was expecting other CPUs to have a similar mechanism.

On old CPUs (like the 970) there were at most two branch predictions per
cycle.  Nowadays, all branches are predicted; I am not sure when this
changed, but it is pretty long ago already.

> > (We don't have this exact same problem, because our non-ancient cores
> > can just predict *all* branches in the same cycle.)
> >
> >> My main fear is that, given that new cores tend to have big granules,
> >> code size would blow up.  One advantage of the implemented algorithm
> >> is that, even if slightly conservative, it impacts code size only
> >> where a high branch density shows up.
> >
> > What is "big granules" for you?
>
> N1 is 8 instructions, so 32 bytes as well; I guess this may grow
> further (my speculation).

It has to sooner rather than later, yeah.  Or the mechanism has to
change more radically.  Interesting times ahead, I guess :-)

About your patch itself: the basic idea seems fine (I didn't look too
closely), but do you really need a new RTX class for this?  That is not
very appetising...


Segher