Date: Thu, 23 Jul 2020 17:47:15 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Andrea Corallo
Cc: Andrew Pinski, Richard Earnshaw, nd, GCC Patches
Subject: Re: [PATCH 2/2] Aarch64: Add branch diluter pass
Message-ID: <20200723224715.GK32057@gate.crashing.org>
References: <20200722164322.GA32057@gate.crashing.org>

On Wed, Jul 22, 2020 at 09:45:08PM +0200, Andrea Corallo wrote:
> > Should that actually be a sliding window, or should there actually
> > just not be more than N branches per aligned block of machine code?
> > Like, per fetch group.
> >
> > Can you not use ASM_OUTPUT_ALIGN_WITH_NOP (or ASM_OUTPUT_MAX_SKIP_ALIGN
> > even) then?  GCC has infrastructure for that, already.
>
> Correct, it's a sliding window only because the real load address is
> not known to the compiler and the algorithm is conservative.  I believe
> we could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each function to (at
> least) the granule size; then we should be able to insert 'nop aligned
> labels' precisely.

Yeah, we have similar issues on Power...  Our "granule" (fetch group
size, in our terminology) is typically 32, but we align functions to
just 16.  This is causing some problems, but aligning to bigger
boundaries isn't a very happy alternative either.  WIP...

(We don't have this exact same problem, because our non-ancient cores
can just predict *all* branches in the same cycle.)

> My main fear is that, given that new cores tend to have big granules,
> code size would blow up.  One advantage of the implemented algorithm
> is that, even if slightly conservative, it impacts code size only
> where a high branch density shows up.

What is "big granules" for you?


Segher