From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 24 Jul 2020 06:53:47 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Andrea Corallo
Cc: Andrew Pinski, Richard Earnshaw, nd, GCC Patches
Subject: Re: [PATCH 2/2] Aarch64: Add branch diluter pass
Message-ID: <20200724115347.GM32057@gate.crashing.org>
References: <20200722164322.GA32057@gate.crashing.org> <20200723224715.GK32057@gate.crashing.org>
List-Id: Gcc-patches mailing list

Hi!
On Fri, Jul 24, 2020 at 09:01:33AM +0200, Andrea Corallo wrote:
> Segher Boessenkool writes:
> >> Correct, it's a sliding window only because the real load address is
> >> not known to the compiler and the algorithm is conservative.  I
> >> believe we could use ASM_OUTPUT_ALIGN_WITH_NOP if we align each
> >> function to (at least) the granule size; then we should be able to
> >> insert 'nop aligned labels' precisely.
> >
> > Yeah, we have similar issues on Power...  Our "granule" (fetch group
> > size, in our terminology) is typically 32, but we align functions to
> > just 16.  This is causing some problems, but aligning to bigger
> > boundaries isn't a very happy alternative either.  WIP...
>
> Interesting, I was expecting other CPUs to have a similar mechanism.

On old CPUs (like the 970) there were at most two branch predictions per
cycle.  Nowadays, all branches are predicted; I am not sure when this
changed, but it is pretty long ago already.

> > (We don't have this exact same problem, because our non-ancient cores
> > can just predict *all* branches in the same cycle.)
> >
> >> My main fear is that, given that new cores tend to have big granules,
> >> code size would blow up.  One advantage of the implemented algorithm
> >> is that, even if slightly conservative, it impacts code size only
> >> where a high branch density shows up.
> >
> > What is "big granules" for you?
>
> N1 is 8 instructions, so 32 bytes as well; I guess this may grow
> further (my speculation).

It has to sooner rather than later, yeah.  Or the mechanism has to
change more radically.  Interesting times ahead, I guess :-)

About your patch itself: the basic idea seems fine (I didn't look too
closely), but do you really need a new RTX class for this?  That is not
very appetising...


Segher