From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16949 invoked by alias); 13 Aug 2015 03:30:58 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 16931 invoked by uid 89); 13 Aug 2015 03:30:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 X-HELO: gate.crashing.org Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Thu, 13 Aug 2015 03:30:54 +0000 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.13.8) with ESMTP id t7D3UgOr022766; Wed, 12 Aug 2015 22:30:42 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id t7D3UddA022763; Wed, 12 Aug 2015 22:30:39 -0500 Date: Thu, 13 Aug 2015 03:30:00 -0000 From: Segher Boessenkool To: Wilco Dijkstra Cc: rth@redhat.com, "'GCC Patches'" , Richard Earnshaw , Marcus Shawcroft , dje.gcc@gmail.com Subject: Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Message-ID: <20150813033037.GE19357@gate.crashing.org> References: <005301d0d517$ddd8a030$9989e090$@com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <005301d0d517$ddd8a030$9989e090$@com> User-Agent: Mutt/1.4.2.3i X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg00672.txt.bz2 On Wed, Aug 12, 2015 at 04:59:27PM +0100, Wilco Dijkstra wrote: > However it also creates new dependencies that may not be desirable > (such as hash table size, algorithm used etc). > > > Some notes about ppc64 in particular: > > > > * Constants aren't split until quite late, preventing all hope of > > CSE'ing portions of the generated code. My gut feeling is that > > this is in general a mistake, but... > > Late split is best in general as you want to CSE the original constants, > not parts of the expansion (which would be very rarely possible). CSE handles the REG_EQ* notes just fine? For rs6000 many constants are split at expansion already, and I never saw problems from that? Maybe I haven't looked at the right (wrong :-) ) testcases though. > > Comments? Especially on the shared header? > > I'm not convinced the amount of code that could be shared is enough to be > worthwhile. I agree. > Also the way it is written makes the immediate generation more > complex and likely consuming a lot of memory (each cached immediate requires > at least 64 bytes). It would take a bit less memory if it used pointers (to other cache elts) for sub-expressions, but that makes cache invalidation more interesting, and shared (within one constant) subexpressions non-trivial. > It is not obvious to me why it is a good idea to hide the > simple/fast cases behind the hashing scheme - it seems better that the backend > explicitly decides which cases should be cached. Without going through the abstraction you mean? I'd rather there was no such abstraction at all :-) > I looked at the statistics of AArch64 immediate generation a while ago. > The interesting thing is ~95% of calls are queries, and the same query is on > average repeated 10 times in a row. Huh. > So (a) it is not important to cache the expansions, It also stores the expansion from analysis until emission time. This simplifies the code a lot. Maybe that can be done without caching everything (and still not be too expensive), dunno. > and (b) the high repetition rate means a single-entry cache > has a 90% hitrate. And a 10% miss rate... > We already have a patch for this and could collect stats > comparing the approaches. If a single-entry cache can provide a similar > benefit as caching all immediates then my preference would be to keep things > simple and just cache the last query. > > Note the many repeated queries indicate a performance issue at a much higher > level (repeated cost queries on the same unchanged RTL), and solving that > problem will likely improve buildtime for all targets. You cannot see if RTL has changed from the pointer to it, so we'd have to store that info (the "unchanged" info, or the "cost" info) somewhere. Maybe it would be useful to store it in the RTL itself? Segher