From: "Wilco Dijkstra"
To:
Cc: "'GCC Patches'", "Richard Earnshaw", "Marcus Shawcroft"
Subject: Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
Date: Wed, 12 Aug 2015 15:59:00 -0000
Message-ID: <005301d0d517$ddd8a030$9989e090$@com>

Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.
>
> I've implemented such a caching scheme for three targets, as
> a test of how much code could be shared.
> The answer appears
> to be about 100 lines of boiler-plate.  Minimal, true, but it
> may still be worth it as a way of encouraging backends to do
> similar things in a similar way.

However it also creates new dependencies that may not be desirable
(such as the hash table size, the algorithm used, etc).

> Some notes about ppc64 in particular:
>
> * Constants aren't split until quite late, preventing all hope of
>   CSE'ing portions of the generated code.  My gut feeling is that
>   this is in general a mistake, but...

Splitting late is best in general, as you want to CSE the original
constants rather than parts of the expansion (which would very rarely
be possible).

> * This is the only platform for which I bothered collecting any sort
>   of performance data:
>
> As best I can tell, there is a 9% improvement in bootstrap speed
> for ppc64.  That is, 10 minutes off the original 109 minute build.
>
> For aarch64 and alpha, I simply assumed there would be no loss,
> since the basic search algorithm is unchanged for each.
>
> Comments?  Especially on the shared header?

I'm not convinced the amount of code that could be shared is enough to be
worthwhile.  Also, the way it is written makes immediate generation more
complex and likely consumes a lot of memory (each cached immediate requires
at least 64 bytes).  It is not obvious to me why it is a good idea to hide
the simple/fast cases behind the hashing scheme - it seems better for the
backend to decide explicitly which cases should be cached.

I looked at the statistics of AArch64 immediate generation a while ago.
The interesting thing is that ~95% of calls are queries, and the same query
is on average repeated 10 times in a row.  So (a) it is not important to
cache the expansions, and (b) the high repetition rate means a single-entry
cache has a 90% hit rate.  We already have a patch for this and could
collect stats comparing the approaches.
If a single-entry cache provides a similar benefit to caching all
immediates, then my preference would be to keep things simple and just
cache the last query.

Note that the many repeated queries indicate a performance issue at a much
higher level (repeated cost queries on the same unchanged RTL), and solving
that problem would likely improve build time for all targets.

Wilco