Date: Mon, 23 Nov 2015 09:39:00 -0000
From: Dominik Vogt
To: gcc-patches@gcc.gnu.org
Cc: Andreas Krebbel
Subject: Re: [RFC] Cse reducing performance of register allocation with -O2
Message-ID: <20151123093608.GA13455@linux.vnet.ibm.com>
Reply-To: vogt@linux.vnet.ibm.com
References: <20151013131230.GA30317@linux.vnet.ibm.com> <561D3A28.9090407@redhat.com>
In-Reply-To: <561D3A28.9090407@redhat.com>

On Tue, Oct 13, 2015 at 11:06:48AM -0600, Jeff Law wrote:
> On 10/13/2015 07:12 AM, Dominik Vogt wrote:
> >In some cases, the work of the cse1 pass is counterproductive, as
> >we noticed on s390x.  The effect described below is present since
> >at least 4.8.0.  Note that this may not become manifest in a
> >performance problem on all platforms.  Also note that -O1
> >does not show this behaviour because the responsible code is only
> >executed with -O2 or higher.
> >
> >The core of the problem is the way cse1 sometimes handles function
> >parameters.  Roughly, the observed situation is
> >
> >Before cse1:
> >
> >   start of function
> >   set pseudoreg Rp to the first argument from hardreg R2
> >   (some code that uses Rp)
> >   set R2 to Rp
> >
> >After cse1:
> >
> >   start of function
> >   set pseudoreg Rp to the first argument from hardreg R2
> >   (some code that uses Rp)  <--- The use of Rp is still present
> >   set R2 to R2              <--- cse1 has replaced Rp with R2
> >
> >After that, the set pattern is removed completely, and now we have
> >both Rp and R2 live in the drafted code snippet.  Because R2 is
> >still supposed to be live later on, the ira pass chooses a
> >different hard register (R1) for Rp, and code to copy R1 back to
> >R2 is added later.  (See further down for Rtl and assembly code.)
...
> >So, I've made an experimental hack (see attachment) and tried
> >that.  In a larger test suite, register copies could be saved in
> >quite some places (including the test program below), but in other
> >places new register copies were introduced, resulting in about
> >twice as many "issues" as without the patch.
> >
> >Maybe the patch is just too coarse.  In general I'd assume that
> >the register allocator does a better job of assigning hard
> >registers to pseudo registers.  Is it possible to better describe
> >when cse1 should keep its hands off pseudo registers?
> We don't really have a way to describe this.
>
> I know Vlad looked at problems in this space -- essentially knowing
> when two registers had the same value in the allocators/reload and
> exploiting that information.
>
> My recollection was it didn't help in any measurable way -- I think
> he discussed it during one of the old GCC summit conferences.  That
> was also in the reload era.
>
> Ultimately this feels like all the issues around coalescing and
> copy-propagation.  With that in mind, if we had lifetime & conflict
> information, then we'd be able to query that and perhaps be able to
> make different choices.

I've spent some more time trying out the naive approach of
detecting this situation in cse_insn().

1. In cse_insn()

   IF
     current "set" is "set Hardreg H := Pseudoreg P"
     AND P is generated as a copy of C further up in the extended BB
     AND P and H still contain the same value
     AND cse considers replacing the set with "set H := H"
     AND P is still live at the end of the EBB
         (In the test program this prevents *all* uses of P from
         being replaced by H.)
   THEN
     do not replace

   => Testing this with the SPEC 2006 suite on S390 results in a
   small gain in some cases, a small loss in lots of cases, a
   substantial win in two cases and a substantial loss in one.
   On average there is a small win.  I've not tested this on x86,
   but assuming that x86 does not suffer from the original problem,
   I expect to see mostly losses there.

   This patch requires that a per-register bitmap is created for
   each EBB to record which pseudo registers have been generated
   inside the EBB.

2. IF
     current "set" is "set Hardreg H := Pseudoreg P"
     AND P is generated as a copy of C further up in the extended BB
     AND P and H still contain the same value
     AND cse considers replacing the set with "set H := H"
     AND P is still live at the end of the EBB
     AND P is used between generation and the current instruction
   THEN
     do not replace

   => Has fewer wins and fewer losses, and is only slightly better
   on average than (1).  No real improvement.

   This patch requires scanning every insn in cse_insn() for all
   uses of all pseudo registers.  At the moment there is no function
   in rtlanal.c to do this in one call, so I've just scanned for
   each one individually, causing a dramatic increase in compile
   time (a factor of 2 or even more).

So, my conclusion is that the attempt to fix this by patching
cse_insn() is more or less futile.  Replacing the pseudo register
with the hard register early is actually often a *good* thing, and
to determine whether it's good or bad, the code in cse_insn() would
have to correctly guess what later passes do.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
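
For illustration only: the two conditions (1) and (2) above can be sketched as plain predicates in standalone C. This is not the experimental patch and does not use GCC's internal data structures. The struct and every name in it (set_candidate, keep_pseudo_v1, keep_pseudo_v2, and all fields) are hypothetical, and each flag stands in for information a real patch would have to derive from cse's tables and from liveness data.

```c
#include <stdbool.h>

/* Hypothetical summary of one "set Hardreg H := Pseudoreg P" insn as
   seen while cse processes it.  None of these names exist in GCC.  */
struct set_candidate
{
  bool dest_is_hardreg;        /* destination is a hard register H */
  bool src_is_pseudo;          /* source is a pseudo register P */
  bool pseudo_set_in_ebb;      /* P was generated as a copy earlier in this EBB */
  bool same_value;             /* P and H still contain the same value */
  bool would_become_noop;      /* cse is about to rewrite the insn as H := H */
  bool pseudo_live_at_ebb_end; /* P is still live at the end of the EBB */
  bool pseudo_used_since_def;  /* P is used between its generation and this insn */
};

/* Condition (1): return true if the replacement should be suppressed.  */
bool
keep_pseudo_v1 (const struct set_candidate *c)
{
  return c->dest_is_hardreg
	 && c->src_is_pseudo
	 && c->pseudo_set_in_ebb
	 && c->same_value
	 && c->would_become_noop
	 && c->pseudo_live_at_ebb_end;
}

/* Condition (2): condition (1) plus the extra "P is used in between"
   requirement, which is the part that needs the expensive per-insn scan.  */
bool
keep_pseudo_v2 (const struct set_candidate *c)
{
  return keep_pseudo_v1 (c) && c->pseudo_used_since_def;
}
```

The sketch only makes the relationship visible: (2) is strictly narrower than (1), so it can suppress the replacement in no more cases than (1) does. That fits the observation that (2) produces fewer wins and fewer losses.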