From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 25515 invoked by alias); 19 Feb 2015 13:18:30 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 25469 invoked by uid 48); 19 Feb 2015 13:18:26 -0000
From: "law at redhat dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
Date: Thu, 19 Feb 2015 13:18:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: law at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg02106.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #12 from Jeffrey A. Law ---
I'm very aware that the x86 backend doesn't support a fixed PIC register
anymore.

RA was going to have to spill something. The PIC register is needed in three
different loops, L0, L1 and L2. L0 needs 9 general purpose registers, L1 needs
8 general purpose registers and L2 needs 9 general purpose registers. I.e.,
there are going to be spills; there's simply no way around it.

r107 (PIC pseudo) gets split into 3 allocnos, A1, A11 and A237, covering
Loops 0, 2 and 1 respectively. It's live throughout most of the resultant
function by way of explicit references and the need to have %ebx set up prior
to external calls.
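
The register-pressure argument above can be checked with a little arithmetic. This is only a sketch: it reads the three counts as belonging to L0, L1 and L2 in order, and it assumes 7 allocatable general purpose registers (the eight 32-bit x86 GPRs minus %esp, with the frame pointer omitted). The names `ALLOCATABLE_GPRS`, `LOOP_DEMAND` and `min_spilled` are made up for illustration and are not part of GCC.

```python
# Assumption: 32-bit x86 with %esp reserved as stack pointer and no frame
# pointer, leaving 7 of the 8 general purpose registers for allocation.
ALLOCATABLE_GPRS = 7

# Per-loop demand as quoted in the comment (9, 8 and 9 registers).
LOOP_DEMAND = {"L0": 9, "L1": 8, "L2": 9}


def min_spilled(demand, available=ALLOCATABLE_GPRS):
    """Lower bound on values that cannot all stay in registers at once."""
    return max(0, demand - available)


for loop, demand in LOOP_DEMAND.items():
    print(loop, "needs", demand, "-> at least", min_spilled(demand), "spilled")
```

Under that assumption every loop is over budget by at least one register, which is the sense in which spills cannot be avoided.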

For Loop 1 & Loop 2, the respective allocnos (A237, A11) are not used/set
within the loop at all, i.e., they are transparent within their respective
loops. IRA does exactly what we want here by keeping the PIC register in
memory, which frees up a register within those loops for other objects that
are used within the loop. Of course, to do that we have to reload the value
for the uses outside the boundary of those loops.

Loop 0 (bbs 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19) is the most interesting and is the one where we have those annoying
reloads.

One of the things I notice is that LRA is generating sequences like:

(insn 581 89 90 6 (set (reg:SI 3 bx [107])
        (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 28 [0x1c])) [4 %sfp+-4 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))
(insn 90 581 91 6 (set (reg/f:SI 3 bx [orig:142 D.2145 ] [142])
        (mem/f/c:SI (plus:SI (reg:SI 3 bx [107])
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("out") [flags 0x2] )
                        ] UNSPEC_GOTOFF))) [1 out+0 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))

Note how we load %ebx from memory, then use/clobber it in the next insn. That
makes it impossible for the post-reload optimizers to help clean this up.

How hard would it be to generate code in LRA where those two insns set
different registers in those local snippets of code? In this particular case,
%ebp is locally available and there's no strong reason why we have to use
%ebx.

By using different destinations for insns 581 and 90, the source MEM of insn
581 would be available after insn 90. And by making the value available,
postreload-gcse would be able to commonize those loads.
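
To illustrate why the destination register matters, here is a toy model of redundant-reload detection. It is not GCC code and not how postreload-gcse is implemented. The function `redundant_reloads`, the `slot:` naming, the third reload in each sequence and the choice of %ebp as the destination of insn 90 are all hypothetical, picked only to mirror the shape of the two insns above.

```python
# Toy model, not GCC code. Each insn is (dest_reg, source). A source that
# starts with "slot:" is a reload from a stack slot; anything else is some
# other computation that merely overwrites dest_reg.

def redundant_reloads(insns):
    """Return indices of reloads whose slot value is still held in a register."""
    holds = {}        # register -> stack slot whose value it currently holds
    redundant = []
    for idx, (dest, src) in enumerate(insns):
        is_reload = src.startswith("slot:")
        if is_reload and src in holds.values():
            redundant.append(idx)
        holds.pop(dest, None)      # writing dest destroys its old contents
        if is_reload:
            holds[dest] = src
    return redundant


PIC_SLOT = "slot:sp+28"            # the spill slot read by insn 581

# Shape from the report: insn 90 overwrites the register just reloaded.
same_dest = [
    ("bx", PIC_SLOT),              # insn 581
    ("bx", "out@GOTOFF(bx)"),      # insn 90 clobbers bx
    ("bx", PIC_SLOT),              # hypothetical later reload
]

# Hypothetical alternative: insn 90 writes bp, so bx keeps the PIC value.
diff_dest = [
    ("bx", PIC_SLOT),
    ("bp", "out@GOTOFF(bx)"),
    ("bx", PIC_SLOT),
]

print(redundant_reloads(same_dest))
print(redundant_reloads(diff_dest))
```

In the first sequence no reload is flagged, because the reloaded value is gone by the time it is needed again. In the second the later reload is flagged as removable, which is the kind of opportunity the comment hopes to expose.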