From: "law at redhat dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
Date: Thu, 19 Feb 2015 13:18:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: law at redhat dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317
--- Comment #12 from Jeffrey A. Law <law at redhat dot com> ---
I'm very aware that the x86 backend doesn't support a fixed PIC register
anymore.  

RA was going to have to spill something.  The PIC register is needed in three
different loops, L0, L1 and L2.  L0 needs 9 general purpose registers, L1 needs
8 general purpose registers and L2 needs 9 general purpose registers, which is
more than ia32 can provide.  I.e., there are going to be spills; there's simply
no way around it.
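To make the pressure argument concrete, here is a toy lower-bound calculation. It is only a sketch under stated assumptions: ia32 has eight general purpose registers, %esp is never allocatable, %ebp is allocatable because the frame pointer is omitted, and "needs N registers" is treated as N values live at the same time. IRA's real conflict analysis is more detailed than this.

```python
# Toy register-pressure bound for 32-bit x86 (ia32).
# Assumptions: %esp is never allocatable; %ebp is allocatable because the
# frame pointer is omitted; "needs N registers" means N values live at once.
IA32_GPRS = ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"]
ALLOCATABLE = [r for r in IA32_GPRS if r != "esp"]


def min_spills(live_values: int, regs: int = len(ALLOCATABLE)) -> int:
    """Lower bound on how many of `live_values` simultaneously live values
    must be kept in memory when only `regs` registers are available."""
    return max(0, live_values - regs)


# Register demand per loop, as given in the comment above.
DEMAND = {"L0": 9, "L1": 8, "L2": 9}

for loop, need in DEMAND.items():
    print(f"{loop}: {need} live values -> at least {min_spills(need)} spilled")
```

Every loop exceeds the seven allocatable registers, so some value has to live in memory in each of them.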

r107 (the PIC pseudo) gets split into three allocnos: A1, A11 and A237, covering
loops 0, 2 and 1 respectively.  It's live throughout most of the resulting
function by way of explicit references and the need to have %ebx set up prior
to external calls.

For loop 1 and loop 2, the respective allocnos (A237, A11) are not used or set
within the loop at all, i.e., they are transparent within their respective
loops.  IRA does exactly what we want here by keeping the PIC register in
memory, which frees up a register within those loops for other objects that are
used within the loop.  Of course, to do that we have to reload the value for
the uses outside the boundary of those loops.
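A toy version of the trade-off for loops 1 and 2 follows. Keeping a transparent allocno in memory costs only a reload at each loop exit where it is still live, while keeping a value that is referenced inside the loop in memory costs memory traffic on every iteration. The cost formula, the trip count and the values v1..v7 are hypothetical; this is not IRA's cost model, which is frequency-weighted and considerably more detailed.

```python
from dataclasses import dataclass


@dataclass
class LoopAllocno:
    name: str
    refs_in_loop: int  # uses + sets of the value inside the loop body
    live_exits: int    # loop exits after which the value is still needed


def spill_cost(a: LoopAllocno, trip_count: int) -> int:
    """Hypothetical memory-access count if `a` is kept in memory for the loop."""
    # Referenced values pay one memory access per reference per iteration;
    # a transparent value (refs_in_loop == 0) pays only a reload per live exit.
    return a.refs_in_loop * trip_count + a.live_exits


def choose_spills(allocnos, regs: int, trip_count: int):
    """Keep the cheapest allocnos in memory until the rest fit in `regs`."""
    excess = max(0, len(allocnos) - regs)
    ranked = sorted(allocnos, key=lambda a: spill_cost(a, trip_count))
    return [a.name for a in ranked[:excess]]


# Loop 1: the PIC allocno A237 is transparent; v1..v7 are hypothetical
# values referenced inside the loop.  Eight values, seven registers.
loop1 = [LoopAllocno("A237", 0, 1)] + [
    LoopAllocno(f"v{i}", 2, 1) for i in range(1, 8)
]
print(choose_spills(loop1, regs=7, trip_count=100))  # ['A237']
```

The transparent PIC allocno is the cheapest candidate by a wide margin, which matches the choice IRA makes in loops 1 and 2.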

Loop 0 (bbs 0-19) is the most interesting and is the one where we have those
annoying reloads.

One of the things I notice is that LRA is generating sequences like:
(insn 581 89 90 6 (set (reg:SI 3 bx [107])
        (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 28 [0x1c])) [4 %sfp+-4 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))
(insn 90 581 91 6 (set (reg/f:SI 3 bx [orig:142 D.2145 ] [142])
        (mem/f/c:SI (plus:SI (reg:SI 3 bx [107])
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("out") [flags 0x2] <var_decl 0x7ffff670bc60 out>)
                        ] UNSPEC_GOTOFF))) [1 out+0 S4 A32])) j.c:19 90 {*movsi_internal}
     (nil))

Note how we load %ebx from memory, then use and clobber it in the very next
insn.  That makes it impossible for the post-reload optimizers to help clean
this up.  How hard would it be to have LRA generate code where those two insns
set different registers in these local snippets of code?

In this particular case, %ebp is locally available and there's no strong reason
why we have to use %ebx.  By using different destinations for insns 581 and
90, the value loaded by insn 581 (its source MEM) would still be available in a
register after insn 90.  And by making that value available, postreload-gcse
would be able to commonize those loads.
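A toy model of why the choice of destination matters for availability. It tracks which register currently holds which memory slot and flags a load as redundant when some register already holds that slot. This is a simplified illustration, not GCC's postreload-gcse. It assumes no stores to the slots and no calls between the insns, and the third insn in each sequence is a hypothetical later reload of the PIC base from the same stack slot.

```python
# Toy model of post-reload load availability (NOT GCC's postreload-gcse).
# Each insn is (dest_reg, mem_slot): a load of mem_slot into dest_reg.
# Assumptions: no stores to the slots and no calls between the insns.


def redundant_loads(insns):
    """Return indices of loads whose value is already held in some register."""
    holds = {}  # register -> memory slot whose value it currently holds
    redundant = []
    for i, (dest, slot) in enumerate(insns):
        if slot in holds.values():
            redundant.append(i)
        # The load overwrites dest, so whatever dest held before is lost.
        holds[dest] = slot
    return redundant


# As LRA emits it today: %ebx is reloaded, then clobbered by insn 90.
current = [
    ("bx", "sfp-4"),       # insn 581: reload the PIC base from its stack slot
    ("bx", "out@GOTOFF"),  # insn 90: load `out`, clobbering the PIC base
    ("bx", "sfp-4"),       # hypothetical later reload before the next PIC use
]
# Same code with insn 581 targeting %ebp (insn 90 then addresses via %ebp).
proposed = [
    ("bp", "sfp-4"),
    ("bx", "out@GOTOFF"),
    ("bx", "sfp-4"),       # value still in %ebp: a copy would suffice
]
print(redundant_loads(current))   # []
print(redundant_loads(proposed))  # [2]
```

With %ebx reused, the stack-slot value is gone after insn 90 and the later reload is unavoidable; with %ebp as the destination, the later reload becomes redundant and could be replaced by a register copy.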