From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-476767-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 11760 invoked by alias); 10 Feb 2015 22:17:53 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 11608 invoked by uid 48); 10 Feb 2015 22:17:49 -0000
From: "vmakarov at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
Date: Tue, 10 Feb 2015 22:17:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: vmakarov at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-64317-4-HosX9iv3DC@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-64317-4@http.gcc.gnu.org/bugzilla/>
References: <bug-64317-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-02/txt/msg01100.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
  p107 is a pic pseudo.  IRA spills it (most probably because that
improves the overall allocation).  Every use of p107 therefore needs a
reload.  The number of reloads of p107 is reduced within BB scope
through inheritance.

  LRA can do inheritance in EBB scope too (the reload pass was never
able to do this).  LRA does not form an EBB containing more than one BB
here, as the probability of each branch in the code in question is
exactly 50%.  LRA adds a BB to an EBB only if the fall-through
probability is more than 50%.  Therefore each EBB in the code contains
one block.  If we permitted BBs with exactly 50% fall-through
probability, we would get the following code.

  103: L103:                     -> .L5
  104: NOTE_INSN_BASIC_BLOCK 9
  105: [cx:SI]=di:SI
  850: bp:SI=[sp:SI+0x1c]
  585: di:SI=bp:SI
  106: [di:SI+const(unspec[`out_pos'] 1)]=bx:SI
  586: di:SI=bp:SI
  107: di:SI=[di:SI+const(unspec[`val3'] 1)]
  108: {cx:SI=bx:SI+ax:SI;clobber flags:CC;}
  109: flags:CC=cmp(dx:SI,cx:SI)
  110: pc={(geu(flags:CC,0))?L114:pc}
      REG_BR_PROB 5000
  111: NOTE_INSN_BASIC_BLOCK 10
  587: bx:SI=bp:SI
  112: bx:SI=[bx:SI+const(unspec[`out'] 1)]
  113: {cx:SI=bx:SI+ax:SI;clobber flags:CC;}
  114: L114:                     -> .L6
  115: NOTE_INSN_BASIC_BLOCK 11
  116: [bx:SI]=di:SI
  852: bp:SI=[sp:SI+0x1c]
  588: bx:SI=bp:SI
  117: [bx:SI+const(unspec[`out_pos'] 1)]=cx:SI
  589: bx:SI=bp:SI
  118: di:SI=[bx:SI+const(unspec[`val4'] 1)]
  119: {bx:SI=cx:SI+ax:SI;clobber flags:CC;}
  120: flags:CC=cmp(dx:SI,bx:SI)
  121: pc={(geu(flags:CC,0))?L125:pc}
      REG_BR_PROB 5000
  122: NOTE_INSN_BASIC_BLOCK 12
  590: bx:SI=bp:SI
  123: cx:SI=[bx:SI+const(unspec[`out'] 1)]
  124: {bx:SI=cx:SI+ax:SI;clobber flags:CC;}

  As you can see, the pic pseudo is loaded once for each EBB: once for
the EBB of BBs 9 and 10 (insn 850) and once for the EBB of BBs 11 and
12 (insn 852).
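
  The EBB formation rule described above can be illustrated with a
small model.  This is only a sketch under stated assumptions, not LRA's
actual code: the Block class, form_ebbs, count_pic_loads and the
allow_equal flag are hypothetical names, the fall-through probability
of 10000 for BBs 10 and 12 is assumed (they contain no branch in the
dump), and "one pic load per EBB" is a simplification of what
inheritance achieves.

```python
from dataclasses import dataclass

REG_BR_PROB_BASE = 10000  # scale of REG_BR_PROB notes; 5000 means 50%


@dataclass
class Block:
    index: int
    starts_with_label: bool  # a label marks a possible join point
    fallthru_prob: int       # probability of falling into the next block
    uses_pic: bool = True    # block contains a use of the pic pseudo


def form_ebbs(blocks, allow_equal=False):
    """Group consecutive blocks into EBBs (hypothetical model).

    A block is chained to its successor only if the successor does not
    start with a label and the fall-through probability passes the 50%
    threshold (strictly, unless allow_equal is set).
    """
    threshold = REG_BR_PROB_BASE // 2
    ebbs, current = [], []
    for pos, bb in enumerate(blocks):
        current.append(bb.index)
        nxt = blocks[pos + 1] if pos + 1 < len(blocks) else None
        if nxt is None or nxt.starts_with_label:
            extend = False
        elif allow_equal:
            extend = bb.fallthru_prob >= threshold
        else:
            extend = bb.fallthru_prob > threshold
        if not extend:
            ebbs.append(current)
            current = []
    return ebbs


def count_pic_loads(blocks, ebbs):
    """Simplified cost: one pic load per EBB that uses the pic pseudo."""
    users = {bb.index for bb in blocks if bb.uses_pic}
    return sum(1 for ebb in ebbs if users.intersection(ebb))


# BBs 9..12 from the dump; BBs 9 and 11 start with labels (L103, L114).
BLOCKS = [
    Block(9, True, 5000),
    Block(10, False, 10000),   # assumed: no branch, always falls through
    Block(11, True, 5000),
    Block(12, False, 10000),   # assumed, and unused for the last block
]

strict = form_ebbs(BLOCKS)
relaxed = form_ebbs(BLOCKS, allow_equal=True)
print(strict, count_pic_loads(BLOCKS, strict))
print(relaxed, count_pic_loads(BLOCKS, relaxed))
```

  In this model the strict comparison leaves every block in its own
EBB, while accepting exactly 50% pairs BB 9 with BB 10 and BB 11 with
BB 12, which matches the two loads (insns 850 and 852) in the dump.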

  For some reason GCSE after RA lifts the pic loads up, and the final
code looks like:

        leal    (%ecx,%eax), %ebx
.L5:
        movl    %edi, (%ecx)
        leal    (%ebx,%eax), %ecx
        movl    %ebx, out_pos@GOTOFF(%ebp)
        cmpl    %ecx, %edx
        movl    val3@GOTOFF(%ebp), %edi
        jnb     .L6
        movl    out@GOTOFF(%ebp), %ebx
        movl    28(%esp), %ebp               ! lifted load
        leal    (%ebx,%eax), %ecx
.L6:
        movl    %edi, (%ebx)
        leal    (%ecx,%eax), %ebx
        movl    %ecx, out_pos@GOTOFF(%ebp)
        cmpl    %ebx, %edx
        movl    val4@GOTOFF(%ebp), %edi
        jnb     .L7
        movl    out@GOTOFF(%ebp), %ecx
        movl    28(%esp), %ebp               ! lifted load
        leal    (%ecx,%eax), %ebx

  It looks better, but we could still remove one redundant load.  That
needs inheritance beyond EBB scope, which is very difficult to
implement.  Please remember that inheritance in EBB scope is already a
step forward in comparison with the old reload pass.

  The question is whether we should change the fall-through
probability threshold to get the code above.  I don't know.  It needs
at least good benchmarking, as changing a heuristic to improve one case
might result in performance degradation in more cases.

  RA is all about heuristics.  There will always be some cases where
RA can be improved further, but fixing them by adding new heuristics
might worsen other cases, and the PR cycle will continue.

  So I don't think this case will be solved for GCC-5.0 or even for
the next releases.  Sorry.