From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 11760 invoked by alias); 10 Feb 2015 22:17:53 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 11608 invoked by uid 48); 10 Feb 2015 22:17:49 -0000 From: "vmakarov at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register Date: Tue, 10 Feb 2015 22:17:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Version: 5.0 X-Bugzilla-Keywords: missed-optimization, ra X-Bugzilla-Severity: normal X-Bugzilla-Who: vmakarov at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 5.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2015-02/txt/msg01100.txt.bz2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317 Vladimir Makarov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |vmakarov at gcc dot gnu.org --- Comment #6 from Vladimir Makarov --- p107 is a pic pseudo. IRA spills it (most probably it improves overall allocation). Every usage of p107 needs a reload. Reloads of p107 are decreased in BB scope through inheritance. LRA can do inheritance in EBB scope too (reload pass was never able to do this). LRA does not form EBB containing more one BB as probability of each branch in the code in question is exactly 50%. LRA adds BB to EBB only if fall through probability is more 50%. Therefore each EBB in the code contains one block. If we permits BB with exact 50% fall through, we would get the following code. 103: L103: -> .L5 104: NOTE_INSN_BASIC_BLOCK 9 105: [cx:SI]=di:SI 850: bp:SI=[sp:SI+0x1c] 585: di:SI=bp:SI 106: [di:SI+const(unspec[`out_pos'] 1)]=bx:SI 586: di:SI=bp:SI 107: di:SI=[di:SI+const(unspec[`val3'] 1)] 108: {cx:SI=bx:SI+ax:SI;clobber flags:CC;} 109: flags:CC=cmp(dx:SI,cx:SI) 110: pc={(geu(flags:CC,0))?L114:pc} REG_BR_PROB 5000 111: NOTE_INSN_BASIC_BLOCK 10 587: bx:SI=bp:SI 112: bx:SI=[bx:SI+const(unspec[`out'] 1)] 113: {cx:SI=bx:SI+ax:SI;clobber flags:CC;} 114: L114: -> .L6 115: NOTE_INSN_BASIC_BLOCK 11 116: [bx:SI]=di:SI 852: bp:SI=[sp:SI+0x1c] 588: bx:SI=bp:SI 117: [bx:SI+const(unspec[`out_pos'] 1)]=cx:SI 589: bx:SI=bp:SI 118: di:SI=[bx:SI+const(unspec[`val4'] 1)] 119: {bx:SI=cx:SI+ax:SI;clobber flags:CC;} 120: flags:CC=cmp(dx:SI,bx:SI) 121: pc={(geu(flags:CC,0))?L125:pc} REG_BR_PROB 5000 122: NOTE_INSN_BASIC_BLOCK 12 590: bx:SI=bp:SI 123: cx:SI=[bx:SI+const(unspec[`out'] 1)] 124: {bx:SI=cx:SI+ax:SI;clobber flags:CC;} As you can see, pic pseudo is loaded once for each EBB (BBs 9 10) and EBB (BBs 11 12). For some reason GCSE after RA lift the pic loads up and the final code looks like: leal (%ecx,%eax), %ebx .L5: movl %edi, (%ecx) leal (%ebx,%eax), %ecx movl %ebx, out_pos@GOTOFF(%ebp) cmpl %ecx, %edx movl val3@GOTOFF(%ebp), %edi jnb .L6 movl out@GOTOFF(%ebp), %ebx movl 28(%esp), %ebp ! lifted load leal (%ebx,%eax), %ecx .L6: movl %edi, (%ebx) leal (%ecx,%eax), %ebx movl %ecx, out_pos@GOTOFF(%ebp) cmpl %ebx, %edx movl val4@GOTOFF(%ebp), %edi jnb .L7 movl out@GOTOFF(%ebp), %ecx movl 28(%esp), %ebp ! lifted load leal (%ecx,%eax), %ebx It looks better but we could still remove one redundant load. It needs inheritance beyond EBB. This is very difficult to implement. Please remember that inheritance in EBB scope is already step forward in comparison with old reload pass one. The question is should we change fall through probability to get the code above? I don't know. It needs at least a good benchmarking as changing heuristic for improving one case might result in performance degradation in more cases. RA is all about heuristics, it will be always some cases where RA can be improved more but fixing them by adding new heuristics might worsen other cases and the PR cycle will continue. So I don't think, this case will be solved for GCC-5.0 or even for next releases. Sorry.