public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register
@ 2014-12-15 15:03 izamyatin at gmail dot com
  2014-12-15 15:07 ` [Bug rtl-optimization/64317] " rguenth at gcc dot gnu.org
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: izamyatin at gmail dot com @ 2014-12-15 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

            Bug ID: 64317
           Summary: [5 Regression] Ineffective allocation of PIC base
                    register
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: izamyatin at gmail dot com
                CC: vmakarov at redhat dot com
            Target: i686

Created attachment 34285
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34285&action=edit
Reproducer

For the attached test compiled with -O2  -m32 -fPIE -pie  after r218059 we
generate 

    call    __x86.get_pc_thunk.ax
    addl    $_GLOBAL_OFFSET_TABLE_, %eax
    subl    $28, %esp
    .cfi_def_cfa_offset 48
    movl    48(%esp), %edi
    movl    %eax, 12(%esp)            <--- PIC reg spill
    testl    %edi, %edi
    je    .L8
    movl    12(%esp), %eax            <--- PIC reg fill
    xorl    %esi, %esi
    movl    c@GOT(%eax), %ebp
    .p2align 4,,10
    .p2align 3
.L4:
    movl    12(%esp), %ebx            <--- PIC reg fill
    addl    $1, %esi
    call    bar@PLT

while for r218058 there is no spill and only reg-reg fill:

    call    __x86.get_pc_thunk.di
    addl    $_GLOBAL_OFFSET_TABLE_, %edi
    subl    $12, %esp
    .cfi_def_cfa_offset 32
    movl    32(%esp), %eax
    testl    %eax, %eax
    je    .L8
    movl    c@GOT(%edi), %ebp
    xorl    %esi, %esi
    .p2align 4,,10
    .p2align 3
.L4:
    movl    %edi, %ebx
    addl    $1, %esi
    call    bar@PLT


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
@ 2014-12-15 15:07 ` rguenth at gcc dot gnu.org
  2015-01-13 10:26 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-12-15 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |5.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
  2014-12-15 15:07 ` [Bug rtl-optimization/64317] " rguenth at gcc dot gnu.org
@ 2015-01-13 10:26 ` rguenth at gcc dot gnu.org
  2015-01-23 20:17 ` vmakarov at gcc dot gnu.org
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-01-13 10:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-13
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
  2014-12-15 15:07 ` [Bug rtl-optimization/64317] " rguenth at gcc dot gnu.org
  2015-01-13 10:26 ` rguenth at gcc dot gnu.org
@ 2015-01-23 20:17 ` vmakarov at gcc dot gnu.org
  2015-02-05 12:22 ` enkovich.gnu at gmail dot com
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-01-23 20:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #2 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Fri Jan 23 20:15:56 2015
New Revision: 220060

URL: https://gcc.gnu.org/viewcvs?rev=220060&root=gcc&view=rev
Log:
2015-01-23  Vladimir Makarov  <vmakarov@redhat.com>

    PR target/64317
    * lra-lives.c (make_hard_regno_born): Add parameter.  Don't make
    REAL_PIC_OFFSET_TABLE_REGNUM conflicting with pic offset pseudo.
    (mark_regno_live, process_bb_lives): Pass new paramater value to
    make_hard_regno_born.

2015-01-23  Vladimir Makarov  <vmakarov@redhat.com>

    PR target/64317
    * gcc.target/i386/pr64317.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr64317.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-lives.c
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (2 preceding siblings ...)
  2015-01-23 20:17 ` vmakarov at gcc dot gnu.org
@ 2015-02-05 12:22 ` enkovich.gnu at gmail dot com
  2015-02-06  9:52 ` jakub at gcc dot gnu.org
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: enkovich.gnu at gmail dot com @ 2015-02-05 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Ilya Enkovich <enkovich.gnu at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #3 from Ilya Enkovich <enkovich.gnu at gmail dot com> ---
Created attachment 34675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34675&action=edit
Another reproducer

I found one more reproducer for the problem.  Generated code for multiple
'output' calls inlined into test':

..
.L5:
        movl    28(%esp), %edx               <-- load PIC reg
        movl    %edi, (%ecx)
        leal    (%ebx,%eax), %ecx
        cmpl    %ecx, %ebp
        movl    %ebx, out_pos@GOTOFF(%edx)
        movl    val3@GOTOFF(%edx), %edi
        jnb     .L6
        movl    28(%esp), %ebx              <-- have value in EDX
        movl    out@GOTOFF(%ebx), %ebx
        leal    (%ebx,%eax), %ecx
.L6:
        movl    28(%esp), %edx              <-- NOP
        movl    %edi, (%ebx)
        leal    (%ecx,%eax), %ebx
        cmpl    %ebx, %ebp
        movl    %ecx, out_pos@GOTOFF(%edx)
        movl    val4@GOTOFF(%edx), %edi
        jnb     .L7
        movl    28(%esp), %ecx             <-- have value in EDX
        movl    out@GOTOFF(%ecx), %ecx
        leal    (%ecx,%eax), %ebx
.L7:
        movl    28(%esp), %edx             <-- NOP
        movl    %edi, (%ecx)
...

BTW if I put __attribute__((noinline)) for crc32 function then mentioned code
becomes better and we don't have these two useless instructions in each
function instance.

Compilation string:

gcc -Ofast -funroll-loops -m32 -march=slm -fPIE test.i -S

Used compiler:

gcc version 5.0.0 20150203 (experimental) (GCC)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (3 preceding siblings ...)
  2015-02-05 12:22 ` enkovich.gnu at gmail dot com
@ 2015-02-06  9:52 ` jakub at gcc dot gnu.org
  2015-02-06 10:06 ` enkovich.gnu at gmail dot com
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-06  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Does #c2 fix this, or is #c3 an unrelated bugreport that still needs fixing?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (4 preceding siblings ...)
  2015-02-06  9:52 ` jakub at gcc dot gnu.org
@ 2015-02-06 10:06 ` enkovich.gnu at gmail dot com
  2015-02-10 22:17 ` vmakarov at gcc dot gnu.org
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: enkovich.gnu at gmail dot com @ 2015-02-06 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #5 from Ilya Enkovich <enkovich.gnu at gmail dot com> ---
(In reply to Jakub Jelinek from comment #4)
> Does #c2 fix this, or is #c3 an unrelated bugreport that still needs fixing?

Problem is still seen after the fix.  I put test here because of the same
symptom.  Should I open a new one?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (5 preceding siblings ...)
  2015-02-06 10:06 ` enkovich.gnu at gmail dot com
@ 2015-02-10 22:17 ` vmakarov at gcc dot gnu.org
  2015-02-12 21:04 ` law at redhat dot com
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-10 22:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
  p107 is a pic pseudo. IRA spills it (most probably it improves overall
allocation).  Every usage of p107 needs a reload.  Reloads of p107 are
decreased in BB scope through inheritance.

  LRA can do inheritance in EBB scope too (reload pass was never able
to do this).  LRA does not form EBB containing more one BB as
probability of each branch in the code in question is exactly 50%.
LRA adds BB to EBB only if fall through probability is more 50%.
Therefore each EBB in the code contains one block.  If we permits BB
with exact 50% fall through, we would get the following code.

  103: L103:                     -> .L5
  104: NOTE_INSN_BASIC_BLOCK 9
  105: [cx:SI]=di:SI
  850: bp:SI=[sp:SI+0x1c]
  585: di:SI=bp:SI
  106: [di:SI+const(unspec[`out_pos'] 1)]=bx:SI
  586: di:SI=bp:SI
  107: di:SI=[di:SI+const(unspec[`val3'] 1)]
  108: {cx:SI=bx:SI+ax:SI;clobber flags:CC;}
  109: flags:CC=cmp(dx:SI,cx:SI)
  110: pc={(geu(flags:CC,0))?L114:pc}
      REG_BR_PROB 5000
  111: NOTE_INSN_BASIC_BLOCK 10
  587: bx:SI=bp:SI
  112: bx:SI=[bx:SI+const(unspec[`out'] 1)]
  113: {cx:SI=bx:SI+ax:SI;clobber flags:CC;}
  114: L114:                     -> .L6
  115: NOTE_INSN_BASIC_BLOCK 11
  116: [bx:SI]=di:SI
  852: bp:SI=[sp:SI+0x1c]
  588: bx:SI=bp:SI
  117: [bx:SI+const(unspec[`out_pos'] 1)]=cx:SI
  589: bx:SI=bp:SI
  118: di:SI=[bx:SI+const(unspec[`val4'] 1)]
  119: {bx:SI=cx:SI+ax:SI;clobber flags:CC;}
  120: flags:CC=cmp(dx:SI,bx:SI)
  121: pc={(geu(flags:CC,0))?L125:pc}
      REG_BR_PROB 5000
  122: NOTE_INSN_BASIC_BLOCK 12
  590: bx:SI=bp:SI
  123: cx:SI=[bx:SI+const(unspec[`out'] 1)]
  124: {bx:SI=cx:SI+ax:SI;clobber flags:CC;}

  As you can see, pic pseudo is loaded once for each EBB (BBs 9 10)
and EBB (BBs 11 12).

  For some reason GCSE after RA lift the pic loads up and the final
code looks like:

        leal    (%ecx,%eax), %ebx
.L5:
        movl    %edi, (%ecx)
        leal    (%ebx,%eax), %ecx
        movl    %ebx, out_pos@GOTOFF(%ebp)
        cmpl    %ecx, %edx
        movl    val3@GOTOFF(%ebp), %edi
        jnb     .L6
        movl    out@GOTOFF(%ebp), %ebx
        movl    28(%esp), %ebp               ! lifted load
        leal    (%ebx,%eax), %ecx
.L6:
        movl    %edi, (%ebx)
        leal    (%ecx,%eax), %ebx
        movl    %ecx, out_pos@GOTOFF(%ebp)
        cmpl    %ebx, %edx
        movl    val4@GOTOFF(%ebp), %edi
        jnb     .L7
    movl    out@GOTOFF(%ebp), %ecx
        movl    28(%esp), %ebp               ! lifted load
        leal    (%ecx,%eax), %ebx

  It looks better but we could still remove one redundant load.  It
needs inheritance beyond EBB.  This is very difficult to implement.
Please remember that inheritance in EBB scope is already step forward
in comparison with old reload pass one.

  The question is should we change fall through probability to get the
code above?  I don't know.  It needs at least a good benchmarking as
changing heuristic for improving one case might result in performance
degradation in more cases.

  RA is all about heuristics, it will be always some cases where RA
can be improved more but fixing them by adding new heuristics might
worsen other cases and the PR cycle will continue.

  So I don't think, this case will be solved for GCC-5.0 or even for
next releases.  Sorry.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (6 preceding siblings ...)
  2015-02-10 22:17 ` vmakarov at gcc dot gnu.org
@ 2015-02-12 21:04 ` law at redhat dot com
  2015-02-13  3:35 ` vmakarov at gcc dot gnu.org
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-02-12 21:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #7 from Jeffrey A. Law <law at redhat dot com> ---
Vlad,

What's the rationale behind the 50% probability cutoff for forming an EBB?  For
the purposes of inheritance, ISTM you want the biggest EBBs possible to give
you a maximal window in which to find an insn to inherit from?  Or is it the
case that EBB formation impacts both spilling and inheritance in IRA/LRA?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (7 preceding siblings ...)
  2015-02-12 21:04 ` law at redhat dot com
@ 2015-02-13  3:35 ` vmakarov at gcc dot gnu.org
  2015-02-13  3:37 ` vmakarov at gcc dot gnu.org
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-13  3:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #9 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #7)
> Vlad,
> 
> What's the rationale behind the 50% probability cutoff for forming an EBB? 
> For the purposes of inheritance, ISTM you want the biggest EBBs possible to
> give you a maximal window in which to find an insn to inherit from?  Or is
> it the case that EBB formation impacts both spilling and inheritance in
> IRA/LRA?

I remember I tried different cut-off probabilities.  But LRA is changing
quickly, may be it is time to check this again.

I'd agree with you that for inheritance the bigger EBB, the better.  But there
are also optional reloads.  So if we had EBB consisting of BB1 and very low
probability BB2

BB1
    op with pseudo spilled
BB2
    the spilled pseudo use

optional reload and inheritance would transform it into

BB1
    load hr, pseudo spilled
    op with hr
BB2
    hr use

With 50% cut-off we would have

BB1
    op with pseudo spilled
BB2
    the spilled pseudo use

which is better code if probability BB1 >> BB2.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (8 preceding siblings ...)
  2015-02-13  3:35 ` vmakarov at gcc dot gnu.org
@ 2015-02-13  3:37 ` vmakarov at gcc dot gnu.org
  2015-02-13  8:06 ` enkovich.gnu at gmail dot com
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-13  3:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #10 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #8)
> And for GCC 5, ISTM the question that hasn't been answered, particularly
> with regard to the second reproducer is whether or this is a regression for
> the overall performance of that code.
> 
> It's certainly possible that IRA determined that %ebx was better used to
> hold a different value and that the PIC register might end up in a
> call-clobbered register.  If the object in %ebx is heavily used, the
> benefits of keeping that value in %ebx may outweigh the cost of having the
> PIC value in a different register (say perhaps one that is call-clobbered).

I guess it is easy to check by preventing pic pseudo generation.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (9 preceding siblings ...)
  2015-02-13  3:37 ` vmakarov at gcc dot gnu.org
@ 2015-02-13  8:06 ` enkovich.gnu at gmail dot com
  2015-02-19 13:18 ` law at redhat dot com
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: enkovich.gnu at gmail dot com @ 2015-02-13  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #11 from Ilya Enkovich <enkovich.gnu at gmail dot com> ---
(In reply to Vladimir Makarov from comment #10)
> I guess it is easy to check by preventing pic pseudo generation.

i386 back-end doesn't support fixed PIC register any more.  This test case
demonstrates performance regression in some EEMBC 1.1 tests caused by pseudo
PIC register introduction.

It is unclear why RA decided to spill PIC register.  If we look at loop's code
then we see PIC register is used in each line of code and seems to be the most
used register.

It also seems weird to me that code for the first loop becomes much better
(with no PIC reg fills) if we restrict inlining for the other one.  How does
the second loop affect allocation in the first one?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (10 preceding siblings ...)
  2015-02-13  8:06 ` enkovich.gnu at gmail dot com
@ 2015-02-19 13:18 ` law at redhat dot com
  2015-02-19 16:21 ` vmakarov at gcc dot gnu.org
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-02-19 13:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #12 from Jeffrey A. Law <law at redhat dot com> ---
I'm very aware that the x86 backend doesn't support a fixed PIC register
anymore.  

RA was going to have to spill something.  THe PIC register is needed in three
different loops, L0, L1 and L2.  L0 needs 9 general purpose registers, L8 needs
8 general purpose registers and L2 needs 9 general purpose registers.  ie,
there's going to be spills, there's simply no way around it.

r107 (PIC pseudo) gets split into 3 allocnos.  A1, A11 and A237, covering Loops
0, 2, 1 respectively.  It's live throughout most of the resultant function by
way of explicit references and the need to have %ebx set up prior to external
calls.

For Loop 1 & Loop 2, the respective allocnos (A237, A11) are not used/set
within the loop at all, ie, they are transparent within their respective loops.
 IRA does exactly what we want here by keeping the PIC register in memory which
frees up a register within those loops for other objects that are used within
the loop.  Of course, to do that we have to reload the value for the uses
outside the boundary of those loops.

Loop 0 (bbs 0, 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19) is the most
interesting and is the one where we have those annoying reloads.

One of the things I notice is that LRA is generating sequences like:
(insn 581 89 90 6 (set (reg:SI 3 bx [107])
        (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 28 [0x1c])) [4 %sfp+-4 S4 A32])) j.c:19 90
{*movsi_internal}
     (nil))
(insn 90 581 91 6 (set (reg/f:SI 3 bx [orig:142 D.2145 ] [142])
        (mem/f/c:SI (plus:SI (reg:SI 3 bx [107])
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("out") [flags 0x2] <var_decl
0x7ffff670bc60 out>)
                        ] UNSPEC_GOTOFF))) [1 out+0 S4 A32])) j.c:19 90
{*movsi_internal}
     (nil))

Note how we load %ebx from memory, then use/clobber it in the next insn.  That
makes it impossible for the post-reload optimizers to help clean this up.  How
hard would it be to generate code in LRA where those two insns set different
registers in those local snippets of code?

In this particular case, %ebp is locally available and there's no strong reason
why we have to use %ebx.   By using different destinations for insns 581 and
90, the source MEM of insn 581 would be available after insn 90.  And by making
the value available postreload-gcse would be able to commonize those loads.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (11 preceding siblings ...)
  2015-02-19 13:18 ` law at redhat dot com
@ 2015-02-19 16:21 ` vmakarov at gcc dot gnu.org
  2015-02-20 23:24 ` law at redhat dot com
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-19 16:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #13 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #12)
> 
> One of the things I notice is that LRA is generating sequences like:
> (insn 581 89 90 6 (set (reg:SI 3 bx [107])
>         (mem/c:SI (plus:SI (reg/f:SI 7 sp)
>                 (const_int 28 [0x1c])) [4 %sfp+-4 S4 A32])) j.c:19 90
> {*movsi_internal}
>      (nil))
> (insn 90 581 91 6 (set (reg/f:SI 3 bx [orig:142 D.2145 ] [142])
>         (mem/f/c:SI (plus:SI (reg:SI 3 bx [107])
>                 (const:SI (unspec:SI [
>                             (symbol_ref:SI ("out") [flags 0x2] <var_decl
> 0x7ffff670bc60 out>)
>                         ] UNSPEC_GOTOFF))) [1 out+0 S4 A32])) j.c:19 90
> {*movsi_internal}
>      (nil))
> 
> Note how we load %ebx from memory, then use/clobber it in the next insn. 
> That makes it impossible for the post-reload optimizers to help clean this
> up.  How hard would it be to generate code in LRA where those two insns set
> different registers in those local snippets of code?
> 
> In this particular case, %ebp is locally available and there's no strong
> reason why we have to use %ebx.   By using different destinations for insns
> 581 and 90, the source MEM of insn 581 would be available after insn 90. 
> And by making the value available postreload-gcse would be able to commonize
> those loads.

Jeff, thanks for the analysis.  PIC pseudo really can be in different regs. 
There are a lot of places in RA in what reg the pic pseudo will be placed for
particular live range.  Probably there is a place where the decision is done
for ebx without considering other alternatives.  I'll investigate this problem.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (12 preceding siblings ...)
  2015-02-19 16:21 ` vmakarov at gcc dot gnu.org
@ 2015-02-20 23:24 ` law at redhat dot com
  2015-02-21 20:10 ` vmakarov at gcc dot gnu.org
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-02-20 23:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #14 from Jeffrey A. Law <law at redhat dot com> ---
Just trying to help out where I can.   It's similar to the round robin use of
reload regs we've had in reload for a while.  THe idea was to hopefully have
reloaded values lying around in a useful place more often to improve
inheritance.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (13 preceding siblings ...)
  2015-02-20 23:24 ` law at redhat dot com
@ 2015-02-21 20:10 ` vmakarov at gcc dot gnu.org
  2015-02-27 23:07 ` vmakarov at gcc dot gnu.org
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-21 20:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #15 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #14)
> Just trying to help out where I can.   It's similar to the round robin use
> of reload regs we've had in reload for a while.  THe idea was to hopefully
> have reloaded values lying around in a useful place more often to improve
> inheritance.

Yes, I remember this.  LRA has analogous technique called hard registers
levelling. The idea is the same to use all hard registers with equal
probability.  Although I think there is some possibility to improve this making
it more CFG sensitive although it is harder to do it in LRA as hard register
allocation in LRA is done in more global scope than in reload.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (14 preceding siblings ...)
  2015-02-21 20:10 ` vmakarov at gcc dot gnu.org
@ 2015-02-27 23:07 ` vmakarov at gcc dot gnu.org
  2015-03-03 19:08 ` law at redhat dot com
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-02-27 23:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #16 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Fri Feb 27 22:02:05 2015
New Revision: 221070

URL: https://gcc.gnu.org/viewcvs?rev=221070&root=gcc&view=rev
Log:
2015-02-27  Vladimir Makarov  <vmakarov@redhat.com>

    PR target/64317
    * params.def (PARAM_LRA_INHERITANCE_EBB_PROBABILITY_CUTOFF): New.
    * params.h (LRA_INHERITANCE_EBB_PROBABILITY_CUTOFF): New.
    * lra-constraints.c: Include "params.h".
    (EBB_PROBABILITY_CUTOFF): Use
    LRA_INHERITANCE_EBB_PROBABILITY_CUTOFF.
    (lra_inheritance): Use '<' instead of '<=' for
    EBB_PROBABILITY_CUTOFF.
    * doc/invoke.texi (lra-inheritance-ebb-probability-cutoff):
    Document change.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/lra-constraints.c
    trunk/gcc/params.def
    trunk/gcc/params.h


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (15 preceding siblings ...)
  2015-02-27 23:07 ` vmakarov at gcc dot gnu.org
@ 2015-03-03 19:08 ` law at redhat dot com
  2015-03-03 22:39 ` law at redhat dot com
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-03 19:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #18 from Jeffrey A. Law <law at redhat dot com> ---
Sigh...

reload_cse_regs doesn't help us because of the CODE_LABEL at the start of block
C in the mini-cfg referenced in c#17.  In essence it's using the same extended
block definition as LRA.

postreload-gcse doesn't help us because it computes local properties, but never
propagates them through the CFG.  So a memory load that occurs in block A and
is transparent through B is not seen as available at the end of B.  Thus the
load is only considered partially redundant at the start of C.

SIGH.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (16 preceding siblings ...)
  2015-03-03 19:08 ` law at redhat dot com
@ 2015-03-03 22:39 ` law at redhat dot com
  2015-03-04 17:39 ` law at redhat dot com
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-03 22:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #19 from Jeffrey A. Law <law at redhat dot com> ---
Making reload_cse_regs less pessimistic at CODE_LABELs really isn't feasible. 
cselib isn't easily turned into a scoped hash table and the multiple assignment
nature of RTL means that a simple scoped hash table isn't sufficient anyway.

I'm currently poking at postreload-gcse.  The basic idea being to detect when
an occurrence is transparent through a block and an occurrence exists in the
block's one and only predecessor.  In that case pretend the occurrence also
exists at the end of the current block.  I don't expect this to occur often,
thus I'm not computing transparency for each expression in all blocks and doing
a traditional dataflow propagation.

With that special case in place, I get just 5 reloads of the PIC register,
which appears minimal given the other register allocation decisions.  I need to
ponder a couple implementation details, but this may be the final resolution to
this BZ :-)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (17 preceding siblings ...)
  2015-03-03 22:39 ` law at redhat dot com
@ 2015-03-04 17:39 ` law at redhat dot com
  2015-03-05 15:37 ` vmakarov at gcc dot gnu.org
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-04 17:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #20 from Jeffrey A. Law <law at redhat dot com> ---
Still poking, but downgrading to P2.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (18 preceding siblings ...)
  2015-03-04 17:39 ` law at redhat dot com
@ 2015-03-05 15:37 ` vmakarov at gcc dot gnu.org
  2015-03-05 20:01 ` law at redhat dot com
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-03-05 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #21 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #17)
> Thanks Vlad, that patch helped.  Prior to your patch we had 15 reloads of
> the PIC register from memory, after your patch we have just 9.   However,
> several of the remaining 9 seem to be redundant.
> 
> LRA never considers a block starting with a label as participating in an
> EBB.  That's overly conservative.  A block can participate in an EBB if all
> of its predecessors are part of the same EBB.
> 
> That's particularly useful in CFGs like
>  
>     A
>    /|
>   / |
>  B  |
>   \ |
>    \|
>     C
> 
> [ Flow downward of course. ]
> 
> 
> If we assume that B is the fallthru path, then LRA will try to make AB into
> an EBB.  But it will not consider C because C will start with a label.  That
> ultimately causes missed inheritances in this example.

I am agree the bigger scope would help but it is hard to implement.  If we
consider the graph hammock structure, inherited value of pseudo P in A to C
might be changed in B.  It works if we don't change it in B but then the
structure can be dependent on considered pseudos for inheritance.  It is harder
to update live info too and to undo inheritance transformations if they fail.

The biggest problem with PIC pseudo is in that it can not treated as the rest
of pseudos with live range splitting perspective because of possible creation
of memory with pic pseudo value.  It is hard to decide what pseudo (or hard
register) to use for such memory in a particular place as it can be in
different pseudos (hard registers) if we permit usual pseudo live range
splitting for pic pseudo.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (19 preceding siblings ...)
  2015-03-05 15:37 ` vmakarov at gcc dot gnu.org
@ 2015-03-05 20:01 ` law at redhat dot com
  2015-03-09 17:12 ` law at redhat dot com
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-05 20:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #22 from Jeffrey A. Law <law at redhat dot com> ---
Yes, absolutely, we have to compensate for the case where B clobbers something
useful, undoing transformations, etc.  It may not be reasonably feasible within
the current structure of LRA.

I'm getting pretty close on getting postreload-gcse.c to handle this.  Most of
the complexity has been in untangling the bits from gcse.c that I want to
re-use.

There's obviously cleanup still to do and some instrumentation/analysis of the
generated code on a wider codebase, but it looks promising.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (20 preceding siblings ...)
  2015-03-05 20:01 ` law at redhat dot com
@ 2015-03-09 17:12 ` law at redhat dot com
  2015-03-23  7:53 ` law at gcc dot gnu.org
  2015-03-23  7:54 ` law at redhat dot com
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-09 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #23 from Jeffrey A. Law <law at redhat dot com> ---
So with a functional prototype in place, I can get good code for this test.

However, deeper testing of that prototype is proving difficult simply because
the postreload-gcse.c code very rarely does anything useful, at least on x86 or
x86-64.

I put in some dumping in gcse_after_reload_main to always dump its statistics
if its expression table has > 0 elements.  Believe it or not, this *never*
triggers during a bootstrap on x86_64.  I was so amazed by this I fired up a VM
and bootstraped i686 as well, nope, never fires there either.  Concerned that
maybe my other changes were somehow getting in the way, I put in just the
dumping code, still doesn't fire.  Yet it fires semi regularly in the
testsuite.

The original author of the code in postreload-gcse.c was doing their
development on PPC boxes, so I'm going to grab one and see if I can get some
useful testing from on it.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (21 preceding siblings ...)
  2015-03-09 17:12 ` law at redhat dot com
@ 2015-03-23  7:53 ` law at gcc dot gnu.org
  2015-03-23  7:54 ` law at redhat dot com
  23 siblings, 0 replies; 25+ messages in thread
From: law at gcc dot gnu.org @ 2015-03-23  7:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #24 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Author: law
Date: Mon Mar 23 05:21:04 2015
New Revision: 221585

URL: https://gcc.gnu.org/viewcvs?rev=221585&root=gcc&view=rev
Log:
       PR rtl-optimization/64317
        * Makefile.in (OBJS): Add gcse-common.c
        * gcse.c: Include gcse-common.h
        (struct modify_pair_s): Move structure definition to gcse-common.h
        (compute_transp): Move function to gcse-common.c.
        (canon_list_insert): Similarly.
        (record_last_mem_set_info): Break out some code and put it into
        gcse-common.c.  Call into the new common code.
        (compute_local_properties): Pass additional arguments to
compute_transp.
        * postreload-gcse.c: Include gcse-common.h and df.h
        (modify_mem_list_set, blocks_with_calls): New variables.
        (modify_mem_list, canon_modify_mem_list, transp): Likewise.
        (get_bb_avail_insn): Pass in the expression index too.
        (alloc_mem): Allocate memory for the new bitmaps and lists.
        (free_mem): Free memory for the new bitmaps and lists.
        (insert_expr_in_table): Record a bitmap index for each entry we
        add to the table.
        (record_last_mem_set_info): Call into common code in gcse-common.c.
        (get_bb_avail_insn): If no available insn was found in the requested
        BB.  If BB has a single predecessor, see if the expression is
        transparent in BB and available in that single predecessor.
        (compute_expr_transp): New wrapper for compute_transp.
        (eliminate_partially_redundant_load): Pass expression's bitmap_index
        to get_bb_avail_insn.  Compute next_pred_bb_end a bit later.
        (gcse_after_reload_main): If there are elements in the hash table,
        then compute transparency for all the elements in the hash table.
        * gcse-common.h: New file.
        * gcse-common.c: New file.

Added:
    trunk/gcc/gcse-common.c
    trunk/gcc/gcse-common.h
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/gcse.c
    trunk/gcc/postreload-gcse.c


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
  2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
                   ` (22 preceding siblings ...)
  2015-03-23  7:53 ` law at gcc dot gnu.org
@ 2015-03-23  7:54 ` law at redhat dot com
  23 siblings, 0 replies; 25+ messages in thread
From: law at redhat dot com @ 2015-03-23  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #25 from Jeffrey A. Law <law at redhat dot com> ---
Unwanted loads have been eliminated by the postreload-gcse improvements.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-03-23  5:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-15 15:03 [Bug rtl-optimization/64317] New: [5 Regression] Ineffective allocation of PIC base register izamyatin at gmail dot com
2014-12-15 15:07 ` [Bug rtl-optimization/64317] " rguenth at gcc dot gnu.org
2015-01-13 10:26 ` rguenth at gcc dot gnu.org
2015-01-23 20:17 ` vmakarov at gcc dot gnu.org
2015-02-05 12:22 ` enkovich.gnu at gmail dot com
2015-02-06  9:52 ` jakub at gcc dot gnu.org
2015-02-06 10:06 ` enkovich.gnu at gmail dot com
2015-02-10 22:17 ` vmakarov at gcc dot gnu.org
2015-02-12 21:04 ` law at redhat dot com
2015-02-13  3:35 ` vmakarov at gcc dot gnu.org
2015-02-13  3:37 ` vmakarov at gcc dot gnu.org
2015-02-13  8:06 ` enkovich.gnu at gmail dot com
2015-02-19 13:18 ` law at redhat dot com
2015-02-19 16:21 ` vmakarov at gcc dot gnu.org
2015-02-20 23:24 ` law at redhat dot com
2015-02-21 20:10 ` vmakarov at gcc dot gnu.org
2015-02-27 23:07 ` vmakarov at gcc dot gnu.org
2015-03-03 19:08 ` law at redhat dot com
2015-03-03 22:39 ` law at redhat dot com
2015-03-04 17:39 ` law at redhat dot com
2015-03-05 15:37 ` vmakarov at gcc dot gnu.org
2015-03-05 20:01 ` law at redhat dot com
2015-03-09 17:12 ` law at redhat dot com
2015-03-23  7:53 ` law at gcc dot gnu.org
2015-03-23  7:54 ` law at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).