[Bug rtl-optimization/41171] New: register allocator undoing optimal schedule

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/41171]  New: register allocator undoing optimal schedule
@ 2009-08-25 21:48 TabonyEE at austin dot rr dot com
  2009-08-25 23:06 ` [Bug rtl-optimization/41171] " nemet at gcc dot gnu dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: TabonyEE at austin dot rr dot com @ 2009-08-25 21:48 UTC (permalink / raw)
  To: gcc-bugs

When I compile the following code

void f(int *x, int *y){
  *x = 7;
  *y = 4;
}

at -O2 for Itanium, I get the following assembly:

f:
        .prologue
        .body
        .mmi
        addl r14 = 7, r0
        ;;
        st4 [r32] = r14
        addl r14 = 4, r0
        ;;
        .mib
        st4 [r33] = r14
        nop 0
        br.ret.sptk.many b0
        .endp f#

The expected output is

f:
        .prologue
        .body
        .mii
        addl r14 = 7, r0
        addl r15 = 4, r0
        ;;
        nop 0
        .mmb
        st4 [r32] = r14
        st4 [r33] = r15
        br.ret.sptk.many b0
        .endp f#

In the .sched1 dump, I see the expected schedule:

;;        0-->     7 r341=0x7                          :2_A
;;        0-->     9 r342=0x4                          :2_A
;;        1-->     8 [in0]=r341                        :2_M_only_um23
;;        1-->    10 [in1]=r342                        :2_M_only_um23
;;   total time = 1

but in the .ira dump, the RTL has reverted to the serial order.  Because of
the anti-dependency that register allocation introduces by reusing r14, the
.mach dump shows an inferior schedule:

;;        0-->     7 r14=0x7                           :2_A
;;        1-->     8 [r32]=r14                         :2_M_only_um23
;;        1-->    21 r14=0x4                           :2_A
;;        2-->    10 [r33]=r14                         :2_M_only_um23
;;        2-->    25 {return;use b0;}                  :2_B
;;   total time = 2
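
To make the effect of the register reuse concrete, here is a minimal sketch in C of the dependence constraints alone. It is not GCC's scheduler: the struct, the function names and the latencies (one cycle for a true dependence, zero for an anti-dependence, one for an output dependence) are assumptions chosen to match the two dumps above, and issue-width and functional-unit limits are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn: writes register `def` and reads register `use` (-1 = none).
   Memory operands and functional units are ignored in this sketch.  */
struct toy_insn {
    int def;
    int use;
};

/* Assumed latencies, picked so the result matches the dumps in the report.  */
enum { LAT_TRUE = 1, LAT_ANTI = 0, LAT_OUTPUT = 1 };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Store in cycle[i] the earliest cycle insn i can issue in, given only its
   register dependences on earlier insns.  Return the last issue cycle,
   which is the number the dumps report as "total time".  */
int last_issue_cycle(const struct toy_insn *insns, size_t n, int *cycle)
{
    int last = 0;

    assert(insns != NULL && cycle != NULL);
    for (size_t i = 0; i < n; i++) {
        int c = 0;
        for (size_t j = 0; j < i; j++) {
            /* True dependence: i reads what j wrote.  */
            if (insns[i].use >= 0 && insns[i].use == insns[j].def)
                c = max_int(c, cycle[j] + LAT_TRUE);
            /* Anti-dependence: i overwrites what j still reads.  */
            if (insns[i].def >= 0 && insns[i].def == insns[j].use)
                c = max_int(c, cycle[j] + LAT_ANTI);
            /* Output dependence: i and j write the same register.  */
            if (insns[i].def >= 0 && insns[i].def == insns[j].def)
                c = max_int(c, cycle[j] + LAT_OUTPUT);
        }
        cycle[i] = c;
        last = max_int(last, c);
    }
    return last;
}

/* add r14; st r14; add r15; st r15 (two registers, as in the expected output).  */
int total_time_distinct_regs(void)
{
    const struct toy_insn insns[] = { {14, -1}, {-1, 14}, {15, -1}, {-1, 15} };
    int cycle[4];
    return last_issue_cycle(insns, 4, cycle);
}

/* add r14; st r14; add r14; st r14 (r14 reused, as in the actual output).  */
int total_time_reused_reg(void)
{
    const struct toy_insn insns[] = { {14, -1}, {-1, 14}, {14, -1}, {-1, 14} };
    int cycle[4];
    return last_issue_cycle(insns, 4, cycle);
}
```

In this model total_time_distinct_regs() returns 1 and total_time_reused_reg() returns 2, the same totals as the .sched1 and .mach dumps: once both constants live in r14, the second addl cannot issue before the first store has read r14, so the second store slips by a cycle.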

In GCC 4.3.2 and 3.4.6 for Itanium, I see the lreg pass likewise creating an
inferior schedule.  However, for PowerPC, MIPS, ARM, and FR-V, GCC 4.3.2 leaves
the initial schedule intact, whereas GCC 4.4.0 changes the order of insns in
the IRA pass for all targets.

For targets other than Itanium, I'm not sure this transformation in IRA
degrades performance, and it reduces register pressure, so it seems like a
positive change.  For Itanium, this degrades performance.  What's odd is that
the Itanium port had this behavior prior to GCC 4.4.0, while other ports did
not.  Is there some set of machine-specific parameters that the Itanium port
could tune to prevent this transformation in IRA (hopefully without degrading
performance elsewhere)?

-- 
           Summary: register allocator undoing optimal schedule
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: TabonyEE at austin dot rr dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: ia64-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171


* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
@ 2009-08-25 23:06 ` nemet at gcc dot gnu dot org
  2009-08-26 13:44 ` bergner at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: nemet at gcc dot gnu dot org @ 2009-08-25 23:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from nemet at gcc dot gnu dot org  2009-08-25 23:05 -------
It's also a regression if the load-immediates can dual-issue.  For example, see
below how the schedule gets worse between sched1 and sched2 on Octeon.

I am cc'ing Vlad in case he has some ideas.



;;   ======================================================
;;   -- basic block 2 from 7 to 10 -- before reload
;;   ======================================================

;;        0-->     7 r195=0x7                          :octeon_pipe0|octeon_pipe1
;;        0-->     9 r196=0x4                          :octeon_pipe0|octeon_pipe1
;;        1-->     8 [$4]=r195                         :octeon_pipe0
;;        2-->    10 [$5]=r196                         :octeon_pipe0
;;      Ready list (final):  
;;   total time = 2
;;   new head = 7
;;   new tail = 10


;; Procedure interblock/speculative motions == 0/0 


;;   ======================================================
;;   -- basic block 2 from 7 to 18 -- after reload
;;   ======================================================

;;        0-->     7 $2=0x7                            :octeon_pipe0|octeon_pipe1
;;        1-->     8 [$4]=$2                           :octeon_pipe0
;;        1-->    14 $2=0x4                            :octeon_pipe0|octeon_pipe1
;;        2-->    10 [$5]=$2                           :octeon_pipe0
;;        3-->    18 return                            :octeon_pipe0
;;      Ready list (final):  
;;   total time = 3
;;   new head = 7
;;   new tail = 18


-- 

nemet at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu dot
                   |                            |org, nemet at gcc dot gnu
                   |                            |dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-08-25 23:05:53
               date|                            |



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
  2009-08-25 23:06 ` [Bug rtl-optimization/41171] " nemet at gcc dot gnu dot org
@ 2009-08-26 13:44 ` bergner at gcc dot gnu dot org
  2009-08-26 15:14 ` bergner at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bergner at gcc dot gnu dot org @ 2009-08-26 13:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from bergner at gcc dot gnu dot org  2009-08-26 13:44 -------
The problem here is that, for some reason, IRA is spilling the two pseudos in
the test case, even though it seems allocation should be trivial.  Looking deeper.



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
  2009-08-25 23:06 ` [Bug rtl-optimization/41171] " nemet at gcc dot gnu dot org
  2009-08-26 13:44 ` bergner at gcc dot gnu dot org
@ 2009-08-26 15:14 ` bergner at gcc dot gnu dot org
  2009-08-26 15:22 ` bergner at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bergner at gcc dot gnu dot org @ 2009-08-26 15:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from bergner at gcc dot gnu dot org  2009-08-26 15:14 -------
Actually, they're already reordered by the time we call ira_color, and the IRA
dump shows that:

;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
scanning new insn with uid = 14.
verify found no changes in insn with uid = 14.
deleting insn with uid = 9.
Building IRA IR
...

Giving us:

(insn 7 4 8 2 pr41171.c:4 (set (reg:SI 121)
        (const_int 7 [0x7])) 344 {*movsi_internal1} (expr_list:REG_EQUIV (const_int 7 [0x7])
        (nil)))

(insn 8 7 14 2 pr41171.c:4 (set (mem:SI (reg:SI 3 3 [ x ]) [2 S4 A32])
        (reg:SI 121)) 344 {*movsi_internal1} (expr_list:REG_DEAD (reg:SI 121)
        (expr_list:REG_DEAD (reg:SI 3 3 [ x ])
            (expr_list:REG_EQUAL (const_int 7 [0x7])
                (nil)))))

(insn 14 8 10 2 pr41171.c:5 (set (reg:SI 122)
        (const_int 4 [0x4])) 344 {*movsi_internal1} (expr_list:REG_EQUIV (const_int 4 [0x4])
        (nil)))

(insn 10 14 13 2 pr41171.c:5 (set (mem:SI (reg:SI 4 4 [ y ]) [2 S4 A32])
        (reg:SI 122)) 344 {*movsi_internal1} (expr_list:REG_DEAD (reg:SI 122)
        (expr_list:REG_DEAD (reg:SI 4 4 [ y ])
            (expr_list:REG_EQUAL (const_int 4 [0x4])
                (nil)))))



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
                   ` (2 preceding siblings ...)
  2009-08-26 15:14 ` bergner at gcc dot gnu dot org
@ 2009-08-26 15:22 ` bergner at gcc dot gnu dot org
  2009-08-26 20:57 ` bergner at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bergner at gcc dot gnu dot org @ 2009-08-26 15:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from bergner at gcc dot gnu dot org  2009-08-26 15:22 -------
It's update_equiv_regs() that is causing this.



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
                   ` (3 preceding siblings ...)
  2009-08-26 15:22 ` bergner at gcc dot gnu dot org
@ 2009-08-26 20:57 ` bergner at gcc dot gnu dot org
  2009-09-02 21:48 ` bergner at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bergner at gcc dot gnu dot org @ 2009-08-26 20:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from bergner at gcc dot gnu dot org  2009-08-26 20:57 -------
From my bug analysis and request for comment on the mailing list:

  http://gcc.gnu.org/ml/gcc/2009-08/msg00485.html


This is caused by update_equiv_regs(), which IRA inherited from local-alloc.c.
Although you don't see the problem with GCC 4.3 and earlier, it is still
there: if you look at the 4.3 dumps, you will see that update_equiv_regs()
has already undone the sched1 ordering.  What saves us is that sched2
reschedules the insns into the order we want.  With 4.4, IRA happens to reuse
the same register for both pseudos, so sched2's hands are tied and it cannot
schedule them back again.

Looking at update_equiv_regs(), if I disable the replacement for regs
that are local to one basic block (patch below) like it existed before
John Wehle's patch way back in Oct 2000:

  http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html

then we get the ordering we want.  Does anyone know why John removed
that part of the test in his patch?  Thoughts anyone?


Peter


Index: ira.c
===================================================================
--- ira.c       (revision 151111)
+++ ira.c       (working copy)
@@ -2510,6 +2510,7 @@ update_equiv_regs (void)
                     calls.  */

                  if (REG_N_REFS (regno) == 2
+                     && REG_BASIC_BLOCK (regno) < NUM_FIXED_BLOCKS
                      && (rtx_equal_p (x, src)
                          || ! equiv_init_varies_p (src))
                      && NONJUMP_INSN_P (insn)



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
                   ` (4 preceding siblings ...)
  2009-08-26 20:57 ` bergner at gcc dot gnu dot org
@ 2009-09-02 21:48 ` bergner at gcc dot gnu dot org
  2009-10-30 21:01 ` sje at cup dot hp dot com
  2009-10-30 21:57 ` vmakarov at redhat dot com
  7 siblings, 0 replies; 10+ messages in thread
From: bergner at gcc dot gnu dot org @ 2009-09-02 21:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from bergner at gcc dot gnu dot org  2009-09-02 21:47 -------
My patch solved the problem, but was performance-neutral wrt SPEC2000 scores.
Vlad's idea of moving update_equiv_regs() into its own pass before sched1 makes
sense to me and seems to produce better-performing code too, so his patch wins. :)
Thanks, Vlad.



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
                   ` (5 preceding siblings ...)
  2009-09-02 21:48 ` bergner at gcc dot gnu dot org
@ 2009-10-30 21:01 ` sje at cup dot hp dot com
  2009-10-30 21:57 ` vmakarov at redhat dot com
  7 siblings, 0 replies; 10+ messages in thread
From: sje at cup dot hp dot com @ 2009-10-30 21:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from sje at cup dot hp dot com  2009-10-30 21:00 -------
Has a patch to move update_equiv_regs into its own pass been submitted?
I don't see one and IA64 is still producing the 'wrong' scheduling.


-- 

sje at cup dot hp dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sje at cup dot hp dot com



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
  2009-08-25 21:48 [Bug rtl-optimization/41171] New: register allocator undoing optimal schedule TabonyEE at austin dot rr dot com
                   ` (6 preceding siblings ...)
  2009-10-30 21:01 ` sje at cup dot hp dot com
@ 2009-10-30 21:57 ` vmakarov at redhat dot com
  7 siblings, 0 replies; 10+ messages in thread
From: vmakarov at redhat dot com @ 2009-10-30 21:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from vmakarov at redhat dot com  2009-10-30 21:57 -------
Unfortunately, not yet, because I had some failures after applying the patch.  I
postponed work on this, but now I have time to continue.



* [Bug rtl-optimization/41171] register allocator undoing optimal schedule
       [not found] <bug-41171-4@http.gcc.gnu.org/bugzilla/>
@ 2014-01-01 20:55 ` steven at gcc dot gnu.org
  0 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu.org @ 2014-01-01 20:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #9 from Steven Bosscher <steven at gcc dot gnu.org> ---
(In reply to Peter Bergner from comment #5)
> Looking at update_equiv_regs(), if I disable the replacement for regs
> that are local to one basic block (patch below) like it existed before
> John Wehle's patch way back in Oct 2000:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html
> 
> then we get the ordering we want.  Does anyone know why John removed
> that part of the test in his patch?  Thoughts anyone?

To allow things to be moved around in, or out of loops.


