* [Bug rtl-optimization/31704]  New: x86_64 poor floating point register allocation across function call
@ 2007-04-25 15:09 ian at airs dot com
  2007-04-26  9:23 ` [Bug rtl-optimization/31704] " rguenth at gcc dot gnu dot org
  2008-02-07 16:37 ` hubicka at gcc dot gnu dot org
  0 siblings, 2 replies; 3+ messages in thread
From: ian at airs dot com @ 2007-04-25 15:09 UTC (permalink / raw)
  To: gcc-bugs

When I compile this test case with -O2 for x86_64:

extern void g (void);
float
f (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * mult;
    }
  return sum;
}

I get this result:

f:
.LFB2:
        pushq   %rbp
.LCFI0:
        movaps  %xmm0, %xmm2
        xorl    %ebp, %ebp
        pushq   %rbx
.LCFI1:
        movq    %rdi, %rbx
        subq    $40, %rsp
.LCFI2:
        movss   %xmm1, 28(%rsp)
.L2:
        movss   %xmm2, (%rsp)
        call    g
        cvtsi2ss        (%rbx), %xmm0
        leaq    4(%rbx), %rax
        movl    $1, %edx
        movss   (%rsp), %xmm2
        mulss   28(%rsp), %xmm0
        addss   %xmm0, %xmm2
        .p2align 4,,7
.L3:
        cvtsi2ss        (%rax), %xmm1
        addl    $1, %edx
        addq    $4, %rax
        cmpl    $1000, %edx
        mulss   28(%rsp), %xmm1
        addss   %xmm1, %xmm2
        jne     .L3
        addl    $1, %ebp
        addq    $4000, %rbx
        cmpl    $10, %ebp
        jne     .L2
        addq    $40, %rsp
        movaps  %xmm2, %xmm0
        popq    %rbx
        popq    %rbp
        ret

In the original code, the inner loop is performance critical.  Note that the
inner loop (.L3) executes a mulss that loads mult from the stack slot 28(%rsp).
It would be more efficient to keep mult in a register during the inner loop.
In fact the value arrived in a register (%xmm1), but we spilled it to the stack
because it is live across the function call, and we now reload it from the
stack once per inner loop iteration rather than once per outer loop iteration.

I don't see a simple approach to fixing this.  Some sort of live range
splitting might work.
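
As a rough source-level illustration of what live range splitting would mean
here, the sketch below copies mult into a fresh local after the call, so the
inner loop uses a value whose live range does not cross g ().  The names
f_split, m and calls are made up for this example, and g () is given a
stand-in body only so that the code links on its own.  This is not a proposed
fix: the compiler is free to propagate the copy away, so the generated code
may well be identical to the original.

```c
/* Hypothetical stand-in for the external g () in the test case; it only
   counts calls so that this example is self-contained.  */
static int calls;
void g (void) { ++calls; }

/* Original shape: mult is live across the call to g ().  */
float
f (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * mult;
    }
  return sum;
}

/* Hand-split shape (assumption: illustration only).  m is assigned after
   the call, so the value used in the inner loop starts a new live range
   that does not cross g ().  Ideally it is loaded from the stack once per
   outer iteration and then stays in a register.  */
float
f_split (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      float m;
      g ();
      m = mult;
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * m;
    }
  return sum;
}
```

Both functions compute the same result.  Whether the second one actually
avoids the per-iteration load depends on the compiler version and options, so
the only way to tell is to compare the -O2 -S output of the two.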


-- 
           Summary: x86_64 poor floating point register allocation across
                    function call
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ian at airs dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31704



* [Bug rtl-optimization/31704] x86_64 poor floating point register allocation across function call
  2007-04-25 15:09 [Bug rtl-optimization/31704] New: x86_64 poor floating point register allocation across function call ian at airs dot com
@ 2007-04-26  9:23 ` rguenth at gcc dot gnu dot org
  2008-02-07 16:37 ` hubicka at gcc dot gnu dot org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2007-04-26  9:23 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization, ra





* [Bug rtl-optimization/31704] x86_64 poor floating point register allocation across function call
  2007-04-25 15:09 [Bug rtl-optimization/31704] New: x86_64 poor floating point register allocation across function call ian at airs dot com
  2007-04-26  9:23 ` [Bug rtl-optimization/31704] " rguenth at gcc dot gnu dot org
@ 2008-02-07 16:37 ` hubicka at gcc dot gnu dot org
  1 sibling, 0 replies; 3+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2008-02-07 16:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from hubicka at gcc dot gnu dot org  2008-02-07 16:36 -------
This is fixed by the call frequency patch on mainline.  The inner loop now
multiplies by a register (%xmm2) instead of reloading from the stack:
.L2:
        cvtsi2ss        (%ebx,%eax,4), %xmm0
        addl    $1, %eax
        cmpl    $1000, %eax
        mulss   %xmm2, %xmm0
        addss   %xmm0, %xmm1
        jne     .L2

(This is on i386, but x86-64 behaves the same way.)

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED



