[Bug rtl-optimization/47010] New: Missed optimization: x86-64 prologue not deleted

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/47010] New: Missed optimization: x86-64 prologue not deleted
@ 2010-12-19  2:43 schnetter at gmail dot com
  2010-12-28 14:51 ` [Bug rtl-optimization/47010] " rguenth at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: schnetter at gmail dot com @ 2010-12-19  2:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47010

           Summary: Missed optimization: x86-64 prologue not deleted
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: schnetter@gmail.com


Created attachment 22818
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22818
pre-processed bzipped source code

The following code is generated by g++ 4.5.1 on x86-64 (Mac OS X 10.6). It is
a static function, so g++ was free to change its calling convention, and the
".clone.1" suffix suggests that it did specialize the argument list. I believe
the three instructions "pushq", "movq", and "leave" are unnecessary, since the
function never uses %rbp or the stack. This routine is called in a
compute-intensive inner loop that has trouble fitting into the level 1
instruction cache.

The disassembled routine is:

__ZL20PDstandardNth11_implPKdll.clone.1:
0000000000000140        pushq   %rbp
0000000000000141        movupd  0x10(%rdi),%xmm3
0000000000000146        movupd  0xf0(%rdi),%xmm0
000000000000014b        movupd  0x08(%rdi),%xmm2
0000000000000150        addpd   %xmm3,%xmm0
0000000000000154        movupd  0xf8(%rdi),%xmm1
0000000000000159        movq    %rsp,%rbp
000000000000015c        addpd   %xmm2,%xmm1
0000000000000160        mulpd   0x000a0578(%rip),%xmm1
0000000000000168        addpd   %xmm0,%xmm1
000000000000016c        movupd  (%rdi),%xmm0
0000000000000170        mulpd   0x000a0578(%rip),%xmm0
0000000000000178        leave
0000000000000179        addpd   %xmm1,%xmm0
000000000000017d        ret

The original function is defined as:

static CCTK_REAL_VEC PDstandardNth11_impl(CCTK_REAL const* restrict const u,
ptrdiff_t const dj, ptrdiff_t const dk) __attribute__((pure))
__attribute__((noinline)) __attribute__((unused));

static CCTK_REAL_VEC PDstandardNth11_impl(CCTK_REAL const* restrict const u,
ptrdiff_t const dj, ptrdiff_t const dk)
{
  return kmadd(ToReal(30),
               vec_loadu_maybe3(0,0,0,(u)[(0)+dj*(0)+dk*(0)]),
               kmadd(ToReal(-16),
                     kadd(vec_loadu_maybe3(-1,0,0,(u)[(-1)+dj*(0)+dk*(0)]),
                          vec_loadu_maybe3(1,0,0,(u)[(1)+dj*(0)+dk*(0)])),
                     kadd(vec_loadu_maybe3(-2,0,0,(u)[(-2)+dj*(0)+dk*(0)]),
                          vec_loadu_maybe3(2,0,0,(u)[(2)+dj*(0)+dk*(0)]))));
}

where CCTK_REAL is double, and CCTK_REAL_VEC is __m128d, the SSE2 vector of
doubles. The function body contains macros that translate directly to Intel
SSE2 vector instructions.
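
For readers without the attachment, the following is a minimal sketch of what
such macros might look like. The report does not show the real definitions, so
the bodies of ToReal, kadd, kmadd, and vec_loadu_maybe3 below are assumptions
(kmadd(a,b,c) is taken to mean a*b+c, and the three leading arguments of
vec_loadu_maybe3 are assumed to be alignment hints that an unaligned load can
ignore). The names PDstandardNth11_sketch and stencil_self_check are
hypothetical, and __restrict__ stands in for the "restrict" macro of the
original.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_loadu_pd, ...
#include <cstddef>      // std::ptrdiff_t

typedef double  CCTK_REAL;      // stated in the report
typedef __m128d CCTK_REAL_VEC;  // stated in the report

// ASSUMED definitions; the real ones are only in the attached source.
static inline CCTK_REAL_VEC ToReal(CCTK_REAL a) {
  return _mm_set1_pd(a);  // broadcast one scalar to both lanes
}
static inline CCTK_REAL_VEC kadd(CCTK_REAL_VEC a, CCTK_REAL_VEC b) {
  return _mm_add_pd(a, b);
}
static inline CCTK_REAL_VEC kmadd(CCTK_REAL_VEC a, CCTK_REAL_VEC b,
                                  CCTK_REAL_VEC c) {
  return _mm_add_pd(_mm_mul_pd(a, b), c);  // assumed meaning: a*b + c
}
// Assumed: the three offsets are alignment hints; always load unaligned here.
#define vec_loadu_maybe3(off1, off2, off3, x) _mm_loadu_pd(&(x))

// Same expression as in the report, under the assumptions above.
static CCTK_REAL_VEC PDstandardNth11_sketch(
    CCTK_REAL const* __restrict__ const u,
    std::ptrdiff_t const dj, std::ptrdiff_t const dk)
{
  return kmadd(ToReal(30),
               vec_loadu_maybe3(0,0,0,(u)[(0)+dj*(0)+dk*(0)]),
               kmadd(ToReal(-16),
                     kadd(vec_loadu_maybe3(-1,0,0,(u)[(-1)+dj*(0)+dk*(0)]),
                          vec_loadu_maybe3(1,0,0,(u)[(1)+dj*(0)+dk*(0)])),
                     kadd(vec_loadu_maybe3(-2,0,0,(u)[(-2)+dj*(0)+dk*(0)]),
                          vec_loadu_maybe3(2,0,0,(u)[(2)+dj*(0)+dk*(0)]))));
}

// Hypothetical check on u[i] = i*i, evaluated at i = 2 and i = 3.
// 30*u[i] - 16*(u[i-1] + u[i+1]) + (u[i-2] + u[i+2]) is -24 for both lanes.
static bool stencil_self_check() {
  CCTK_REAL const u[6] = {0, 1, 4, 9, 16, 25};
  CCTK_REAL out[2];
  _mm_storeu_pd(out, PDstandardNth11_sketch(&u[2], 6, 36));
  return out[0] == -24.0 && out[1] == -24.0;
}
```

Under these assumptions the function needs only a handful of xmm registers and
no stack storage, which matches the disassembly above apart from the frame
setup.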

The code was compiled with g++ 4.5.1 using the options

g++-mp-4.5 -g3 -m128bit-long-double -march=native -std=gnu++0x -O3
-funsafe-loop-optimizations -fsee -ftree-loop-linear -ftree-loop-im -fivopts
-fvect-cost-model -funroll-loops -funroll-all-loops
-fvariable-expansion-in-unroller -fprefetch-loop-arrays -ffast-math
-fassociative-math -freciprocal-math -fno-trapping-math -fexcess-precision=fast
-fopenmp -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align
-Woverloaded-virtual 

I attach the complete pre-processed and bzipped source code. The source code
itself is auto-generated.



* [Bug rtl-optimization/47010] Missed optimization: x86-64 prologue not deleted
  2010-12-19  2:43 [Bug rtl-optimization/47010] New: Missed optimization: x86-64 prologue not deleted schnetter at gmail dot com
@ 2010-12-28 14:51 ` rguenth at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-12-28 14:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47010

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-28 14:50:51 UTC ---
I think it sets up a frame to have possible spills of xmm registers land in
aligned stack slots.

