public inbox for gcc-bugs@sourceware.org

* [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
@ 2012-07-14 20:52 bfriesen at simple dot dallas.tx.us
  2012-07-14 20:56 ` [Bug c/53967] " bfriesen at simple dot dallas.tx.us
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 20:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

             Bug #: 53967
           Summary: GCC produces slow code for convolution algorithm with
                    -mfpmath=sse (the AMD_64 default)
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: bfriesen@simple.dallas.tx.us

Created attachment 27792
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27792
Convolution example C file, pre-processed version, build log, assembler output

The classic convolution algorithm (as implemented in GraphicsMagick) is
observed to run 2X slower with -mfpmath=sse than with -mfpmath=387.
Unfortunately, -mfpmath=sse is the default for -m64 builds on AMD_64, so this
has a large impact on users.

Even with -mfpmath=387, other compilers (LLVM, Open64, and Oracle Studio)
produce faster code by default.  All of them are at least 2X faster than the
GCC default for x86-64, and some deliver up to 3X better overall run-time
performance.

This issue has been verified under Solaris 10, OpenIndiana, and Ubuntu Linux on
Opteron and several modern Xeon CPUs.

Please note that AMD Opteron 6200 family CPUs were not observed to suffer from
this issue.
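
For readers without the attachment, the kind of inner loop under discussion can
be sketched roughly as below.  This is a minimal illustration only, not the
attached sample and not GraphicsMagick's ConvolveImage(); the names
convolve_row and clamp_to_byte and the 8-bit sample type are assumptions.  The
relevant property is that every integer sample is converted to floating point
before the multiply-accumulate, which with -mfpmath=sse typically becomes a
cvtsi2ss followed by scalar SSE arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: round a float and clamp it to the 0..255 range. */
static unsigned char clamp_to_byte(float v)
{
    if (v < 0.0f)
        return 0;
    if (v > 255.0f)
        return 255;
    return (unsigned char)(v + 0.5f);
}

/*
 * Hypothetical 1-D convolution of one row of 8-bit samples.
 * src must hold dst_width + kernel_width - 1 samples.
 */
static void convolve_row(const unsigned char *src, const float *kernel,
                         size_t kernel_width, unsigned char *dst,
                         size_t dst_width)
{
    for (size_t x = 0; x < dst_width; x++) {
        float sum = 0.0f;
        for (size_t k = 0; k < kernel_width; k++) {
            /* integer sample -> float, then multiply-accumulate */
            sum += kernel[k] * (float)src[x + k];
        }
        dst[x] = clamp_to_byte(sum);
    }
}
```

Calling convolve_row() with a 3-tap kernel of {0.25, 0.5, 0.25} over five
samples produces three smoothed output samples.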

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
@ 2012-07-14 20:56 ` bfriesen at simple dot dallas.tx.us
  2012-07-14 20:57 ` bfriesen at simple dot dallas.tx.us
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 20:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #1 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:55:48 UTC ---
Created attachment 27793
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27793
Build log


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
  2012-07-14 20:56 ` [Bug c/53967] " bfriesen at simple dot dallas.tx.us
@ 2012-07-14 20:57 ` bfriesen at simple dot dallas.tx.us
  2012-07-14 20:58 ` bfriesen at simple dot dallas.tx.us
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 20:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #2 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:56:55 UTC ---
Created attachment 27794
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27794
Sample portable source file


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
  2012-07-14 20:56 ` [Bug c/53967] " bfriesen at simple dot dallas.tx.us
  2012-07-14 20:57 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-14 20:58 ` bfriesen at simple dot dallas.tx.us
  2012-07-14 20:59 ` bfriesen at simple dot dallas.tx.us
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 20:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #3 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:57:58 UTC ---
Created attachment 27795
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27795
Pre-processed source


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (2 preceding siblings ...)
  2012-07-14 20:58 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-14 20:59 ` bfriesen at simple dot dallas.tx.us
  2012-07-14 21:06 ` [Bug target/53967] " bfriesen at simple dot dallas.tx.us
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 20:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #4 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:58:59 UTC ---
Created attachment 27796
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27796
Generated assembler code


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (3 preceding siblings ...)
  2012-07-14 20:59 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-14 21:06 ` bfriesen at simple dot dallas.tx.us
  2012-07-14 21:42 ` bfriesen at simple dot dallas.tx.us
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 21:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #5 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:06:27 UTC ---
Please note that while I mentioned GCC 4.6.2, the same problem is also observed
with GCC 4.7.1.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (4 preceding siblings ...)
  2012-07-14 21:06 ` [Bug target/53967] " bfriesen at simple dot dallas.tx.us
@ 2012-07-14 21:42 ` bfriesen at simple dot dallas.tx.us
  2012-07-16 12:42 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-14 21:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #6 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:42:38 UTC ---
Created attachment 27797
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27797
Pre-processed GraphicsMagick source (effect.c).

In case the small sample (which only illustrates the core algorithm) does not
satisfy, I have attached a pre-processed version of the real GraphicsMagick
code with the performance issue.  Look for ConvolveImage().


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (5 preceding siblings ...)
  2012-07-14 21:42 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-16 12:42 ` rguenth at gcc dot gnu.org
  2012-07-16 14:17 ` bfriesen at simple dot dallas.tx.us
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-16 12:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2012-07-16
     Ever Confirmed|0                           |1

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-16 12:42:15 UTC ---
What options do you use besides -march=corei7-avx?  The build log does not
say.
Did you try -march=corei7 instead of -march=corei7-avx?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (6 preceding siblings ...)
  2012-07-16 12:42 ` rguenth at gcc dot gnu.org
@ 2012-07-16 14:17 ` bfriesen at simple dot dallas.tx.us
  2012-07-16 14:57 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-16 14:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #8 from bfriesen at simple dot dallas.tx.us 2012-07-16 14:16:46 UTC ---
I used -march=native in this case.  It is interesting that this enabled AVX
(this particular CPU does support it).

To be clear, the problem also occurs with

-m64 -mtune=generic -march=x86-64 -mfpmath=sse

vs

-m64 -mtune=generic -march=x86-64 -mfpmath=387

and is also observed on a 5-year old Opteron.

With GCC 4.7.1, and for a specific application benchmark case and with generic
architecture and tuning, -mfpmath=387 produces 0.133 iter/s and -mfpmath=sse
produces 0.047 iter/s.  A different (non-GCC) compiler on the same system
produces 0.155 iter/s.

In the course of testing, I have indeed tried -march=corei7 and it did not
provide an improvement.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (7 preceding siblings ...)
  2012-07-16 14:17 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-16 14:57 ` rguenth at gcc dot gnu.org
  2012-07-16 15:35 ` bfriesen at simple dot dallas.tx.us
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-16 14:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-16 14:56:59 UTC ---
(In reply to comment #8)
> I used -march=native in this case.  It is interesting that this enabled AVX
> (this particular CPU does support it).
> 
> To be clear, the problem also occurs with
> 
> -m64 -mtune=generic -march=x86-64 -mfpmath=sse
> 
> vs
> 
> -m64 -mtune=generic -march=x86-64 -mfpmath=387
> 
> and is also observed on a 5-year old Opteron.
> 
> With GCC 4.7.1, and for a specific application benchmark case and with generic
> architecture and tuning, -mfpmath=387 produces 0.133 iter/s and -mfpmath=sse
> produces 0.047 iter/s.  A different (non-GCC) compiler on the same system
> produces 0.155 iter/s.
> 
> In the course of testing, I have indeed tried -march=corei7 and it did not
> provide an improvement.

What kind of optimization options are you using?  -O3?  Or are you really
using -O0 (aka nothing)?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (8 preceding siblings ...)
  2012-07-16 14:57 ` rguenth at gcc dot gnu.org
@ 2012-07-16 15:35 ` bfriesen at simple dot dallas.tx.us
  2012-07-16 15:41 ` bfriesen at simple dot dallas.tx.us
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-16 15:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #10 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:35:03 UTC ---
This particular application test was done with these options (i.e. -O2):

-m64 -mtune=generic -march=x86-64 -mfpmath=387 -O2

I have also tried -O3, with no positive benefit.

The Autoconf default is -O2 so that is what I generally test/tune the software
with. It is pretty rare to see additional benefit from -O3, although with some
versions of GCC I have seen application crashes due to wrong code from the tree
vectorizer.

Bob


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (9 preceding siblings ...)
  2012-07-16 15:35 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-16 15:41 ` bfriesen at simple dot dallas.tx.us
  2012-07-17  9:20 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-16 15:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #11 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:41:08 UTC ---
I just verified that -O3 produces similar timings to -O2 for both -mfpmath=387
and -mfpmath=sse.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (10 preceding siblings ...)
  2012-07-16 15:41 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-17  9:20 ` rguenth at gcc dot gnu.org
  2012-07-18  9:45 ` evstupac at gmail dot com
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-17  9:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (11 preceding siblings ...)
  2012-07-17  9:20 ` rguenth at gcc dot gnu.org
@ 2012-07-18  9:45 ` evstupac at gmail dot com
  2012-07-18 10:50 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: evstupac at gmail dot com @ 2012-07-18  9:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

Stupachenko Evgeny <evstupac at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |evstupac at gmail dot com

--- Comment #12 from Stupachenko Evgeny <evstupac at gmail dot com> 2012-07-18 09:45:15 UTC ---
I tried it at "-O2" and got low performance with -mfpmath=sse. It looks like it
is caused by register dependency (%xmm0) between:

addss    %xmm0, %xmm1
cvtsi2ss    %eax, %xmm0

Renaming %xmm0 in cvtsi2ss to another free register in all such cases resolves
the issue. 

You can also try "-O2 -funroll-loops", which made the "sse" code even faster,
and "-O2 -fschedule-insns", which significantly reduced the performance loss
in the "sse" case.
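
The dependency arises because cvtsi2ss writes only the low 32 bits of its
destination register and preserves the remaining bits, so the conversion has to
wait for the previous value of %xmm0 even though its result does not logically
depend on it.  A minimal sketch of the idea in C with SSE intrinsics follows.
The name sum_as_float is hypothetical, and converting into a freshly zeroed
register is just one common way to avoid such a dependency, not necessarily
what GCC does or will do.  The compiler still chooses the registers, so the
generated assembly should be checked.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */

/* Hypothetical example: sum integer samples as floats. */
static float sum_as_float(const int *samples, size_t count)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < count; i++) {
        /*
         * _mm_cvtsi32_ss() corresponds to cvtsi2ss and copies the upper
         * lanes from its first argument.  Passing a zeroed vector rather
         * than the accumulator keeps the conversion independent of the
         * running sum.
         */
        __m128 converted = _mm_cvtsi32_ss(_mm_setzero_ps(), samples[i]);
        acc = _mm_add_ss(acc, converted);
    }
    return _mm_cvtss_f32(acc);
}
```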


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (12 preceding siblings ...)
  2012-07-18  9:45 ` evstupac at gmail dot com
@ 2012-07-18 10:50 ` rguenth at gcc dot gnu.org
  2012-07-18 14:28 ` bfriesen at simple dot dallas.tx.us
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-18 10:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-18 10:49:53 UTC ---
You can also try -frename-registers.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (13 preceding siblings ...)
  2012-07-18 10:50 ` rguenth at gcc dot gnu.org
@ 2012-07-18 14:28 ` bfriesen at simple dot dallas.tx.us
  2012-07-18 20:42 ` bfriesen at simple dot dallas.tx.us
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-18 14:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #14 from bfriesen at simple dot dallas.tx.us 2012-07-18 14:28:04 UTC ---
With

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -funroll-loops
-fschedule-insns

I see a whole-program performance jump from 0.047 iter/s to 0.156 iter/s
(about 3.3X).  That is huge!  Given the fundamental properties of this
algorithm (the image processing algorithm most often recommended to be moved
to a GPU), the world would be a better place if this performance were the
normal case.

With

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -fschedule-insns

I see 0.101 iter/s

These must not be included in -O3 since

-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O3

produces only 0.048 iter/s
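
To put the reported figures on one scale, each can be expressed relative to the
-O2 -mfpmath=sse baseline of 0.047 iter/s, as in the small sketch below.  The
function names are hypothetical; the throughput numbers are the ones quoted in
comments #8 and #14.

```c
/* Ratio of a measured throughput to the baseline throughput. */
static double speedup(double baseline, double measured)
{
    return measured / baseline;
}

/* Percentage increase over the baseline (a 2X speedup is +100%). */
static double percent_increase(double baseline, double measured)
{
    return (measured - baseline) / baseline * 100.0;
}
```

With these, speedup(0.047, 0.156) is about 3.3, speedup(0.047, 0.133) is about
2.8 for -mfpmath=387, and speedup(0.047, 0.101) is about 2.1 for
-fschedule-insns alone.  percent_increase(0.047, 0.156) is about 232, since the
increase excludes the baseline itself.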


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (14 preceding siblings ...)
  2012-07-18 14:28 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-18 20:42 ` bfriesen at simple dot dallas.tx.us
  2012-07-19 14:29 ` bfriesen at simple dot dallas.tx.us
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-18 20:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #15 from bfriesen at simple dot dallas.tx.us 2012-07-18 20:42:22 UTC ---
Testing shows that using

-m64 -march=native -O2 -mfpmath=sse -frename-registers

is sufficient to restore good performance.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (15 preceding siblings ...)
  2012-07-18 20:42 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-19 14:29 ` bfriesen at simple dot dallas.tx.us
  2012-07-21  1:05 ` bfriesen at simple dot dallas.tx.us
  2012-08-12 15:41 ` xunxun1982 at gmail dot com
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-19 14:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #16 from bfriesen at simple dot dallas.tx.us 2012-07-19 14:29:10 UTC ---
Is there a way that I can selectively apply the -frename-registers fix to
functions which benefit from it in order to work around the bug until the fix
is widely available?  I tried

#pragma GCC optimize ("O3,rename-registers")

and

#pragma GCC optimize ("rename-registers")

as well as the function attribute equivalent and there was no effect.  GCC
seems to ignore the request.

I did find another somewhat similar function which benefited significantly from
-frename-registers.
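
For reference, the function attribute form mentioned above looks roughly like
this.  It is a sketch of the syntax only: scale_samples and the
RENAME_REGISTERS macro are hypothetical names, and, as reported above, the GCC
versions tested here appeared to ignore the request, so this should not be read
as a working fix.

```c
#include <stddef.h>

/* The optimize attribute is a GCC extension; expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define RENAME_REGISTERS __attribute__((optimize("rename-registers")))
#else
#define RENAME_REGISTERS
#endif

/* Hypothetical hot function: requests -frename-registers for itself only. */
RENAME_REGISTERS
static void scale_samples(const int *in, float *out, size_t count, float gain)
{
    for (size_t i = 0; i < count; i++)
        out[i] = gain * (float)in[i];
}
```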


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (16 preceding siblings ...)
  2012-07-19 14:29 ` bfriesen at simple dot dallas.tx.us
@ 2012-07-21  1:05 ` bfriesen at simple dot dallas.tx.us
  2012-08-12 15:41 ` xunxun1982 at gmail dot com
  18 siblings, 0 replies; 20+ messages in thread
From: bfriesen at simple dot dallas.tx.us @ 2012-07-21  1:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #17 from bfriesen at simple dot dallas.tx.us 2012-07-21 01:04:55 UTC ---
I discovered that GCC's __attribute__((__optimize__())) and optimization
pragmas do not work for OpenMP code because OpenMP uses a different function
name for the actual working code.  This makes it much more painful to work
around this bug.
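
The background is that GCC implements an OpenMP parallel region by outlining
its body into a separate compiler-generated function (with a name along the
lines of sum_rows._omp_fn.0), and the hot loop ends up in that function rather
than in the one carrying the attribute.  The sketch below shows the shape of
the problem together with one untested idea for a workaround: moving the loop
body into an ordinary non-inlined helper that carries the attribute itself.
All names are hypothetical, and nothing in this report confirms that the
attribute is honoured in that position.

```c
#include <stddef.h>

/* GCC-only attributes; expand to nothing for other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_HELPER __attribute__((noinline, optimize("rename-registers")))
#else
#define HOT_HELPER
#endif

/* Hypothetical helper: processes one row; the attribute is attached here. */
HOT_HELPER
static float row_sum(const int *row, size_t width)
{
    float sum = 0.0f;
    for (size_t x = 0; x < width; x++)
        sum += (float)row[x];
    return sum;
}

/*
 * Hypothetical driver.  With -fopenmp, GCC outlines the loop body into a
 * separate generated function; without -fopenmp the pragma is ignored and
 * the loop runs serially.
 */
static void sum_rows(const int *image, size_t rows, size_t width, float *sums)
{
    long y;
#pragma omp parallel for
    for (y = 0; y < (long)rows; y++)
        sums[y] = row_sum(image + (size_t)y * width, width);
}
```

Keeping the helper out of line costs a call per row, so this only makes sense
when each call does enough work to amortize it.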


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
  2012-07-14 20:52 [Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default) bfriesen at simple dot dallas.tx.us
                   ` (17 preceding siblings ...)
  2012-07-21  1:05 ` bfriesen at simple dot dallas.tx.us
@ 2012-08-12 15:41 ` xunxun1982 at gmail dot com
  18 siblings, 0 replies; 20+ messages in thread
From: xunxun1982 at gmail dot com @ 2012-08-12 15:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

--- Comment #18 from xunxun <xunxun1982 at gmail dot com> 2012-08-12 15:41:35 UTC ---
Is this bug related to PR19780?


^ permalink raw reply	[flat|nested] 20+ messages in thread
