[Bug tree-optimization/50480] New: 10% performance regression on Spec2006 410.bwaves

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/50480] New: 10% performance regression on Spec2006 410.bwaves
From: kirill.yukhin at intel dot com @ 2011-09-22 10:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

             Bug #: 50480
           Summary: 10% performance regression on Spec2006 410.bwaves
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: kirill.yukhin@intel.com


Hi,
Recently Richard fixed this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49957
According to my measurements, the fix for that bug caused the following changes
on Spec2006:

For SandyBridge CPU:
* 410.bwaves degradation is -9.54% for peak32
* 410.bwaves degradation is -6.91% for base32
* 410.bwaves improvement is 1.00% for peak64
* 410.bwaves improvement is 0.91% for base64

For Core i7 CPU:
* 410.bwaves degradation is -3.91% for peak32
* 410.bwaves degradation is -3.91% for base32
* 410.bwaves improvement is 1.94% for peak64
* 410.bwaves improvement is 3.23% for base64

For AMD (Phenom(tm) II X3 B75) CPU:
* 410.bwaves degradation is -7.32% for peak32
* 410.bwaves degradation is -6.56% for base32
* 410.bwaves improvement is 2.01% for peak64
* 410.bwaves degradation is -1.34% for base64



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: kirill.yukhin at intel dot com @ 2011-09-22 10:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #1 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-09-22 10:00:34 UTC ---
Checkin URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177368



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: kirill.yukhin at intel dot com @ 2011-09-22 10:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #2 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-09-22 10:33:06 UTC ---
Here are the option set details:
base=-static -O2 -ffast-math ("-m32 -msse2 -mfpmath=sse" if 32 bit mode)
peak=-static -O3 -funroll-loops -ffast-math ("-m32 -msse2 -mfpmath=sse" if 32
bit mode)

For SandyBridge: += "-mavx -march=corei7" 
For Core i7: += "-march=corei7" 
For AMD: += "-march=amdfam10" (not sure this is the best)



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: rguenth at gcc dot gnu.org @ 2011-09-25 11:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |i?86-*-*

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-09-25 11:57:42 UTC ---
For 32bit only it seems.  Supposedly a cost model issue, the register pressure
will be higher and we have only half the number of SSE regs.
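
To make the register pressure argument concrete, here is a minimal sketch in C. It is not GCC code: `struct live_range`, `max_pressure` and `excess_over` are hypothetical names invented for illustration. The only hard fact used is that x86 exposes 8 XMM registers in 32-bit mode and 16 in 64-bit mode. The sketch counts how many values are live at the same time and how many of them would not fit in the register file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical live range: the value is live over [start, end). */
struct live_range {
    int start;
    int end;
};

/* XMM registers available on x86: 8 in 32-bit mode, 16 in 64-bit mode. */
enum { XMM_REGS_32 = 8, XMM_REGS_64 = 16 };

/* Return the largest number of ranges that are live at the same point.
   The peak overlap always occurs at the start of some range, so it is
   enough to probe those points.  Quadratic, which is fine for a sketch. */
int max_pressure(const struct live_range *r, size_t n)
{
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].start < r[i].end);
        int live = 0;
        for (size_t j = 0; j < n; j++)
            if (r[j].start <= r[i].start && r[i].start < r[j].end)
                live++;
        if (live > best)
            best = live;
    }
    return best;
}

/* Number of values that cannot stay in registers at the peak; a lower
   bound on how many must be spilled at that point. */
int excess_over(int pressure, int nregs)
{
    return pressure > nregs ? pressure - nregs : 0;
}
```

With twelve values live at once, `excess_over(12, XMM_REGS_32)` is 4 while `excess_over(12, XMM_REGS_64)` is 0, which matches the observation that only the 32-bit runs regress.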



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: kirill.yukhin at intel dot com @ 2011-09-27  8:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #4 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-09-27 08:31:35 UTC ---
(In reply to comment #3)
> For 32bit only it seems.  Supposedly a cost model issue, the register pressure
> will be higher and we have only half the number of SSE regs.

Richard, what might be wrong with the cost model? If you increase live ranges
and you have fewer registers (the 32-bit case), register pressure will obviously
increase and degrade performance. But again, how is that connected to the cost
model?



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: rguenther at suse dot de @ 2011-09-27  9:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> 2011-09-27 08:57:33 UTC ---
On Tue, 27 Sep 2011, kirill.yukhin at intel dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480
> 
> --- Comment #4 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-09-27 08:31:35 UTC ---
> (In reply to comment #3)
> > For 32bit only it seems.  Supposedly a cost model issue, the register pressure
> > will be higher and we have only half the number of SSE regs.
> 
> Richard, what might be wrong with the cost model? If you increase live ranges
> and you have fewer registers (the 32-bit case), register pressure will obviously
> increase and degrade performance. But again, how is that connected to the cost
> model?

It's connected to the cost model not modeling the whole vectorized
loop but only vectorized statements.  So it can't possibly catch
this case.

I thought of moving more of the cost model details to the target by
allowing the target to track the complete loop, like with

void * targetm.vectorizer.cost_model_start_loop (struct loop *);
targetm.vectorizer.cost_model_stmt (void *, gimple);
unsigned targetm.vectorizer.cost_model_finish_loop (void *);

where the latter would return a cost for the vectorized loop.

We'd need that to model things like PPC having imbalanced resources
for some kinds of vectorization as well (shifts take up a lot of
resources, so you need other stmts to compensate for them).

Richard.
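
The start/stmt/finish shape proposed above can be illustrated with a self-contained toy in C. This is only a sketch under stated assumptions, not the GCC target hook interface: `struct toy_stmt`, the `toy_cost_model_*` functions and `SPILL_PENALTY` are all hypothetical, and the start function takes a register count instead of a `struct loop *` so that the example compiles on its own. The point it shows is that a model which sees the whole loop can charge for register pressure, while a per-statement sum cannot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one vectorized statement (not GCC's gimple). */
struct toy_stmt {
    unsigned cost;        /* cost as a statement-only model would see it */
    unsigned vregs_live;  /* vector registers live across this statement */
};

/* Per-loop state handed back to the caller as an opaque pointer. */
struct toy_loop_state {
    unsigned stmt_cost;   /* running sum of statement costs */
    unsigned peak_live;   /* highest vregs_live seen so far */
    unsigned nregs;       /* vector registers the target provides */
};

/* Arbitrary illustrative weight per value that does not fit in registers. */
enum { SPILL_PENALTY = 10 };

/* Begin tracking one loop for a target with nregs vector registers. */
void *toy_cost_model_start_loop(unsigned nregs)
{
    struct toy_loop_state *s = malloc(sizeof *s);
    assert(s != NULL);
    s->stmt_cost = 0;
    s->peak_live = 0;
    s->nregs = nregs;
    return s;
}

/* Record one statement of the loop body. */
void toy_cost_model_stmt(void *data, const struct toy_stmt *stmt)
{
    struct toy_loop_state *s = data;
    s->stmt_cost += stmt->cost;
    if (stmt->vregs_live > s->peak_live)
        s->peak_live = stmt->vregs_live;
}

/* Return the cost of the whole loop and release the state. */
unsigned toy_cost_model_finish_loop(void *data)
{
    struct toy_loop_state *s = data;
    unsigned excess = s->peak_live > s->nregs ? s->peak_live - s->nregs : 0;
    unsigned total = s->stmt_cost + excess * SPILL_PENALTY;
    free(s);
    return total;
}
```

Feeding the same three statements with costs 2, 3, 1 and a peak of 12 live vector registers gives a loop cost of 6 for a 16-register target and 46 for an 8-register target, so the loop-level result differs even though every per-statement cost is identical.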



* [Bug tree-optimization/50480] 10% performance regression on Spec2006 410.bwaves
From: meissner at gcc dot gnu.org @ 2012-04-20 23:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480

--- Comment #6 from Michael Meissner <meissner at gcc dot gnu.org> 2012-04-20 23:17:44 UTC ---
Created attachment 27206
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27206
ivopts dump from subversion id 183934 (after regression)

