public inbox for gcc-bugs@sourceware.org
* [Bug c/52056] New: Code optimization sensitive to trivial changes
@ 2012-01-30 20:02 gccbug at jamasaru dot com
  2012-01-30 23:17 ` [Bug c/52056] " gccbug at jamasaru dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: gccbug at jamasaru dot com @ 2012-01-30 20:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52056

             Bug #: 52056
           Summary: Code optimization sensitive to trivial changes
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gccbug@jamasaru.com


GCC 4.6.1 (also reproducible in at least 4.5 and 4.4), on 64-bit Ubuntu.

The GCC optimizer detects and significantly speeds up some simple inner-loop
computations, but small changes to those inner loops can completely change the
behavior of the optimizer, losing 50% or more of the code speed.

Example code, simplified down from a hash table computation:

#include <stdio.h>
const int UNROLL=8;
int main()
{
  // static keyword in next line can trigger optimizer behavior!
  static unsigned long accum=0;
  unsigned int sum=0;  

  for (long j=0; j<0x400000000/UNROLL; ++j)            
    for (int unroll=0; unroll<UNROLL; ++unroll)    {
      unsigned long j;
      ++accum;
      // unsigned versus signed shift in next line can trigger optimizer
      j=(accum^(((unsigned long)accum)>>22));

      j*=0x6543210123456789;
      sum+=(unsigned int)(j>>32);
    }
  printf("Sum=%u\n", sum);  /* %u: sum is unsigned */
  return 0;
}

Compile with: gcc -O3 test.c -o test


The inner loop of shifts, XORs, multiplies, and adds is the core computation.
The GCC optimizer is very sensitive to whether the shift above is performed on
a SIGNED or UNSIGNED long. It is also sensitive to whether the "accum"
variable is static or not.

Some timings on an i7 980X CPU:

Static accum, signed shift:   11.3 seconds
Static accum, unsigned shift: 24.3 seconds
Local accum, signed shift:    12.1 seconds
Local accum, unsigned shift:  14.4 seconds


Looking at the assembly (-S) output, it's clear that the optimizations are
valid and effective, with the dramatic speed increase coming from switching
over to SSE math. The output is correct in all four cases.
The only problem is that the optimizer does not recognize the same
opportunity in all four cases, giving inconsistent performance.


This example is simplified down from the inner loops of some of my code
involving Monte Carlo simulation, and has a significant effect on runtime. The
Intel ICC compiler is significantly slower than GCC in most cases on my code.
For this specific example, ICC has an execution time of 14.3 seconds for all 4
cases. That 25% speed advantage of GCC is huge to us when we have 500 machines
in a cluster running simulations for days!



Thread overview: 5+ messages
2012-01-30 20:02 [Bug c/52056] New: Code optimization sensitive to trivial changes gccbug at jamasaru dot com
2012-01-30 23:17 ` [Bug c/52056] " gccbug at jamasaru dot com
2012-01-31  1:04 ` [Bug middle-end/52056] " jakub at gcc dot gnu.org
2012-01-31 11:40 ` [Bug tree-optimization/52056] Vectorizer cost model is imprecise rguenth at gcc dot gnu.org
2012-07-13  8:52 ` rguenth at gcc dot gnu.org
