public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/45685]  New: GCC optimizer for Intel x64 generates inefficient code
@ 2010-09-16  1:17 ekuznetsov at divxcorp dot com
  2010-09-16  1:19 ` [Bug rtl-optimization/45685] " ekuznetsov at divxcorp dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: ekuznetsov at divxcorp dot com @ 2010-09-16  1:17 UTC
  To: gcc-bugs

I've attached two copies of a simple function. They are identical except for
the type of the internal variable (one uses 'int64_t', the other uses 'int').
When compiled with GCC 4.4.3 on an x64 platform using -O3, the assembly code
for the first version contains a conditional move instruction ('cmov'), while
the second version contains a branch. Since branches are extremely slow, the
second version ends up two times slower than the first version.


-- 
           Summary: GCC optimizer for Intel x64 generates inefficient code
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ekuznetsov at divxcorp dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45685



* [Bug rtl-optimization/45685] GCC optimizer for Intel x64 generates inefficient code
From: ekuznetsov at divxcorp dot com @ 2010-09-16  1:19 UTC
  To: gcc-bugs



------- Comment #1 from ekuznetsov at divxcorp dot com  2010-09-16 01:18 -------
Created an attachment (id=21807)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21807&action=view)
Sample code



* [Bug rtl-optimization/45685] GCC optimizer for Intel x64 generates inefficient code
From: ekuznetsov at divxcorp dot com @ 2010-09-16 23:09 UTC
  To: gcc-bugs



------- Comment #3 from ekuznetsov at divxcorp dot com  2010-09-16 23:08 -------
Created an attachment (id=21813)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21813&action=view)
Output of gcc -v -O3 gcc-bug.c



* [Bug rtl-optimization/45685] GCC optimizer for Intel x64 generates inefficient code
From: ubizjak at gmail dot com @ 2010-09-17  9:59 UTC
  To: gcc-bugs



------- Comment #4 from ubizjak at gmail dot com  2010-09-17 09:59 -------
This all happens in the if-conversion pass.

4.6 regresses in the sense that a branch is emitted instead of cmov for:

int
summation_helper_1 (long * products, unsigned long count)
{
  int s = 0;
  unsigned long i;
  for (i = 0; i < count; i++)
    {
      int val = (products[i] > 0) ? 1 : -1;
      products[i] *= val;
      if (products[i] != i)
        val = -val;
      products[i] = val;
      s += val;
    }
  return s;
}

gcc-4.4.4 -O3 produces:

.L16:
        movq    (%rdi,%rdx,8), %r10
        testq   %r10, %r10
        setg    %r8b
        xorl    %ecx, %ecx
        testq   %r10, %r10
        movzbl  %r8b, %r9d
        movzbl  %r8b, %r8d
        setle   %cl
        leaq    -1(%r8,%r8), %r8
        leal    -1(%rcx,%rcx), %ecx
        leal    -1(%r9,%r9), %r9d
        imulq   %r8, %r10
        movslq  %ecx,%r11
        cmpq    %r10, %rdx
        cmovne  %r11, %r8
        cmove   %r9d, %ecx
        movq    %r8, (%rdi,%rdx,8)
        addq    $1, %rdx
        addl    %ecx, %eax
        cmpq    %rdx, %rsi
        ja      .L16

and gcc-4.6 20100917 produces:

.L15:
        movq    (%rdi,%rdx,8), %r8
        testq   %r8, %r8
        movq    %r8, %r10
        setg    %cl
        xorl    %r9d, %r9d
        testq   %r8, %r8
        movzbl  %cl, %r11d
        movzbl  %cl, %ecx
        setle   %r9b
        leaq    -1(%rcx,%rcx), %rcx
        leaq    -1(%r9,%r9), %r9
        imulq   %rcx, %r10
        cmpq    %r10, %rdx
        cmove   %rcx, %r9
        leal    -1(%r11,%r11), %ecx
        movq    %r9, (%rdi,%rdx,8)
        je      .L12
        xorl    %ecx, %ecx
        testq   %r8, %r8
        setle   %cl
        leal    -1(%rcx,%rcx), %ecx
.L12:
        addq    $1, %rdx
        addl    %ecx, %eax
        cmpq    %rsi, %rdx
        jne     .L15
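
As a rough source-level illustration of what the cmovne/cmove pair in the
4.4.4 listing achieves, the two selects in the loop can be written without
any jump by using a mask. This is only a sketch: select_long,
summation_helper_select and check_select_sample are hypothetical names that
do not appear in the bug report, and nothing here claims that GCC turns this
source into cmov. The conversions from unsigned long back to long rely on
GCC's modular behaviour for out-of-range values.

```c
#include <assert.h>

/* Hypothetical helper, not part of the bug report: returns a when cond
   is nonzero and b otherwise, using a mask instead of a jump. */
static long select_long(int cond, long a, long b)
{
  /* mask is all ones when cond is true, all zeros otherwise */
  unsigned long mask = -(unsigned long) (cond != 0);
  return (long) (((unsigned long) a & mask) | ((unsigned long) b & ~mask));
}

/* Same computation as summation_helper_1 above, with both selections
   routed through select_long instead of ?: and if. */
int summation_helper_select(long *products, unsigned long count)
{
  int s = 0;
  unsigned long i;
  for (i = 0; i < count; i++)
    {
      long sign = select_long(products[i] > 0, 1, -1);
      long mag = products[i] * sign;           /* products[i] *= val */
      long val = select_long((unsigned long) mag != i, -sign, sign);
      products[i] = val;
      s += (int) val;
    }
  return s;
}

/* Worked sample: element 3 holds 5, which differs from its index, so its
   sign is flipped; the other elements keep the sign chosen first. */
void check_select_sample(void)
{
  long p[] = { 0, 1, -2, 5 };
  int s = summation_helper_select(p, 4);
  assert(s == -2);
  assert(p[0] == -1 && p[1] == 1 && p[2] == -1 && p[3] == -1);
}
```

Calling check_select_sample() from a small main is enough to exercise the
sketch. Whether a given compiler emits cmov, setcc arithmetic or a branch
for it still has to be checked in the generated assembly.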


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|0000-00-00 00:00:00         |2010-09-17 09:59:36
               date|                            |



* [Bug rtl-optimization/45685] GCC optimizer for Intel x64 generates inefficient code
From: ubizjak at gmail dot com @ 2010-09-17 10:03 UTC
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2010-09-17 10:02 -------
Confirmed. Regression?


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|2010-09-17 09:59:36         |2010-09-17 10:02:53
               date|                            |



* [Bug rtl-optimization/45685] [4.6 Regression] GCC optimizer for Intel x64 generates inefficient code
From: hjl dot tools at gmail dot com @ 2010-09-17 13:04 UTC
  To: gcc-bugs



------- Comment #6 from hjl dot tools at gmail dot com  2010-09-17 13:04 -------
(In reply to comment #4)
> This all happens in IF conversion pass.
> 
> 4.6 regresses in the sense that a branch is emitted instead of cmov for:
> 

This is caused by revision 159106:

http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00156.html


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at suse dot de
            Summary|GCC optimizer for Intel x64 |[4.6 Regression] GCC
                   |generates inefficient code  |optimizer for Intel x64
                   |                            |generates inefficient code
   Target Milestone|---                         |4.6.0
            Version|4.4.3                       |4.6.0



* [Bug rtl-optimization/45685] [4.6 Regression] GCC optimizer for Intel x64 generates inefficient code
From: matz at gcc dot gnu dot org @ 2010-09-17 13:45 UTC
  To: gcc-bugs



------- Comment #7 from matz at gcc dot gnu dot org  2010-09-17 13:45 -------
It might have been exposed by that revision, but that merely points out
a deficiency in RTL if-conversion.  The final GIMPLE code has no explicit
jumps in the inner loop; it uses cond_expr instead:

<bb 3>:
  # s_22 = PHI <0(2), s_30(3)>
  # i_19 = PHI <0(2), i_31(3)>
  D.2693_11 = MEM[base: products_9(D), index: i_19, step: 8, offset: 0B];
  val_4 = [cond_expr] D.2693_11 <= 0 ? -1 : 1;
  prephitmp.9_39 = [cond_expr] D.2693_11 <= 0 ? -1 : 1;
  prephitmp.10_40 = [cond_expr] D.2693_11 <= 0 ? 1 : -1;
  prephitmp.11_41 = [cond_expr] D.2693_11 <= 0 ? 1 : -1;
  D.2698_21 = D.2693_11 * prephitmp.9_39;
  D.2699_25 = (long unsigned int) D.2698_21;
  val_3 = [cond_expr] i_19 != D.2699_25 ? prephitmp.10_40 : val_4;
  prephitmp.11_43 = [cond_expr] i_19 != D.2699_25 ? prephitmp.11_41 : prephitmp.9_39;
  MEM[base: products_9(D), index: i_19, step: 8, offset: 0B] = prephitmp.11_43;
  s_30 = val_3 + s_22;
  i_31 = i_19 + 1;
  if (i_31 != count_7(D))
    goto <bb 3>;
  else
    goto <bb 4>;
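
To make the dump easier to follow, the loop body can be transliterated back
into C, one statement per GIMPLE assignment. This is a sketch under stated
assumptions: the function and helper names are hypothetical, the C types of
the prephitmp temporaries are guessed from how they are used, and the
count == 0 guard stands in for <bb 2>, which the dump does not show.

```c
#include <assert.h>

/* C rendering of <bb 3>.  Local names follow the SSA names in the dump;
   the types of the temporaries and the entry guard are assumptions. */
int summation_helper_gimple(long *products, unsigned long count)
{
  int s = 0;
  unsigned long i = 0;

  if (count == 0)   /* assumed: <bb 2> skips the loop when count is 0 */
    return 0;

  do
    {
      long d_11 = products[i];                   /* D.2693_11 */
      int val_4 = d_11 <= 0 ? -1 : 1;
      long pre9_39 = d_11 <= 0 ? -1 : 1;         /* prephitmp.9_39 */
      int pre10_40 = d_11 <= 0 ? 1 : -1;         /* prephitmp.10_40 */
      long pre11_41 = d_11 <= 0 ? 1 : -1;        /* prephitmp.11_41 */
      long d_21 = d_11 * pre9_39;                /* D.2698_21 */
      unsigned long d_25 = (unsigned long) d_21; /* D.2699_25 */
      int val_3 = i != d_25 ? pre10_40 : val_4;
      long pre11_43 = i != d_25 ? pre11_41 : pre9_39;
      products[i] = pre11_43;
      s = val_3 + s;                             /* s_30 */
      i = i + 1;                                 /* i_31 */
    }
  while (i != count);

  return s;
}

/* Worked sample: only element 3 (value 5) differs from its index. */
void check_gimple_sample(void)
{
  long p[] = { 0, 1, -2, 5 };
  int s = summation_helper_gimple(p, 4);
  assert(s == -2);
  assert(p[0] == -1 && p[1] == 1 && p[2] == -1 && p[3] == -1);
}
```

Every value in the body is produced by a conditional expression and the only
control flow left is the loop back edge, which matches the point that the
GIMPLE itself contains no jumps inside the loop.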


