[Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations


* [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: vmakarov at gcc dot gnu.org @ 2015-01-21  5:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

            Bug ID: 64705
           Summary: Bad code generation of sieve on x86-64 because of too
                    aggressive IV optimizations
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vmakarov at gcc dot gnu.org

Created attachment 34510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34510&action=edit
preprocessed sieve program from aburto benchmarks

GCC trunk generates poor x86-64 code for the hottest loop in sieve
(preprocessed file in the attachment) with -Ofast -march=core-avx2:

for(k = i + prime ; k<=size ; k+=prime)
 {
   ci++;
   *(flags+k)=0;
 }

The GCC-generated code is:

.L82:
        movb    $0, (%rax)
        addq    %rsi, %rax
        addq    $1, %rbx
        leaq    (%rax,%rcx), %rdx
        cmpq    %rdx, %rbp
        jge     .L82

Here is the code generated by LLVM-3.5 using the same options:

   .LBB41_51:                              # %for.body26.unr
                                        #   Parent Loop BB41_45 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        incq    %r12
        movb    $0, (%rbx,%rax)
        addq    %r14, %rax
        cmpq    %r15, %rax
        jle     .LBB41_51

LLVM generates a 5-insn loop, whereas GCC generates a 6-insn loop.

LLVM achieves this by using base+index addressing.  GCC uses plain base
addressing instead, because IV optimization turns the expression flags+k
into an induction variable of its own.
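The difference can be sketched in C.  The two functions below are
illustrative only: their names and signatures are hypothetical, and a
compiler is not guaranteed to emit exactly the loops shown above for them.
mark_pointer_iv carries the address flags+k as the induction variable, as
GCC does; mark_index_iv keeps k and addresses flags[k], as LLVM does.

```c
#include <assert.h>

/* GCC-like shape (illustrative): the address p = flags + k is the
   induction variable and k itself is not kept.  The exit test has to
   recover k from p on every iteration, which presumably corresponds to
   the extra leaq in GCC's loop.  Assumes 0 <= i <= size and that flags
   has at least size + prime elements, so p never moves more than one
   past the end of the array. */
long mark_pointer_iv (char *flags, long i, long prime, long size, long ci)
{
  assert (prime > 0);                 /* otherwise the loop never ends */
  for (char *p = flags + i + prime; p - flags <= size; p += prime)
    {
      ci++;
      *p = 0;                         /* plain base addressing: (%rax) */
    }
  return ci;
}

/* LLVM-like shape (illustrative): k stays the induction variable, the
   store uses base+index addressing and the exit test compares k with
   size directly. */
long mark_index_iv (char *flags, long i, long prime, long size, long ci)
{
  assert (prime > 0);
  for (long k = i + prime; k <= size; k += prime)
    {
      ci++;
      flags[k] = 0;                   /* base+index addressing: (%rbx,%rax) */
    }
  return ci;
}
```

Both functions compute the same result; they differ only in which value is
carried across iterations, and therefore in how much work the exit test
needs.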

I tried to make base+index addressing zero-cost with the following
patch:

Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c      (revision 219705)
+++ tree-ssa-loop-ivopts.c      (working copy)
@@ -3458,7 +3458,7 @@ get_address_cost (bool symbol_present, b
          end_sequence ();

          acost = seq_cost (seq, speed);
-         acost += address_cost (addr, mem_mode, as, speed);
+         acost = 0;

          if (!acost)
            acost = 1;


With this patch I got base+index addressing, but the loop still has 6
insns because other expressions are still turned into induction variables:

.L82:
        movb    $0, (%r8,%rax)
        addq    %rcx, %rax
        addq    $1, %rbx
        leaq    (%rsi,%rax), %rdx
        cmpq    %rdx, %r14
        jge     .L82

Again, overly aggressive IV optimization results in worse code from GCC.
As LLVM-3.5 demonstrates, the code can be better.



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-01-23  2:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

amker at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-23
                 CC|                            |amker at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |amker at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from amker at gcc dot gnu.org ---
Confirmed.  I shall have a look.



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-01-23  4:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

--- Comment #2 from amker at gcc dot gnu.org ---
The loop dump before IVOPT looks like this:

Loop 4, basic blocks 28/30;

  <bb 26>:
  count_54 = count_172 + 1;
  _55 = i_161 + i_161;
  prime_56 = _55 + 3;
  k_57 = prime_56 + i_161;
  if (size_26 >= k_57)
    goto <bb 27>;
  else
    goto <bb 31>;

  <bb 27>:

  <bb 28>:
  # k_167 = PHI <k_57(27), k_62(30)>
  # ci_168 = PHI <ci_169(27), ci_58(30)>
  ci_58 = ci_168 + 1;
  k.19_59 = (sizetype) k_167;
  _60 = flags_30 + k.19_59;
  *_60 = 0;
  k_62 = prime_56 + k_167;
  if (size_26 >= k_62)
    goto <bb 30>;
  else
    goto <bb 29>;

  <bb 29>:
  # ci_154 = PHI <ci_58(28)>
  goto <bb 31>;

  <bb 30>:
  goto <bb 28>;
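Read back as C, the loop corresponds roughly to the sketch below.  The
function name loop4 and its parameters are hypothetical; local names mirror
the SSA names with version suffixes dropped, and count_54 from bb 26 is left
out.  It makes explicit that prime = i + i + 3 is both the step of k and the
step of the address flags + k.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C reading of loop 4 (bbs 26-30); returns the new ci. */
long loop4 (char *flags, long i, long size, long ci)
{
  long t55 = i + i;                  /* _55 = i_161 + i_161 */
  long prime = t55 + 3;              /* prime_56 = _55 + 3 */
  long k = prime + i;                /* k_57 = prime_56 + i_161 */
  assert (prime > 0);                /* holds for i >= -1; a non-positive
                                        step would never terminate */
  if (size >= k)                     /* guard at the end of bb 26 */
    do
      {
        ci = ci + 1;                 /* ci_58 = ci_168 + 1 */
        *(flags + (size_t) k) = 0;   /* _60 = flags_30 + k.19_59; *_60 = 0 */
        k = prime + k;               /* k_62 = prime_56 + k_167 */
      }
    while (size >= k);               /* exit test at the end of bb 28 */
  return ci;
}
```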


The IV uses found by IVOPT are as follows:

use 0
  address
  defined in statement 
  used in statement *_60 = 0;

  at position *_60
  type char *
  base flags_30 + (sizetype) k_57
  step (sizetype) prime_56
  base object (void *) flags_30
  related candidates 
use 1
  compare
  defined in statement 
  used in statement if (size_26 >= k_62)

  at position 
  type long int
  base (_55 + 3) + k_57
  step prime_56
  is a biv
  related candidates 
use 2
  generic (computed on exit edge)
  defined in statement ci_58 = ci_168 + 1;
  used in statement ci_154 = PHI <ci_58(28)>

  at position 
  type long int
  base ci_169 + 1
  step 1
  is a biv
  related candidates 

The root cause is that IVOPT expands use 1 from {prime_56 + k_57, prime_56}_loop
to {(_55 + 3) + k_57, prime_56}_loop.  The information that
"iv.step == prime_56 == (_55 + 3)" is therefore lost during cost computation
and use rewriting, so the wrong candidate is selected and the loop is bloated
after IVOPT.
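To make the loss concrete, here is a toy model (not GCC code; struct expr
and base_mentions_step are hypothetical).  Before expansion, the base of
use 1 is prime_56 + k_57, and the step prime_56 appears in it as an operand.
After expansion the base is (_55 + 3) + k_57, which denotes the same value
but no longer mentions prime_56, so a purely structural comparison can no
longer relate the base to the step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy expression: a leaf (SSA name or constant) when LEAF is non-NULL,
   otherwise the sum OP0 + OP1. */
struct expr
{
  const char *leaf;
  const struct expr *op0, *op1;
};

/* Hypothetical structural check: is the leaf STEP equal to BASE, or a
   direct operand of the sum BASE?  */
static bool
base_mentions_step (const struct expr *base, const struct expr *step)
{
  assert (base != NULL && step != NULL && step->leaf != NULL);
  if (base->leaf != NULL)
    return strcmp (base->leaf, step->leaf) == 0;
  const struct expr *ops[2] = { base->op0, base->op1 };
  for (int j = 0; j < 2; j++)
    if (ops[j]->leaf != NULL && strcmp (ops[j]->leaf, step->leaf) == 0)
      return true;
  return false;
}
```

With the base prime_56 + k_57 the check succeeds; with the expanded base
(_55 + 3) + k_57 it fails, although both bases evaluate to the same number.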

The relevant code is in function find_givs_in_stmt_scev, specifically:

  if (!simple_iv (loop, loop_containing_stmt (stmt), lhs, iv, true))
    return false;
  iv->base = expand_simple_operations (iv->base);   // <--- expansion

I will see whether the issue can be fixed by skipping the expansion in
cases like this.

Thanks,
bin



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-01-23  7:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

amker at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64-*-*                  |x86_64-*-*, aarch64

--- Comment #3 from amker at gcc dot gnu.org ---
This is also a target-independent issue; it shows up on AArch64 too.
Although IVOPT chooses base+index addressing there, the loop needs one more
instruction to compute the exit condition.

LLVM's assembly:
.LBB34_7:
    add    x26, x26, #1
    strb     wzr, [x22, x9]
    add     x9, x9, x24
    cmp     x9, x28
    b.le    .LBB34_7

GCC's assembly:
.L71:
    strb    wzr, [x27, x0]
    add    x0, x0, x2
    add    x19, x19, 1
    add    x1, x4, x0
    cmp    x21, x1
    bge    .L71



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-02-05  6:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

--- Comment #4 from amker at gcc dot gnu.org ---
I have a patch.



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-02-13  5:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

--- Comment #5 from amker at gcc dot gnu.org ---
Author: amker
Date: Fri Feb 13 05:44:46 2015
New Revision: 220676

URL: https://gcc.gnu.org/viewcvs?rev=220676&root=gcc&view=rev
Log:

    PR tree-optimization/64705
    * tree-ssa-loop-niter.h (expand_simple_operations): New parameter.
    * tree-ssa-loop-niter.c (expand_simple_operations): New parameter.
    * tree-ssa-loop-ivopts.c (extract_single_var_from_expr): New.
    (find_bivs, find_givs_in_stmt_scev): Pass new argument to
    expand_simple_operations.

    testsuite
    PR tree-optimization/64705
    * gcc.dg/tree-ssa/pr64705.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c
    trunk/gcc/tree-ssa-loop-niter.c
    trunk/gcc/tree-ssa-loop-niter.h
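Going by the ChangeLog, the fix adds a parameter to expand_simple_operations
and a helper extract_single_var_from_expr, so that expanding an IV base can
stop at the variable the step consists of.  The toy sketch below imitates
that idea only; expand, the defs table and the stop argument are
hypothetical and are not GCC's implementation.  The table holds just
prime_56 = _55 + 3, the one definition that the dump in comment #2 shows
being substituted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy SSA definition NAME = OP0 + OP1. */
struct def
{
  const char *name, *op0, *op1;
};

/* Only the definition that was substituted in the dump. */
static const struct def defs[] = {
  { "prime_56", "_55", "3" },
};

static const struct def *
lookup (const char *name)
{
  for (size_t j = 0; j < sizeof defs / sizeof defs[0]; j++)
    if (strcmp (defs[j].name, name) == 0)
      return &defs[j];
  return NULL;
}

/* Write NAME into BUF, recursively replacing names by their toy
   definitions, but never expanding STOP (when STOP is non-NULL). */
static void
expand (const char *name, const char *stop, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  const struct def *d = lookup (name);
  if (d == NULL || (stop != NULL && strcmp (name, stop) == 0))
    {
      snprintf (buf, len, "%s", name);
      return;
    }
  char a[128], b[128];
  expand (d->op0, stop, a, sizeof a);
  expand (d->op1, stop, b, sizeof b);
  snprintf (buf, len, "(%s + %s)", a, b);
}
```

Without a stop variable, prime_56 becomes (_55 + 3), as in the base of
use 1 in comment #2.  With the step's variable as the stop, prime_56 is
kept, so the base's relation to the step survives.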



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-02-13  5:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

--- Comment #6 from amker at gcc dot gnu.org ---
Since this works with GCC 3.4, I consider it a regression and have applied
the patch.  It should be fixed now.

Hi Vlad, could you please help me verify that the original benchmark is fixed
too?  Thanks very much!



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: vmakarov at gcc dot gnu.org @ 2015-03-11 15:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to amker from comment #6)
> Since it works on gcc 3.4, so I consider this as a regression and applied
> the patch.  Should be fixed now.
> 
> Hi Vlad, could you please help me verify that the original benchmark is
> fixed too?  Thanks very much!

Yes, it was fixed.  Thanks for working on this.



* [Bug tree-optimization/64705] Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
From: amker at gcc dot gnu.org @ 2015-03-12  1:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

amker at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from amker at gcc dot gnu.org ---
Fixed according to Vlad's input.


