public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations
@ 2015-01-21  5:11 vmakarov at gcc dot gnu.org
  2015-01-23  2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2015-01-21  5:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64705

            Bug ID: 64705
           Summary: Bad code generation of sieve on x86-64 because of too
                    aggressive IV optimizations
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vmakarov at gcc dot gnu.org

Created attachment 34510
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34510&action=edit
preprocessed sieve program from aburto benchmarks

GCC trunk generates bad x86-64 code for the hottest loop in sieve
(preprocessed file attached) with -Ofast -march=core-avx2:

for(k = i + prime ; k<=size ; k+=prime)
 {
   ci++;
   *(flags+k)=0;
 }

The code generated by GCC is:

.L82:
        movb    $0, (%rax)
        addq    %rsi, %rax
        addq    $1, %rbx
        leaq    (%rax,%rcx), %rdx
        cmpq    %rdx, %rbp
        jge     .L82

Here is the code generated by LLVM-3.5 using the same options:

   .LBB41_51:                              # %for.body26.unr
                                        #   Parent Loop BB41_45 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        incq    %r12
        movb    $0, (%rbx,%rax)
        addq    %r14, %rax
        cmpq    %r15, %rax
        jle     .LBB41_51

LLVM generates a 5-insn loop, whereas GCC generates a 6-insn loop.

LLVM achieves this by using base+index addressing for the store.  GCC
uses base-only addressing because IV optimization turns the flags+k
expression into an induction variable of its own.
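
To make the difference concrete, here is a rough C sketch of the two
loop shapes.  It is not taken from the attachment: the function names
are made up for illustration, and the types (char flags, long counters)
are assumptions.  The first function keeps k as the only induction
variable for the address, the second carries the address flags+k as a
separate pointer induction variable while k is still needed for the
exit test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index form: k is the only IV used for the address; the store goes
   through flags[k], which maps onto base+index addressing.
   Assumes prime > 0 and that flags has at least size+1 elements. */
static long strike_indexed(char *flags, long size, long i, long prime)
{
    long ci = 0;
    for (long k = i + prime; k <= size; k += prime) {
        ci++;
        flags[k] = 0;
    }
    return ci;
}

/* Pointer form: the address flags+k becomes its own IV (p), so the
   store uses base-only addressing, but k must still be updated for
   the exit test.  Same assumptions as above. */
static long strike_pointer(char *flags, long size, long i, long prime)
{
    long ci = 0;
    long k = i + prime;
    if (k > size)
        return 0;
    char *p = flags + k;
    for (;;) {
        ci++;
        *p = 0;
        k += prime;
        if (k > size)
            break;
        p += prime;   /* advanced only while the next store is in range */
    }
    return ci;
}

/* Hypothetical helper: both forms must clear the same flags and
   return the same count. */
static void check_forms_agree(void)
{
    enum { SIZE = 100 };
    char a[SIZE + 1], b[SIZE + 1];
    memset(a, 1, sizeof a);
    memset(b, 1, sizeof b);
    long ca = strike_indexed(a, SIZE, 3, 7);
    long cb = strike_pointer(b, SIZE, 3, 7);
    assert(ca == cb);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Both functions do the same work.  The pointer form simply has one more
value to update per iteration, which is roughly the extra instruction
seen in the GCC loop.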

I tried to make base+index addressing zero-cost with the following
patch:

Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c      (revision 219705)
+++ tree-ssa-loop-ivopts.c      (working copy)
@@ -3458,7 +3458,7 @@ get_address_cost (bool symbol_present, b
          end_sequence ();

          acost = seq_cost (seq, speed);
-         acost += address_cost (addr, mem_mode, as, speed);
+         acost = 0;

          if (!acost)
            acost = 1;


With the patch I get base+index addressing, but it is still a 6-insn
loop because of induction of other expressions:

.L82:
        movb    $0, (%r8,%rax)
        addq    %rcx, %rax
        addq    $1, %rbx
        leaq    (%rsi,%rax), %rdx
        cmpq    %rdx, %r14
        jge     .L82

Again, overly aggressive IV optimization makes GCC generate worse code.
The LLVM-3.5 output shows that better code is possible.



Thread overview: 9+ messages
2015-01-21  5:11 [Bug tree-optimization/64705] New: Bad code generation of sieve on x86-64 because of too aggressive IV optimizations vmakarov at gcc dot gnu.org
2015-01-23  2:14 ` [Bug tree-optimization/64705] " amker at gcc dot gnu.org
2015-01-23  4:01 ` amker at gcc dot gnu.org
2015-01-23  7:24 ` amker at gcc dot gnu.org
2015-02-05  6:39 ` amker at gcc dot gnu.org
2015-02-13  5:45 ` amker at gcc dot gnu.org
2015-02-13  5:56 ` amker at gcc dot gnu.org
2015-03-11 15:02 ` vmakarov at gcc dot gnu.org
2015-03-12  1:39 ` amker at gcc dot gnu.org
