public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/99067] New: Missed optimization for induction variable elimination
@ 2021-02-10 23:49 brian.grayson at sifive dot com
  2021-02-11  0:48 ` [Bug rtl-optimization/99067] " wilson at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: brian.grayson at sifive dot com @ 2021-02-10 23:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

            Bug ID: 99067
           Summary: Missed optimization for induction variable elimination
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: brian.grayson at sifive dot com
  Target Milestone: ---

For RISC-V, this code snippet could eliminate the induction variable for the
loop ending condition.

#include <stdint.h>
int16_t a[1000];
int64_t N = 100;
int64_t found_zero() {
  for (int i = 0; i <= N; i++) {
    if (a[i] == 0) return 1;
  }
  return 0;
}

gcc -O3 for RISC-V generates:

.L8:
  addi  a5,a5,2
  blt a2,a4,.L4
.L3:
  lh  a3,0(a5)
  addi  a4,a4,1  <-- induction variable update that can be eliminated
  bne a3,zero,.L8
  li  a0,1
  ret
.L4:
  li  a0,0
  ret

Is there a reason it doesn't do this transform (written at the C level) to do
pointer comparisons:

  for (int16_t* p = &a[0]; p <= &a[N]; p++) { ... }

That C code is able to remove the extra add instruction:

.L15:
  bgtu  a5,a3,.L12
.L11:
  lh  a4,0(a5)
  addi  a5,a5,2
  bne a4,zero,.L15
  li  a0,1
  ret
.L12:
  li  a0,0
  ret

I verified the same issue occurs in PowerPC and ARM code-gen, so this isn't
target-specific.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug rtl-optimization/99067] Missed optimization for induction variable elimination
  2021-02-10 23:49 [Bug rtl-optimization/99067] New: Missed optimization for induction variable elimination brian.grayson at sifive dot com
@ 2021-02-11  0:48 ` wilson at gcc dot gnu.org
  2021-02-16 15:23 ` [Bug tree-optimization/99067] " amker at gcc dot gnu.org
  2021-02-18  2:38 ` amker at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: wilson at gcc dot gnu.org @ 2021-02-11  0:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson <wilson at gcc dot gnu.org> ---
This looks similar to an ivopts problem I looked at regarding coremark.  Given
this testcase
void matrix_add_const_unsigned(unsigned int N, short *A, short val) {
        unsigned int i,j;
        for (i=0; i<N; i++) {
                for (j=0; j<N; j++) {
                        A[i*N+j] += val;
                }
        }
}
void matrix_add_const_signed(signed int N, short *A, short val) {
        signed int i,j;
        for (i=0; i<N; i++) {
                for (j=0; j<N; j++) {
                        A[i*N+j] += val;
                }
        }
}
and compiling for 64-bit targets with -O2, we get much better code for the
second function than the first function.  For riscv64, the first function has 8
instructions in the inner loop.  The second function has 5 instructions in the
inner loop.  I reproduced this problem on multiple 64-bit targets including
mips64, ppc64, arm64.

The problem I saw was that with a signed iterator, ivopts decides that we can
ignore overflow and it is safe to eliminate.  With an unsigned interator, it
decides that unsigned overflow can't be ignored.  Then it looks at loop bounds.
 If the loop bound is unknown, e.g. it is a function parameter in this case,
then it decides that this indunction variable isn't safe to eliminate and we
get poor optimization.

Brian's testcase appears to be another issue of this.  With the original code
ivopts turns a[i] into an unsigned interator and then sees that the loop bound
is a global variable and apparently decides it can't eliminate it.  With the
modified code using int16_t *p, gcc decides that it can eliminate it, and we
get better code.  This issue shows up with 32-bit targets but appears related
to the above.

I don't know if the ivopts issue can be fixed.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/99067] Missed optimization for induction variable elimination
  2021-02-10 23:49 [Bug rtl-optimization/99067] New: Missed optimization for induction variable elimination brian.grayson at sifive dot com
  2021-02-11  0:48 ` [Bug rtl-optimization/99067] " wilson at gcc dot gnu.org
@ 2021-02-16 15:23 ` amker at gcc dot gnu.org
  2021-02-18  2:38 ` amker at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: amker at gcc dot gnu.org @ 2021-02-16 15:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

bin cheng <amker at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |amker at gcc dot gnu.org

--- Comment #2 from bin cheng <amker at gcc dot gnu.org> ---
Mine, will have a look.  Thanks for reporting.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/99067] Missed optimization for induction variable elimination
  2021-02-10 23:49 [Bug rtl-optimization/99067] New: Missed optimization for induction variable elimination brian.grayson at sifive dot com
  2021-02-11  0:48 ` [Bug rtl-optimization/99067] " wilson at gcc dot gnu.org
  2021-02-16 15:23 ` [Bug tree-optimization/99067] " amker at gcc dot gnu.org
@ 2021-02-18  2:38 ` amker at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: amker at gcc dot gnu.org @ 2021-02-18  2:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

--- Comment #3 from bin cheng <amker at gcc dot gnu.org> ---
Though not sure if the underlying root causes are the same, I think these are
two different issues, at least, they are handled by different parts of code in
IVOPTs.  
For the first one, it's a known issue in GCC and IV elimination is complicated
yet quite conservative for long time, while for the second one, we indeed don't
know whether "i*N+j" wraps or not.  Even though we might be able to improve
IVOPTs under condition of wrapping behavior.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-02-18  2:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-10 23:49 [Bug rtl-optimization/99067] New: Missed optimization for induction variable elimination brian.grayson at sifive dot com
2021-02-11  0:48 ` [Bug rtl-optimization/99067] " wilson at gcc dot gnu.org
2021-02-16 15:23 ` [Bug tree-optimization/99067] " amker at gcc dot gnu.org
2021-02-18  2:38 ` amker at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).