Question about loop induction variables

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Question about loop induction variables
@ 2017-05-07  9:23 Fredrik Hederstierna
  2017-05-08  7:15 ` Richard Biener via gcc
  0 siblings, 1 reply; 2+ messages in thread
From: Fredrik Hederstierna @ 2017-05-07  9:23 UTC (permalink / raw)
  To: gcc

Hi,

I have a question about loop induction variables, related to

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213

Consider a simple loop like

  int ix;
  for (ix = 0; ix < 6; ix++) {
    data[ix] = ix;
  }

In this case variable 'ix' is used as counting variable for array index,
but also used as value for storage, so in some sense its used in two different "ways".

Several target architectures might have support to auto-increment pointer registers when storing or reading data,
but if same variable is used as both array index to pointer and as data it might be more complicated?

In the example above, it seems like GCC loop analyzer (for completely peeling) thinks 'ix' is an induction variable that can be folded away:
from 'tree-ssa-loop-ivcanon.c':

  Loop 1 iterates 5 times.
  Loop 1 iterates at most 5 times.
  Estimating sizes for loop 1
   BB: 3, after_exit: 0
    size:   0 _4 = (char) i_9;
     Induction variable computation will be folded away.
    size:   1 data[i_9] = _4;
    size:   1 i_6 = i_9 + 1;   <----- OK?
     Induction variable computation will be folded away.
    size:   1 ivtmp_7 = ivtmp_1 - 1;
     Induction variable computation will be folded away.
    size:   2 if (ivtmp_7 != 0)
     Exit condition will be eliminated in peeled copies.
   BB: 4, after_exit: 1
  size: 5-4, last_iteration: 5-2
    Loop size: 5
    Estimated size after unrolling: 5

Then index 'ix' is considered to be a induction variable, but possibly it cannot be simply folded since its used in other ways as well?

Though completely-peeling loop resulted in longer code:


  int i;
  char _4;
  unsigned int ivtmp_7;
  char _12;
  unsigned int ivtmp_15;
  char _19;
  unsigned int ivtmp_22;
  char _26;
  unsigned int ivtmp_29;
  char _33;
  unsigned int ivtmp_36;
  char _40;
  unsigned int ivtmp_43;

  <bb 2>:
  _12 = 0;
  data[0] = _12;
  i_14 = 1;
  ivtmp_15 = 5;
  _19 = (char) i_14;
  data[i_14] = _19;
  i_21 = i_14 + 1;
  ivtmp_22 = ivtmp_15 + 4294967295;
  _26 = (char) i_21;
  data[i_21] = _26;
  i_28 = i_21 + 1;
  ivtmp_29 = ivtmp_22 + 4294967295;
  _33 = (char) i_28;
  data[i_28] = _33;
  i_35 = i_28 + 1;
  ivtmp_36 = ivtmp_29 + 4294967295;
  _40 = (char) i_35;
  data[i_35] = _40;
  i_42 = i_35 + 1;
  ivtmp_43 = ivtmp_36 + 4294967295;
  _4 = (char) i_42;
  data[i_42] = _4;
  i_6 = i_42 + 1;
  ivtmp_7 = ivtmp_43 + 4294967295;
  return;


instead of original and shorter

  int i;
  unsigned int ivtmp_1;
  char _4;
  unsigned int ivtmp_7;

  <bb 2>:

  <bb 3>:
  # i_9 = PHI <i_6(4), 0(2)>
  # ivtmp_1 = PHI <ivtmp_7(4), 6(2)>
  _4 = (char) i_9;
  data[i_9] = _4;
  i_6 = i_9 + 1;
  ivtmp_7 = ivtmp_1 - 1;
  if (ivtmp_7 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  return;


Example for ARM target machine code it became longer:

0000001c <test_iter_6>:
  1c:   e59f3030        ldr     r3, [pc, #48]   ; 54 <test_iter_6+0x38>
  20:   e3a02000        mov     r2, #0
  24:   e5c32000        strb    r2, [r3]
  28:   e3a02001        mov     r2, #1
  2c:   e5c32001        strb    r2, [r3, #1]
  30:   e3a02002        mov     r2, #2
  34:   e5c32002        strb    r2, [r3, #2]
  38:   e3a02003        mov     r2, #3
  3c:   e5c32003        strb    r2, [r3, #3]
  40:   e3a02004        mov     r2, #4
  44:   e5c32004        strb    r2, [r3, #4]
  48:   e3a02005        mov     r2, #5
  4c:   e5c32005        strb    r2, [r3, #5]
  50:   e12fff1e        bx      lr
  54:   00000000        .word   0x00000000

compared to if complete-loop-peeling was not done

0000001c <test_iter_6>:
  1c:   e59f2014        ldr     r2, [pc, #20]   ; 38 <test_iter_6+0x1c>
  20:   e3a03000        mov     r3, #0
  24:   e7c33002        strb    r3, [r3, r2]
  28:   e2833001        add     r3, r3, #1
  2c:   e3530006        cmp     r3, #6
  30:   1afffffb        bne     24 <test_iter_6+0x8>
  34:   e12fff1e        bx      lr
  38:   00000000        .word   0x00000000

Producing 15 instead of 8 words, giving ~100% larger code size.

And same for x86 arch:

   f:   c6 05 00 00 00 00 00    movb   $0x0,0x0(%rip)        # 16
<test_iter_6+0x7>
  16:   c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # 1d
<test_iter_6+0xe>
  1d:   c6 05 00 00 00 00 02    movb   $0x2,0x0(%rip)        # 24
<test_iter_6+0x15>
  24:   c6 05 00 00 00 00 03    movb   $0x3,0x0(%rip)        # 2b
<test_iter_6+0x1c>
  2b:   c6 05 00 00 00 00 04    movb   $0x4,0x0(%rip)        # 32
<test_iter_6+0x23>
  32:   c6 05 00 00 00 00 05    movb   $0x5,0x0(%rip)        # 39
<test_iter_6+0x2a>
  39:   c3                      retq   


Without complete-loop-peeling done:

   f:   31 c0                   xor    %eax,%eax
  11:   88 80 00 00 00 00       mov    %al,0x0(%rax)
  17:   48 ff c0                inc    %rax
  1a:   48 83 f8 06             cmp    $0x6,%rax
  1e:   75 f1                   jne    11 <test_iter_6+0x2>
  20:   c3                      retq


Producing 43 instead of 18 bytes, ~140% bigger code size.

Should a variable be considered to be induction variable also when its used both as LHS array index and RHS data value in same loop?
Since its cross-target for both ARM and x86, does it origin in some cost estimation on how 'ix' or other induction variables will be folded or not in the final target code?

Thanks, Kind Regards,
Fredrik

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Question about loop induction variables
  2017-05-07  9:23 Question about loop induction variables Fredrik Hederstierna
@ 2017-05-08  7:15 ` Richard Biener via gcc
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener via gcc @ 2017-05-08  7:15 UTC (permalink / raw)
  To: Fredrik Hederstierna; +Cc: GCC Development

On Sun, May 7, 2017 at 11:22 AM, Fredrik Hederstierna
<fredrik.hederstierna@verisure.com> wrote:
> Hi,
>
> I have a question about loop induction variables, related to
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213
>
> Consider a simple loop like
>
>   int ix;
>   for (ix = 0; ix < 6; ix++) {
>     data[ix] = ix;
>   }
>
> In this case variable 'ix' is used as counting variable for array index,
> but also used as value for storage, so in some sense its used in two different "ways".
>
> Several target architectures might have support to auto-increment pointer registers when storing or reading data,
> but if same variable is used as both array index to pointer and as data it might be more complicated?
>
> In the example above, it seems like GCC loop analyzer (for completely peeling) thinks 'ix' is an induction variable that can be folded away:
> from 'tree-ssa-loop-ivcanon.c':
>
>   Loop 1 iterates 5 times.
>   Loop 1 iterates at most 5 times.
>   Estimating sizes for loop 1
>    BB: 3, after_exit: 0
>     size:   0 _4 = (char) i_9;
>      Induction variable computation will be folded away.
>     size:   1 data[i_9] = _4;
>     size:   1 i_6 = i_9 + 1;   <----- OK?
>      Induction variable computation will be folded away.
>     size:   1 ivtmp_7 = ivtmp_1 - 1;
>      Induction variable computation will be folded away.
>     size:   2 if (ivtmp_7 != 0)
>      Exit condition will be eliminated in peeled copies.
>    BB: 4, after_exit: 1
>   size: 5-4, last_iteration: 5-2
>     Loop size: 5
>     Estimated size after unrolling: 5
>
> Then index 'ix' is considered to be a induction variable, but possibly it cannot be simply folded since its used in other ways as well?

Well, but all the listed stmts _can_ be folded away because the
starting value is zero and thus they "fold away" to constants.

> Though completely-peeling loop resulted in longer code:
>
>
>   int i;
>   char _4;
>   unsigned int ivtmp_7;
>   char _12;
>   unsigned int ivtmp_15;
>   char _19;
>   unsigned int ivtmp_22;
>   char _26;
>   unsigned int ivtmp_29;
>   char _33;
>   unsigned int ivtmp_36;
>   char _40;
>   unsigned int ivtmp_43;
>
>   <bb 2>:
>   _12 = 0;
>   data[0] = _12;
>   i_14 = 1;
>   ivtmp_15 = 5;
>   _19 = (char) i_14;
>   data[i_14] = _19;
>   i_21 = i_14 + 1;
>   ivtmp_22 = ivtmp_15 + 4294967295;
>   _26 = (char) i_21;
>   data[i_21] = _26;
>   i_28 = i_21 + 1;
>   ivtmp_29 = ivtmp_22 + 4294967295;
>   _33 = (char) i_28;
>   data[i_28] = _33;
>   i_35 = i_28 + 1;
>   ivtmp_36 = ivtmp_29 + 4294967295;
>   _40 = (char) i_35;
>   data[i_35] = _40;
>   i_42 = i_35 + 1;
>   ivtmp_43 = ivtmp_36 + 4294967295;
>   _4 = (char) i_42;
>   data[i_42] = _4;
>   i_6 = i_42 + 1;
>   ivtmp_7 = ivtmp_43 + 4294967295;
>   return;
>
>
> instead of original and shorter
>
>   int i;
>   unsigned int ivtmp_1;
>   char _4;
>   unsigned int ivtmp_7;
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_9 = PHI <i_6(4), 0(2)>
>   # ivtmp_1 = PHI <ivtmp_7(4), 6(2)>
>   _4 = (char) i_9;
>   data[i_9] = _4;
>   i_6 = i_9 + 1;
>   ivtmp_7 = ivtmp_1 - 1;
>   if (ivtmp_7 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   goto <bb 3>;
>
>   <bb 5>:
>   return;

The cost metrics assume constant propagation happened which in the
above code didn't yet:

>   i_14 = 1;
>   _19 = (char) i_14;
...

So you are comparing (visually) apples and oranges.

>
> Example for ARM target machine code it became longer:
>
> 0000001c <test_iter_6>:
>   1c:   e59f3030        ldr     r3, [pc, #48]   ; 54 <test_iter_6+0x38>
>   20:   e3a02000        mov     r2, #0
>   24:   e5c32000        strb    r2, [r3]
>   28:   e3a02001        mov     r2, #1
>   2c:   e5c32001        strb    r2, [r3, #1]
>   30:   e3a02002        mov     r2, #2
>   34:   e5c32002        strb    r2, [r3, #2]
>   38:   e3a02003        mov     r2, #3
>   3c:   e5c32003        strb    r2, [r3, #3]
>   40:   e3a02004        mov     r2, #4
>   44:   e5c32004        strb    r2, [r3, #4]
>   48:   e3a02005        mov     r2, #5
>   4c:   e5c32005        strb    r2, [r3, #5]
>   50:   e12fff1e        bx      lr
>   54:   00000000        .word   0x00000000
>
> compared to if complete-loop-peeling was not done
>
> 0000001c <test_iter_6>:
>   1c:   e59f2014        ldr     r2, [pc, #20]   ; 38 <test_iter_6+0x1c>
>   20:   e3a03000        mov     r3, #0
>   24:   e7c33002        strb    r3, [r3, r2]
>   28:   e2833001        add     r3, r3, #1
>   2c:   e3530006        cmp     r3, #6
>   30:   1afffffb        bne     24 <test_iter_6+0x8>
>   34:   e12fff1e        bx      lr
>   38:   00000000        .word   0x00000000
>
> Producing 15 instead of 8 words, giving ~100% larger code size.
>
> And same for x86 arch:
>
>    f:   c6 05 00 00 00 00 00    movb   $0x0,0x0(%rip)        # 16
> <test_iter_6+0x7>
>   16:   c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # 1d
> <test_iter_6+0xe>
>   1d:   c6 05 00 00 00 00 02    movb   $0x2,0x0(%rip)        # 24
> <test_iter_6+0x15>
>   24:   c6 05 00 00 00 00 03    movb   $0x3,0x0(%rip)        # 2b
> <test_iter_6+0x1c>
>   2b:   c6 05 00 00 00 00 04    movb   $0x4,0x0(%rip)        # 32
> <test_iter_6+0x23>
>   32:   c6 05 00 00 00 00 05    movb   $0x5,0x0(%rip)        # 39
> <test_iter_6+0x2a>
>   39:   c3                      retq
>
>
> Without complete-loop-peeling done:
>
>    f:   31 c0                   xor    %eax,%eax
>   11:   88 80 00 00 00 00       mov    %al,0x0(%rax)
>   17:   48 ff c0                inc    %rax
>   1a:   48 83 f8 06             cmp    $0x6,%rax
>   1e:   75 f1                   jne    11 <test_iter_6+0x2>
>   20:   c3                      retq
>
>
> Producing 43 instead of 18 bytes, ~140% bigger code size.
>
> Should a variable be considered to be induction variable also when its used both as LHS array index and RHS data value in same loop?
> Since its cross-target for both ARM and x86, does it origin in some cost estimation on how 'ix' or other induction variables will be folded or not in the final target code?

Yes, the "heuristic" is off in some cases (it adds some optimistic
factor of followup optimization opportunities IIRC).  But not for
the reason you think.

Richard.

> Thanks, Kind Regards,
> Fredrik
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-05-08  7:15 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-07  9:23 Question about loop induction variables Fredrik Hederstierna
2017-05-08  7:15 ` Richard Biener via gcc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).