public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC: missed loop optimizations from loop induction variable copies
@ 2009-09-22 11:53 Rahul Kharche
  2009-09-22 13:08 ` Richard Guenther
  2009-09-23 10:37 ` Zdenek Dvorak
  0 siblings, 2 replies; 3+ messages in thread
From: Rahul Kharche @ 2009-09-22 11:53 UTC (permalink / raw)
  To: gcc; +Cc: sdkteam-gnu

The following causes missed loop optimizations in O2 from creating
unnecessary loop induction variables. Or, is a case of IV opts not
able to coalesce copies of induction variables. A previous post related
to this was made in PR41026 which had a type promoted loop index
variable
copied. I believe this example makes the problem more obvious.

struct struct_t {
  int* data;
};

void testAutoIncStruct (struct struct_t* sp, int start, int end)
{
    int i;
    for (i = 0; i+start < end; i++)
      {
        sp->data[i+start] = 0;
      }
}

With GCC v4.4.1 release) and gcc -O2 -fdump-tree-all on the above case
we get the following dump from IVOpts

testAutoIncStruct (struct struct_t * sp, int start, int end)
{
  unsigned int D.1283;
  unsigned int D.1284;
  int D.1282;
  unsigned int ivtmp.32;
  int * pretmp.17;
  int i;
  int * D.1245;
  unsigned int D.1244;
  unsigned int D.1243;

<bb 2>:
  if (start_3(D) < end_5(D))
    goto <bb 3>;
  else
    goto <bb 6>;

<bb 3>:
  pretmp.17_22 = sp_6(D)->data;
  D.1282_23 = start_3(D) + 1;
  ivtmp.32_25 = (unsigned int) D.1282_23;
  D.1283_27 = (unsigned int) end_5(D);
  D.1284_28 = D.1283_27 + 1;

<bb 4>:
  # start_20 = PHI <start_4(5), start_3(D)(3)>
  # ivtmp.32_7 = PHI <ivtmp.32_24(5), ivtmp.32_25(3)>
  D.1243_9 = (unsigned int) start_20;
  D.1244_10 = D.1243_9 * 4;
  D.1245_11 = pretmp.17_22 + D.1244_10;
  *D.1245_11 = 0;
  start_26 = (int) ivtmp.32_7;
  start_4 = start_26;
  ivtmp.32_24 = ivtmp.32_7 + 1;
  if (ivtmp.32_24 != D.1284_28)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  goto <bb 4>;

<bb 6>:
  return;

}

IVOpts cannot identify start_26, start_4 and ivtmp_32_7 to be copies.
The root cause is that expression 'i + start' is identified as a common
expression between the test in the header and the index operation in the
latch. This is unified by copy propagation or FRE prior to loop
optimizations
and creates a new induction variable.

If we disable tree copy propagation and FRE with
gcc -O2 -fno-tree-copy-prop -fno-tree-fre -fdump-tree-all we get

testAutoIncStruct (struct struct_t * sp, int start, int end)
{
  unsigned int D.1287;
  unsigned int D.1288;
  unsigned int D.1289;
  int D.1290;
  unsigned int D.1284;
  unsigned int D.1285;
  unsigned int D.1286;
  int * pretmp.17;
  int i;
  int * D.1245;
  unsigned int D.1244;
  unsigned int D.1243;
  int D.1242;
  int * D.1241;

<bb 2>:
  if (start_3(D) < end_5(D))
    goto <bb 3>;
  else
    goto <bb 6>;

<bb 3>:
  pretmp.17_23 = sp_6(D)->data;
  D.1287_27 = (unsigned int) end_5(D);
  D.1288_28 = (unsigned int) start_3(D);
  D.1289_29 = D.1287_27 - D.1288_28;
  D.1290_30 = (int) D.1289_29;

<bb 4>:
  # i_20 = PHI <i_12(5), 0(3)>
  D.1241_7 = pretmp.17_23;
  D.1284_26 = (unsigned int) start_3(D);
  D.1285_25 = (unsigned int) i_20;
  D.1286_24 = D.1284_26 + D.1285_25;
  MEM[base: pretmp.17_23, index: D.1286_24, step: 4] = 0;
  i_12 = i_20 + 1;
  if (i_12 != D.1290_30)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  goto <bb 4>;

<bb 6>:
  return;

}

The correct single induction variable as been identified here. This is
not
a loop header copying problem either. If we disable loop header copying,
we
still get multiple induction variables created. In fact in the above
case
loop header copying correctly enables post-increment mode on our port.

testAutoIncStruct (struct struct_t * sp, int start, int end)
{
  unsigned int D.1282;
  unsigned int ivtmp.31;
  unsigned int ivtmp.29;
  int i;
  int * D.1245;
  unsigned int D.1244;
  unsigned int D.1243;
  int D.1242;
  int * D.1241;

<bb 2>:
  ivtmp.29_18 = (unsigned int) start_3(D);
  D.1282_21 = (unsigned int) start_3(D);
  ivtmp.31_22 = D.1282_21 * 4;
  goto <bb 4>;

<bb 3>:
  D.1241_7 = sp_6(D)->data;
  D.1244_10 = ivtmp.31_19;
  D.1245_11 = D.1241_7 + D.1244_10;
  *D.1245_11 = 0;
  ivtmp.29_17 = ivtmp.29_8 + 1;
  ivtmp.31_20 = ivtmp.31_19 + 4;

<bb 4>:
  # ivtmp.29_8 = PHI <ivtmp.29_18(2), ivtmp.29_17(3)>
  # ivtmp.31_19 = PHI <ivtmp.31_22(2), ivtmp.31_20(3)>
  D.1242_23 = (int) ivtmp.29_8;
  if (D.1242_23 < end_5(D))
    goto <bb 3>;
  else
    goto <bb 5>;

<bb 5>:
  return;

}

Does this imply we try and not copy propagate or FRE potential induction
variables? Or is this simply a missed case in IVOpts?

Rahul

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFC: missed loop optimizations from loop induction variable   copies
  2009-09-22 11:53 RFC: missed loop optimizations from loop induction variable copies Rahul Kharche
@ 2009-09-22 13:08 ` Richard Guenther
  2009-09-23 10:37 ` Zdenek Dvorak
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Guenther @ 2009-09-22 13:08 UTC (permalink / raw)
  To: Rahul Kharche; +Cc: gcc, sdkteam-gnu

On Tue, Sep 22, 2009 at 1:52 PM, Rahul Kharche <rahul@icerasemi.com> wrote:
> The following causes missed loop optimizations in O2 from creating
> unnecessary loop induction variables. Or, is a case of IV opts not
> able to coalesce copies of induction variables. A previous post related
> to this was made in PR41026 which had a type promoted loop index
> variable
> copied. I believe this example makes the problem more obvious.
>
> struct struct_t {
>  int* data;
> };
>
> void testAutoIncStruct (struct struct_t* sp, int start, int end)
> {
>    int i;
>    for (i = 0; i+start < end; i++)
>      {
>        sp->data[i+start] = 0;
>      }
> }
>
> With GCC v4.4.1 release) and gcc -O2 -fdump-tree-all on the above case
> we get the following dump from IVOpts
>
> testAutoIncStruct (struct struct_t * sp, int start, int end)
> {
>  unsigned int D.1283;
>  unsigned int D.1284;
>  int D.1282;
>  unsigned int ivtmp.32;
>  int * pretmp.17;
>  int i;
>  int * D.1245;
>  unsigned int D.1244;
>  unsigned int D.1243;
>
> <bb 2>:
>  if (start_3(D) < end_5(D))
>    goto <bb 3>;
>  else
>    goto <bb 6>;
>
> <bb 3>:
>  pretmp.17_22 = sp_6(D)->data;
>  D.1282_23 = start_3(D) + 1;
>  ivtmp.32_25 = (unsigned int) D.1282_23;
>  D.1283_27 = (unsigned int) end_5(D);
>  D.1284_28 = D.1283_27 + 1;
>
> <bb 4>:
>  # start_20 = PHI <start_4(5), start_3(D)(3)>
>  # ivtmp.32_7 = PHI <ivtmp.32_24(5), ivtmp.32_25(3)>
>  D.1243_9 = (unsigned int) start_20;
>  D.1244_10 = D.1243_9 * 4;
>  D.1245_11 = pretmp.17_22 + D.1244_10;
>  *D.1245_11 = 0;
>  start_26 = (int) ivtmp.32_7;
>  start_4 = start_26;
>  ivtmp.32_24 = ivtmp.32_7 + 1;
>  if (ivtmp.32_24 != D.1284_28)
>    goto <bb 5>;
>  else
>    goto <bb 6>;
>
> <bb 5>:
>  goto <bb 4>;
>
> <bb 6>:
>  return;
>
> }
>
> IVOpts cannot identify start_26, start_4 and ivtmp_32_7 to be copies.
> The root cause is that expression 'i + start' is identified as a common
> expression between the test in the header and the index operation in the
> latch. This is unified by copy propagation or FRE prior to loop
> optimizations
> and creates a new induction variable.
>
> If we disable tree copy propagation and FRE with
> gcc -O2 -fno-tree-copy-prop -fno-tree-fre -fdump-tree-all we get
>
> testAutoIncStruct (struct struct_t * sp, int start, int end)
> {
>  unsigned int D.1287;
>  unsigned int D.1288;
>  unsigned int D.1289;
>  int D.1290;
>  unsigned int D.1284;
>  unsigned int D.1285;
>  unsigned int D.1286;
>  int * pretmp.17;
>  int i;
>  int * D.1245;
>  unsigned int D.1244;
>  unsigned int D.1243;
>  int D.1242;
>  int * D.1241;
>
> <bb 2>:
>  if (start_3(D) < end_5(D))
>    goto <bb 3>;
>  else
>    goto <bb 6>;
>
> <bb 3>:
>  pretmp.17_23 = sp_6(D)->data;
>  D.1287_27 = (unsigned int) end_5(D);
>  D.1288_28 = (unsigned int) start_3(D);
>  D.1289_29 = D.1287_27 - D.1288_28;
>  D.1290_30 = (int) D.1289_29;
>
> <bb 4>:
>  # i_20 = PHI <i_12(5), 0(3)>
>  D.1241_7 = pretmp.17_23;
>  D.1284_26 = (unsigned int) start_3(D);
>  D.1285_25 = (unsigned int) i_20;
>  D.1286_24 = D.1284_26 + D.1285_25;
>  MEM[base: pretmp.17_23, index: D.1286_24, step: 4] = 0;
>  i_12 = i_20 + 1;
>  if (i_12 != D.1290_30)
>    goto <bb 5>;
>  else
>    goto <bb 6>;
>
> <bb 5>:
>  goto <bb 4>;
>
> <bb 6>:
>  return;
>
> }
>
> The correct single induction variable as been identified here. This is
> not
> a loop header copying problem either. If we disable loop header copying,
> we
> still get multiple induction variables created. In fact in the above
> case
> loop header copying correctly enables post-increment mode on our port.
>
> testAutoIncStruct (struct struct_t * sp, int start, int end)
> {
>  unsigned int D.1282;
>  unsigned int ivtmp.31;
>  unsigned int ivtmp.29;
>  int i;
>  int * D.1245;
>  unsigned int D.1244;
>  unsigned int D.1243;
>  int D.1242;
>  int * D.1241;
>
> <bb 2>:
>  ivtmp.29_18 = (unsigned int) start_3(D);
>  D.1282_21 = (unsigned int) start_3(D);
>  ivtmp.31_22 = D.1282_21 * 4;
>  goto <bb 4>;
>
> <bb 3>:
>  D.1241_7 = sp_6(D)->data;
>  D.1244_10 = ivtmp.31_19;
>  D.1245_11 = D.1241_7 + D.1244_10;
>  *D.1245_11 = 0;
>  ivtmp.29_17 = ivtmp.29_8 + 1;
>  ivtmp.31_20 = ivtmp.31_19 + 4;
>
> <bb 4>:
>  # ivtmp.29_8 = PHI <ivtmp.29_18(2), ivtmp.29_17(3)>
>  # ivtmp.31_19 = PHI <ivtmp.31_22(2), ivtmp.31_20(3)>
>  D.1242_23 = (int) ivtmp.29_8;
>  if (D.1242_23 < end_5(D))
>    goto <bb 3>;
>  else
>    goto <bb 5>;
>
> <bb 5>:
>  return;
>
> }
>
> Does this imply we try and not copy propagate or FRE potential induction
> variables? Or is this simply a missed case in IVOpts?

Well, the situation is more complicated.  Initially FRE removes a
full redundancy which loop-header copying turns into a loop
carried dependency.  So for example moving loop header copying
before FRE avoids this issue as well (which would be a good idea
for other reasons as well).  So, at the point of FRE there is no
way to avoid the situation with the current pass ordering.

Richard.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFC: missed loop optimizations from loop induction variable  copies
  2009-09-22 11:53 RFC: missed loop optimizations from loop induction variable copies Rahul Kharche
  2009-09-22 13:08 ` Richard Guenther
@ 2009-09-23 10:37 ` Zdenek Dvorak
  1 sibling, 0 replies; 3+ messages in thread
From: Zdenek Dvorak @ 2009-09-23 10:37 UTC (permalink / raw)
  To: Rahul Kharche; +Cc: gcc, sdkteam-gnu, sebastian.pop

Hi,

<snip>

> IVOpts cannot identify start_26, start_4 and ivtmp_32_7 to be copies.
> The root cause is that expression 'i + start' is identified as a common
> expression between the test in the header and the index operation in the
> latch. This is unified by copy propagation or FRE prior to loop
> optimizations
> and creates a new induction variable.
> 
> 
> Does this imply we try and not copy propagate or FRE potential induction
> variables? Or is this simply a missed case in IVOpts?

IIRC, at some point maybe year or two ago Sebastian worked on enhancing
scev to analyze such induction variables (thus enabling IVopts to handle them).
But it seems the code did not make it to mainline,

Zdenek

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-09-23 10:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-22 11:53 RFC: missed loop optimizations from loop induction variable copies Rahul Kharche
2009-09-22 13:08 ` Richard Guenther
2009-09-23 10:37 ` Zdenek Dvorak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).