public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch] Fix PR tree-optimization/49771
@ 2011-07-19  8:24 Ira Rosen
  2011-07-19  9:49 ` Richard Guenther
  2011-07-20 19:47 ` Ulrich Weigand
  0 siblings, 2 replies; 35+ messages in thread
From: Ira Rosen @ 2011-07-19  8:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: Patch Tracking

Hi,

The vectorizer performs the following alias checks for data-refs with
unknown dependence:

((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
  || ((load_ptr_0 + load_segment_length_0) <= store_ptr_0)

where segment_length is the data-ref's step in the loop multiplied by the
loop's number of iterations (in the general case). For invariant
data-refs segment_length is 0, since the step is 0. This creates an
incorrect check for:

  for (i = 0; i < 1000; i++)
    for (j = 0; j < 1000; j++)
       a[j] = a[i] + 1;

We check:

&a + 4000 <= &a + i*4
|| &a + i*4 <= &a

and the second check is wrong for i=0.

This patch makes segment_length sizeof (data-ref type) in the case of a
zero step, changing the checks to:

&a + 4000 <= &a + i*4
|| &a + i*4 + 4 <= &a

Bootstrapped and tested on powerpc64-suse-linux.
Committed revision 176434.

Ira

ChangeLog:

   PR tree-optimization/49771
   * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
   zero step, set segment length to the size of the data-ref's type.

testsuite/ChangeLog:

   PR tree-optimization/49771
   * gcc.dg/vect/pr49771.c: New test.

Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c      (revision 176433)
+++ tree-vect-loop-manip.c      (working copy)
@@ -2356,9 +2356,14 @@ static tree
 vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
 {
   tree segment_length;
-  segment_length = size_binop (MULT_EXPR,
-                              fold_convert (sizetype, DR_STEP (dr)),
-                              fold_convert (sizetype, length_factor));
+
+  if (!compare_tree_int (DR_STEP (dr), 0))
+    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+  else
+    segment_length = size_binop (MULT_EXPR,
+                                 fold_convert (sizetype, DR_STEP (dr)),
+                                 fold_convert (sizetype, length_factor));
+
   if (vect_supportable_dr_alignment (dr, false)
         == dr_explicit_realign_optimized)
     {
Index: testsuite/gcc.dg/vect/pr49771.c
===================================================================
--- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
+++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
@@ -0,0 +1,26 @@
+#include <stdlib.h>
+#include <stdarg.h>
+
+static int a[1000];
+
+int
+foo (void)
+{
+  int j;
+  int i;
+  for (i = 0; i < 1000; i++)
+    for (j = 0; j < 1000; j++)
+      a[j] = a[i] + 1;
+  return a[0];
+}
+
+int
+main (void)
+{
+  int res = foo ();
+  if (res != 1999)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
@ 2011-07-19  9:49 ` Richard Guenther
  2011-07-19 14:01   ` Ira Rosen
  2011-07-20 19:47 ` Ulrich Weigand
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-19  9:49 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> Hi,
>
> The vectorizer performs the following alias checks for data-refs with
> unknown dependence:
>
> ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
>  || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
>
> where segment_length is data-ref's step in the loop multiplied by the
> loop's number of iterations (in the general case). For invariant
> data-refs segment_length is 0, since the step is 0. This creates
> incorrect check for:
>
>  for (i = 0; i < 1000; i++)
>    for (j = 0; j < 1000; j++)
>       a[j] = a[i] + 1;
>
> We check:
>
> &a + 4000 <= &a + i*4
> || &a + i*4 <= &a
>
> and the second check is wrong for i=0.
>
> This patch makes segment_length to be sizeof (data-ref type) in case
> of zero step, changing the checks into
>
> &a + 4000 <= &a + i*4
> || &a + i*4 + 4 <= &a
>
> Bootstrapped and tested on powerpc64-suse-linux.
> Committed revision 176434.
>
> Ira
>
> ChangeLog:
>
>   PR tree-optimization/49771
>   * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
>   zero step, set segment length to the size of the data-ref's type.
>
> testsuite/ChangeLog:
>
>   PR tree-optimization/49771
>   * gcc.dg/vect/pr49771.c: New test.
>
> Index: tree-vect-loop-manip.c
> ===================================================================
> --- tree-vect-loop-manip.c      (revision 176433)
> +++ tree-vect-loop-manip.c      (working copy)
> @@ -2356,9 +2356,14 @@ static tree
>  vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>  {
>   tree segment_length;
> -  segment_length = size_binop (MULT_EXPR,
> -                              fold_convert (sizetype, DR_STEP (dr)),
> -                              fold_convert (sizetype, length_factor));
> +
> +  if (!compare_tree_int (DR_STEP (dr), 0))

integer_zerop (DR_STEP (dr))

> +    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> +  else
> +    segment_length = size_binop (MULT_EXPR,
> +                                 fold_convert (sizetype, DR_STEP (dr)),
> +                                 fold_convert (sizetype, length_factor));
> +
>   if (vect_supportable_dr_alignment (dr, false)
>         == dr_explicit_realign_optimized)
>     {
> Index: testsuite/gcc.dg/vect/pr49771.c
> ===================================================================
> --- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> +++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> @@ -0,0 +1,26 @@
> +#include <stdlib.h>
> +#include <stdarg.h>
> +
> +static int a[1000];
> +
> +int
> +foo (void)
> +{
> +  int j;
> +  int i;
> +  for (i = 0; i < 1000; i++)
> +    for (j = 0; j < 1000; j++)
> +      a[j] = a[i] + 1;
> +  return a[0];
> +}
> +
> +int
> +main (void)
> +{
> +  int res = foo ();
> +  if (res != 1999)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  9:49 ` Richard Guenther
@ 2011-07-19 14:01   ` Ira Rosen
  2011-07-19 14:03     ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-19 14:01 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Patch Tracking

On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
;
>> +
>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>
> integer_zerop (DR_STEP (dr))
>

Right. I'll change this at the next opportunity, if that's OK.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19 14:01   ` Ira Rosen
@ 2011-07-19 14:03     ` Richard Guenther
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Guenther @ 2011-07-19 14:03 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Tue, Jul 19, 2011 at 3:27 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> ;
>>> +
>>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>>
>> integer_zerop (DR_STEP (dr))
>>
>
> Right. I'll change this with some other opportunity, if that's ok.

Sure.

Thanks,
Richard.

> Thanks,
> Ira
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
  2011-07-19  9:49 ` Richard Guenther
@ 2011-07-20 19:47 ` Ulrich Weigand
  2011-07-21 12:54   ` Ira Rosen
  1 sibling, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-20 19:47 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

Ira Rosen wrote:

>    PR tree-optimization/49771
>    * gcc.dg/vect/pr49771.c: New test.

This test fails (with wrong code) on spu-elf ...

> +int
> +foo (void)
> +{
> +  int j;
> +  int i;
> +  for (i = 0; i < 1000; i++)
> +    for (j = 0; j < 1000; j++)
> +      a[j] = a[i] + 1;
> +  return a[0];
> +}
> +
> +int
> +main (void)
> +{
> +  int res = foo ();
> +  if (res != 1999)
> +    abort ();
> +  return 0;
> +}

The return value of foo with vectorization is 1249 instead
of 1999 for some reason.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-20 19:47 ` Ulrich Weigand
@ 2011-07-21 12:54   ` Ira Rosen
  2011-07-24 14:32     ` Ira Rosen
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-21 12:54 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, Patch Tracking

On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Ira Rosen wrote:
>
>>    PR tree-optimization/49771
>>    * gcc.dg/vect/pr49771.c: New test.
>
> This test fails (with wrong code) on spu-elf ...
>
>> +int
>> +foo (void)
>> +{
>> +  int j;
>> +  int i;
>> +  for (i = 0; i < 1000; i++)
>> +    for (j = 0; j < 1000; j++)
>> +      a[j] = a[i] + 1;
>> +  return a[0];
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int res = foo ();
>> +  if (res != 1999)
>> +    abort ();
>> +  return 0;
>> +}
>
> The return value of foo with vectorization is 1249 instead
> of 1999 for some reason.

I reproduced the failure. It occurs even without Richard's patch
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
this one too. Obviously the vectorized loop is executed, but at the
moment I don't understand why. I'll take a closer look on Sunday.

Ira

>
> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-21 12:54   ` Ira Rosen
@ 2011-07-24 14:32     ` Ira Rosen
  2011-07-24 18:46       ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-24 14:32 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, Patch Tracking

On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It occurs without Richard's
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> patches too. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll have a better look on Sunday.

Actually, it doesn't choose the vectorized code. But the scalar version
gets optimized in a way that is harmful for SPU, AFAIU.
Here is the scalar loop after vrp2:

<bb 8>:
  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
  D.4593_42 = (void *) ivtmp.53_32;
  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 10>;
  else
    goto <bb 5>;

and the load is changed by dom2 to:

<bb 4>:
  ...
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
   ...

where vector(4) int * vect_pa.9;

And the scalar loop has no rotate for that load:

.L3:
        lqd     $13,0($2)
        lqx     $11,$5,$3
        cwx     $7,$sp,$3
        ai      $12,$13,1
        shufb   $6,$12,$11,$7
        stqx    $6,$5,$3
        ai      $3,$3,4
        ceqi    $4,$3,16


I manually added a rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).

Ira

>
> Ira
>
>>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  Ulrich.Weigand@de.ibm.com
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-24 14:32     ` Ira Rosen
@ 2011-07-24 18:46       ` Richard Guenther
  2011-07-25  9:44         ` Ulrich Weigand
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-24 18:46 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>
>>> The return value of foo with vectorization is 1249 instead
>>> of 1999 for some reason.
>>
>> I reproduced the failure. It occurs without Richard's
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> patches too. Obviously the vectorized loop is executed, but at the
>> moment I don't understand why. I'll have a better look on Sunday.
>
> Actually it doesn't choose the vectorized code. But the scalar version
> gets optimized in a harmful way for SPU, AFAIU.
> Here is the scalar loop after vrp2
>
> <bb 8>:
>  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>  D.4593_42 = (void *) ivtmp.53_32;
>  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>  D.4521_34 = D.4520_33 + 1;
>  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>  ivtmp.42_45 = ivtmp.42_50 + 4;
>  if (ivtmp.42_45 != 16)
>    goto <bb 10>;
>  else
>    goto <bb 5>;
>
> and the load is changed by dom2 to:
>
> <bb 4>:
>  ...
>  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>   ...
>
> where vector(4) int * vect_pa.9;
>
> And the scalar loop has no rotate for that load:

Hum.  This smells like we are hiding something from the tree optimizers?

> .L3:
>        lqd     $13,0($2)
>        lqx     $11,$5,$3
>        cwx     $7,$sp,$3
>        ai      $12,$13,1
>        shufb   $6,$12,$11,$7
>        stqx    $6,$5,$3
>        ai      $3,$3,4
>        ceqi    $4,$3,16
>
>
> I manually added rotqby for $13 and the result was correct (I changed
> the test to iterate only 4 times to make the things easier).
>
> Ira
>
>>
>> Ira
>>
>>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>>  Dr. Ulrich Weigand
>>>  GNU Toolchain for Linux on System z and Cell BE
>>>  Ulrich.Weigand@de.ibm.com
>>>
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-24 18:46       ` Richard Guenther
@ 2011-07-25  9:44         ` Ulrich Weigand
  2011-07-25 10:08           ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-25  9:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:
> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
> >> I reproduced the failure. It occurs without Richard's
> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> >> patches too. Obviously the vectorized loop is executed, but at the
> >> moment I don't understand why. I'll have a better look on Sunday.
> >
> > Actually it doesn't choose the vectorized code. But the scalar version
> > gets optimized in a harmful way for SPU, AFAIU.
> > Here is the scalar loop after vrp2
> >
> > <bb 8>:
> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
> >  D.4593_42 = (void *) ivtmp.53_32;
> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
> >  D.4521_34 = D.4520_33 + 1;
> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
> >  ivtmp.42_45 = ivtmp.42_50 + 4;
> >  if (ivtmp.42_45 != 16)
> >    goto <bb 10>;
> >  else
> >    goto <bb 5>;
> >
> > and the load is changed by dom2 to:
> >
> > <bb 4>:
> >  ...
> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
> >   ...
> >
> > where vector(4) int * vect_pa.9;
> >
> > And the scalar loop has no rotate for that load:
> 
> Hum.  This smells like we are hiding sth from the tree optimizers?

Well, the back-end assumes a pointer to vector type is always
naturally aligned, and therefore the data it points to can be
accessed via a simple load, with no extra rotate needed.

It seems what happened here is that somehow, a pointer to int
gets replaced by a pointer to vector, even though their alignment
properties are different.

This vector pointer must originate somehow in the vectorizer,
however, since the original C source does not contain any
vector types at all ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25  9:44         ` Ulrich Weigand
@ 2011-07-25 10:08           ` Richard Guenther
  2011-07-25 11:26             ` Ira Rosen
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 10:08 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> >> I reproduced the failure. It occurs without Richard's
>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> >> patches too. Obviously the vectorized loop is executed, but at the
>> >> moment I don't understand why. I'll have a better look on Sunday.
>> >
>> > Actually it doesn't choose the vectorized code. But the scalar version
>> > gets optimized in a harmful way for SPU, AFAIU.
>> > Here is the scalar loop after vrp2
>> >
>> > <bb 8>:
>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>> >  D.4593_42 = (void *) ivtmp.53_32;
>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>> >  D.4521_34 = D.4520_33 + 1;
>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>> >  if (ivtmp.42_45 != 16)
>> >    goto <bb 10>;
>> >  else
>> >    goto <bb 5>;
>> >
>> > and the load is changed by dom2 to:
>> >
>> > <bb 4>:
>> >  ...
>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>> >   ...
>> >
>> > where vector(4) int * vect_pa.9;
>> >
>> > And the scalar loop has no rotate for that load:
>>
>> Hum.  This smells like we are hiding sth from the tree optimizers?
>
> Well, the back-end assumes a pointer to vector type is always
> naturally aligned, and therefore the data it points to can be
> accessed via a simple load, with no extra rotate needed.

I can't see any use of VECTOR_TYPE in config/spu/, and assuming
anything about alignment just because of the pointer's type
is bogus - the scalar code does a scalar read using that pointer.
So the backend had better look at the memory operation, not
at the pointer type.  That said, I can't find any code that looks
suspicious in the spu backend.

> It seems what happened here is that somehow, a pointer to int
> gets replaced by a pointer to vector, even though their alignment
> properties are different.

No, they are not.  They only get replaced if they are value-equivalent,
in which case they are also alignment-equivalent.  But well, the
dump snippet wasn't complete and I don't feel like building a
SPU cross to verify myself.

> This vector pointer must originate somehow in the vectorizer,
> however, since the original C source does not contain any
> vector types at all ...

That's certainly true; it must be the initial pointer we then increment
in the vectorized loop.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 10:08           ` Richard Guenther
@ 2011-07-25 11:26             ` Ira Rosen
  2011-07-25 11:41               ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-25 11:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> >> I reproduced the failure. It occurs without Richard's
>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>> >
>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>> > gets optimized in a harmful way for SPU, AFAIU.
>>> > Here is the scalar loop after vrp2
>>> >
>>> > <bb 8>:
>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>> >  D.4521_34 = D.4520_33 + 1;
>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>> >  if (ivtmp.42_45 != 16)
>>> >    goto <bb 10>;
>>> >  else
>>> >    goto <bb 5>;
>>> >
>>> > and the load is changed by dom2 to:
>>> >
>>> > <bb 4>:
>>> >  ...
>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>> >   ...
>>> >
>>> > where vector(4) int * vect_pa.9;
>>> >
>>> > And the scalar loop has no rotate for that load:
>>>
>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>
>> Well, the back-end assumes a pointer to vector type is always
>> naturally aligned, and therefore the data it points to can be
>> accessed via a simple load, with no extra rotate needed.
>
> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> anything about alignment just because of the kind of the pointer
> is bogus - the scalar code does a scalar read using that pointer.
> So the backend better should look at the memory operation, not
> at the pointer type.  That said, I can't find any code that looks
> suspicious in the spu backend.
>
>> It seems what happened here is that somehow, a pointer to int
>> gets replaced by a pointer to vector, even though their alignment
>> properties are different.
>
> No, they are not.  They get replaced if they are value-equivalent
> in which case they are also alignment-equivalent.  But well, the
> dump snippet wasn't complete and I don't feel like building a
> SPU cross to verify myself.

I am attaching the complete file.

Thanks,
Ira



>
>> This vector pointer must originate somehow in the vectorizer,
>> however, since the original C source does not contain any
>> vector types at all ...
>
> That's for sure true, it must be the initial pointer we then increment
> in the vectorized loop.
>
> Richard.
>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  Ulrich.Weigand@de.ibm.com
>>
>

[-- Attachment #2: my--pr49771.c.124t.dom2 --]
[-- Type: application/octet-stream, Size: 19190 bytes --]


;; Function foo (foo, funcdef_no=0, decl_uid=2839, cgraph_uid=0)

;; 3 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 10 5 6 9 7 8
;;
;; Loop 1
;;  header 7, latch 9
;;  depth 1, outer 0
;;  nodes: 7 9 6 5 4 3 10
;;
;; Loop 2
;;  header 4, latch 10
;;  depth 2, outer 1
;;  nodes: 4 10
;; 2 succs { 7 }
;; 3 succs { 4 }
;; 4 succs { 10 6 }
;; 10 succs { 4 }
;; 5 succs { 6 }
;; 6 succs { 9 8 }
;; 9 succs { 7 }
;; 7 succs { 3 5 }
;; 8 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement ivtmp.53_37 = (long unsigned int) &a;
LKUP STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
2>>> STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Optimizing statement a.55_39 = (long unsigned int) &a;
LKUP STMT a.55_39 = nop_expr &a
          a.55_39 = (long unsigned int) &a;
FIND: ivtmp.53_37
  Replaced redundant expr '(long unsigned int) &a' with 'ivtmp.53_37'
==== ASGN a.55_39 = ivtmp.53_37
Optimizing statement D.4591_40 = a.55_39 + 16;
  Replaced 'a.55_39' with variable 'ivtmp.53_37'
LKUP STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
2>>> STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;


Optimizing block #7

Optimizing statement vect_pa.12_24 = &a;
LKUP STMT vect_pa.12_24 = &a
LKUP STMT vect_pa.12_24 = &a
          vect_pa.12_24 = &a;
==== ASGN vect_pa.12_24 = &a
Optimizing statement D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
2>>> STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
Optimizing statement D.4540_11 = (vector(4) int *) D.4589_36;
LKUP STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
2>>> STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
Optimizing statement D.4541_26 = D.4540_11 <= &a;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
2>>> STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
Optimizing statement D.4542_27 = &MEM[(void *)&a + 16B];
LKUP STMT D.4542_27 = &MEM[(void *)&a + 16B]
          D.4542_27 = &MEM[(void *)&a + 16B];
==== ASGN D.4542_27 = &MEM[(void *)&a + 16B]
Optimizing statement vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
2>>> STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
Optimizing statement D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
2>>> STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
Optimizing statement D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
2>>> STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
Optimizing statement if (D.4544_29 != 0)
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)


Optimizing block #3

0>>> COPY D.4544_29 = 0
Optimizing statement ivtmp.42_59 = 0;
LKUP STMT ivtmp.42_59 = 0
          ivtmp.42_59 = 0;
==== ASGN ivtmp.42_59 = 0


Optimizing block #4

Optimizing statement D.4593_42 = (void *) ivtmp.53_32;
LKUP STMT D.4593_42 = nop_expr ivtmp.53_32
          D.4593_42 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4593_42 = vect_pa.9_19
Optimizing statement D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  Replaced 'D.4593_42' with variable 'vect_pa.9_19'
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement D.4521_34 = D.4520_33 + 1;
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
2>>> STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
Optimizing statement MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34
          MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
2>>> STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
Optimizing statement ivtmp.42_45 = ivtmp.42_50 + 4;
LKUP STMT ivtmp.42_45 = ivtmp.42_50 plus_expr 4
          ivtmp.42_45 = ivtmp.42_50 + 4;
Optimizing statement if (ivtmp.42_45 != 16)
LKUP STMT ivtmp.42_45 ne_expr 16
          if (ivtmp.42_45 != 16)


Optimizing block #10

1>>> COND 1 = ivtmp.42_45 ne_expr 16
1>>> COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ne_expr 16
0>>> COPY ivtmp.42_45 = 16
1>>> COND 1 = ivtmp.42_45 le_expr 16
1>>> COND 1 = ivtmp.42_45 ge_expr 16
1>>> COND 1 = ivtmp.42_45 eq_expr 16
1>>> COND 0 = ivtmp.42_45 ne_expr 16
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< COND 0 = ivtmp.42_45 ne_expr 16
<<<< COND 1 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ge_expr 16
<<<< COND 1 = ivtmp.42_45 le_expr 16
<<<< STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
<<<< STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
<<<< STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
<<<< COPY D.4544_29 = 0
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;


Optimizing block #5

0>>> COPY D.4544_29 = 1
Optimizing statement vect_cst_.29_56 = { 1, 1, 1, 1 };
LKUP STMT vect_cst_.29_56 = { 1, 1, 1, 1 }
          vect_cst_.29_56 = { 1, 1, 1, 1 };
==== ASGN vect_cst_.29_56 = { 1, 1, 1, 1 }
Optimizing statement vect_pa.33_58 = &a;
LKUP STMT vect_pa.33_58 = &a
          vect_pa.33_58 = &a;
==== ASGN vect_pa.33_58 = &a
Optimizing statement D.4592_41 = (void *) ivtmp.53_32;
LKUP STMT D.4592_41 = nop_expr ivtmp.53_32
          D.4592_41 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4592_41 = vect_pa.9_19
Optimizing statement D.4520_5 = MEM[base: D.4592_41, offset: 0B];
  Replaced 'D.4592_41' with variable 'vect_pa.9_19'
LKUP STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
LKUP STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
2>>> STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
Optimizing statement vect_var_.28_57 = vect_cst_.27_55 + vect_cst_.29_56;
  Replaced 'vect_cst_.29_56' with constant '{ 1, 1, 1, 1 }'
LKUP STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
2>>> STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
Optimizing statement MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT MEM[(int[4] *)&a] = vect_var_.28_57
          MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
2>>> STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
Optimizing statement vect_pa.30_60 = &MEM[(void *)&a + 16B];
LKUP STMT vect_pa.30_60 = &MEM[(void *)&a + 16B]
          vect_pa.30_60 = &MEM[(void *)&a + 16B];
==== ASGN vect_pa.30_60 = &MEM[(void *)&a + 16B]
Optimizing statement ivtmp.34_62 = 1;
LKUP STMT ivtmp.34_62 = 1
          ivtmp.34_62 = 1;
==== ASGN ivtmp.34_62 = 1
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
<<<< STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
<<<< STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
<<<< STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];


Optimizing block #6

Optimizing statement ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
Optimizing statement if (ivtmp.53_30 != D.4591_40)
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)


Optimizing block #8

0>>> COPY ivtmp.53_30 = D.4591_40
1>>> COND 1 = ivtmp.53_30 le_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 ge_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 eq_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 ne_expr D.4591_40
Optimizing statement D.4522_9 = a[0];
LKUP STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
2>>> STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
Optimizing statement return D.4522_9;
<<<< STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
<<<< COND 0 = ivtmp.53_30 ne_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ge_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 le_expr D.4591_40
<<<< COPY ivtmp.53_30 = D.4591_40


Optimizing block #9

1>>> COND 1 = ivtmp.53_30 ne_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ne_expr D.4591_40
<<<< STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
<<<< STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
<<<< STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
<<<< STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
<<<< STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
<<<< STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_37 plus_expr 16
          D.4589_36 = ivtmp.53_37 + 16;
FIND: D.4591_40
LKUP STMT D.4540_11 = nop_expr D.4591_40
          D.4540_11 = (vector(4) int *) D.4591_40;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_37
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_37;
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)
<<<< STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
<<<< STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Removing basic block 9
;; basic block 9, loop depth 1, count 0
;; prev block 6, next block 7
;; pred:
;; succ:       7 [100.0%]  (fallthru,dfs_back)
<bb 9>:
Invalid sum of incoming frequencies 0, should be 1500


Removing basic block 10
;; basic block 10, loop depth 2, count 0
;; prev block 4, next block 5
;; pred:
;; succ:       4 [100.0%]  (fallthru,dfs_back)
<bb 10>:
Invalid sum of incoming frequencies 0, should be 1200
goto <bb 4>;


Scope blocks after cleanups:

{ Scope block #0

}
foo ()
{
  void * D.4593;
  void * D.4592;
  long unsigned int a.55;
  long unsigned int D.4591;
  long unsigned int D.4589;
  long unsigned int ivtmp.53;
  sizetype ivtmp.42;
  unsigned int ivtmp.34;
  vector(4) int * vect_pa.33;
  vector(4) int * vect_pa.30;
  vector(4) int vect_cst_.29;
  vector(4) int vect_var_.28;
  vector(4) int vect_cst_.27;
  vector(4) int * D.4540;
  _Bool D.4541;
  vector(4) int * D.4542;
  _Bool D.4543;
  _Bool D.4544;
  vector(4) int * vect_pa.12;
  vector(4) int * vect_pa.9;
  int D.4522;
  int D.4521;
  int D.4520;

<bb 2>:
  ivtmp.53_37 = (long unsigned int) &a;
  a.55_39 = ivtmp.53_37;
  D.4591_40 = ivtmp.53_37 + 16;
  goto <bb 7>;

<bb 3>:
  ivtmp.42_59 = 0;

<bb 4>:
  # ivtmp.42_50 = PHI <0(3), ivtmp.42_45(4)>
  D.4593_42 = vect_pa.9_19;
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 4>;
  else
    goto <bb 6>;

<bb 5>:
  vect_cst_.29_56 = { 1, 1, 1, 1 };
  vect_pa.33_58 = &a;
  D.4592_41 = vect_pa.9_19;
  D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
  vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
  vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
  MEM[(int[4] *)&a] = vect_var_.28_57;
  vect_pa.30_60 = &MEM[(void *)&a + 16B];
  ivtmp.34_62 = 1;

<bb 6>:
  ivtmp.53_30 = ivtmp.53_32 + 4;
  if (ivtmp.53_30 != D.4591_40)
    goto <bb 7>;
  else
    goto <bb 8>;

<bb 7>:
  D.4589_36 = ivtmp.53_32 + 16;
  D.4540_11 = (vector(4) int *) D.4589_36;
  D.4541_26 = D.4540_11 <= &a;
  D.4542_27 = &MEM[(void *)&a + 16B];
  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
  D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
  D.4544_29 = D.4541_26 & D.4543_28;
  if (D.4544_29 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

<bb 8>:
  D.4522_9 = a[0];
  return D.4522_9;

}



;; Function main (main, funcdef_no=1, decl_uid=2850, cgraph_uid=1) (executed once)

;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5 7 6
;;
;; Loop 1
;;  header 5, latch 7
;;  depth 1, outer 0
;;  nodes: 5 7
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 5 }
;; 5 succs { 7 6 }
;; 7 succs { 5 }
;; 6 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement a[0] = 0;
LKUP STMT a[0] = 0
          a[0] = 0;
LKUP STMT 0 = a[0]
          0 = a[0];
LKUP STMT 0 = a[0]
          0 = a[0];
2>>> STMT 0 = a[0]
          0 = a[0];
Optimizing statement a[1] = 0;
LKUP STMT a[1] = 0
          a[1] = 0;
LKUP STMT 0 = a[1]
          0 = a[1];
LKUP STMT 0 = a[1]
          0 = a[1];
2>>> STMT 0 = a[1]
          0 = a[1];
Optimizing statement a[2] = 0;
LKUP STMT a[2] = 0
          a[2] = 0;
LKUP STMT 0 = a[2]
          0 = a[2];
LKUP STMT 0 = a[2]
          0 = a[2];
2>>> STMT 0 = a[2]
          0 = a[2];
Optimizing statement a[3] = 0;
LKUP STMT a[3] = 0
          a[3] = 0;
LKUP STMT 0 = a[3]
          0 = a[3];
LKUP STMT 0 = a[3]
          0 = a[3];
2>>> STMT 0 = a[3]
          0 = a[3];
Optimizing statement res_5 = foo ();
Optimizing statement if (res_5 != 31)
LKUP STMT res_5 ne_expr 31
          if (res_5 != 31)


Optimizing block #3

1>>> COND 1 = res_5 ne_expr 31
1>>> COND 0 = res_5 eq_expr 31
Optimizing statement printf ("%d\n", res_5);
<<<< COND 0 = res_5 eq_expr 31
<<<< COND 1 = res_5 ne_expr 31


Optimizing block #4



Optimizing block #5

Optimizing statement D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
LKUP STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
2>>> STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
Optimizing statement printf ("%d ", D.4517_7);
Optimizing statement ivtmp.67_12 = ivtmp.67_26 + 4;
LKUP STMT ivtmp.67_12 = ivtmp.67_26 plus_expr 4
          ivtmp.67_12 = ivtmp.67_26 + 4;
Optimizing statement if (ivtmp.67_12 != 16)
LKUP STMT ivtmp.67_12 ne_expr 16
          if (ivtmp.67_12 != 16)


Optimizing block #6

0>>> COPY ivtmp.67_12 = 16
1>>> COND 1 = ivtmp.67_12 le_expr 16
1>>> COND 1 = ivtmp.67_12 ge_expr 16
1>>> COND 1 = ivtmp.67_12 eq_expr 16
1>>> COND 0 = ivtmp.67_12 ne_expr 16
Optimizing statement __builtin_putchar (10);
Optimizing statement return 0;
<<<< COND 0 = ivtmp.67_12 ne_expr 16
<<<< COND 1 = ivtmp.67_12 eq_expr 16
<<<< COND 1 = ivtmp.67_12 ge_expr 16
<<<< COND 1 = ivtmp.67_12 le_expr 16
<<<< COPY ivtmp.67_12 = 16


Optimizing block #7

1>>> COND 1 = ivtmp.67_12 ne_expr 16
1>>> COND 0 = ivtmp.67_12 eq_expr 16
<<<< COND 0 = ivtmp.67_12 eq_expr 16
<<<< COND 1 = ivtmp.67_12 ne_expr 16
<<<< STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
LKUP STMT D.4517_7 = MEM[symbol: a, index: 0, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: 0, offset: 0B];
<<<< STMT 0 = a[3]
          0 = a[3];
<<<< STMT 0 = a[2]
          0 = a[2];
<<<< STMT 0 = a[1]
          0 = a[1];
<<<< STMT 0 = a[0]
          0 = a[0];
Removing basic block 7
;; basic block 7, loop depth 1, count 0
;; prev block 5, next block 6
;; pred:
;; succ:       5 [100.0%]  (fallthru,dfs_back)
<bb 7>:
Invalid sum of incoming frequencies 0, should be 6000
goto <bb 5>;


Scope blocks after cleanups:

{ Scope block #0
  int res;

}
main ()
{
  sizetype ivtmp.67;
  int res;
  int D.4517;

<bb 2>:
  a[0] = 0;
  a[1] = 0;
  a[2] = 0;
  a[3] = 0;
  res_5 = foo ();
  if (res_5 != 31)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  printf ("%d\n", res_5);

<bb 4>:

<bb 5>:
  # ivtmp.67_26 = PHI <ivtmp.67_12(5), 0(4)>
  D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
  printf ("%d ", D.4517_7);
  ivtmp.67_12 = ivtmp.67_26 + 4;
  if (ivtmp.67_12 != 16)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 6>:
  __builtin_putchar (10);
  return 0;

}



[-- Attachment #3: my--pr49771.c --]
[-- Type: text/x-csrc, Size: 528 bytes --]

#include <stdio.h>
#include <stdarg.h>

#define N 4
static int a[N];

__attribute__ ((noinline)) int
foo (void)
{
  int j;
  int i;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      a[j] = a[i] + 1;
  return a[0];
}

int
main (void)
{
  int res, i;

  for (i = 0; i < N; i++)
    a[i] = 0;

  res = foo ();
  if (res != 31)
    printf ("%d\n", res);
  for (i = 0; i < N; i++)
    printf ("%d ", a[i]);
  printf ("\n");

  return 0;
}

/* { dg-final { cleanup-tree-dump "vect" } } */


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 11:26             ` Ira Rosen
@ 2011-07-25 11:41               ` Richard Guenther
  2011-07-25 12:33                 ` Ira Rosen
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 11:41 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>> Richard Guenther wrote:
>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> >> I reproduced the failure. It occurs without Richard's patch
>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
>>>> >> this patch, too. Obviously the vectorized loop is executed, but at the
>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>> >
>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>> > Here is the scalar loop after vrp2
>>>> >
>>>> > <bb 8>:
>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>> >  D.4521_34 = D.4520_33 + 1;
>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>> >  if (ivtmp.42_45 != 16)
>>>> >    goto <bb 10>;
>>>> >  else
>>>> >    goto <bb 5>;
>>>> >
>>>> > and the load is changed by dom2 to:
>>>> >
>>>> > <bb 4>:
>>>> >  ...
>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>> >   ...
>>>> >
>>>> > where vector(4) int * vect_pa.9;
>>>> >
>>>> > And the scalar loop has no rotate for that load:
>>>>
>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>
>>> Well, the back-end assumes a pointer to vector type is always
>>> naturally aligned, and therefore the data it points to can be
>>> accessed via a simple load, with no extra rotate needed.
>>
>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>> anything about alignment just because of the kind of the pointer
>> is bogus - the scalar code does a scalar read using that pointer.
>> So the backend should look at the memory operation, not
>> at the pointer type.  That said, I can't find any code that looks
>> suspicious in the spu backend.
>>
>>> It seems what happened here is that somehow, a pointer to int
>>> gets replaced by a pointer to vector, even though their alignment
>>> properties are different.
>>
>> No, they are not.  They get replaced if they are value-equivalent
>> in which case they are also alignment-equivalent.  But well, the
>> dump snippet wasn't complete and I don't feel like building a
>> SPU cross to verify myself.
>
> I am attaching the complete file.

The issue seems to be that the IV in question, vect_pa.9_19, is
defined as

  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;

but ivtmp.53_32 does not have a definition at all.

Richard.

>
> Thanks,
> Ira
>
>
>
>>
>>> This vector pointer must originate somehow in the vectorizer,
>>> however, since the original C source does not contain any
>>> vector types at all ...
>>
>> That's for sure true, it must be the initial pointer we then increment
>> in the vectorized loop.
>>
>> Richard.
>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>>  Dr. Ulrich Weigand
>>>  GNU Toolchain for Linux on System z and Cell BE
>>>  Ulrich.Weigand@de.ibm.com
>>>
>>
>


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 11:41               ` Richard Guenther
@ 2011-07-25 12:33                 ` Ira Rosen
  2011-07-25 13:01                   ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-25 12:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

[-- Attachment #1: Type: text/plain, Size: 3634 bytes --]

On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>> Richard Guenther wrote:
>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>> >> I reproduced the failure. It occurs without Richard's patch
>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
>>>>> >> this patch, too. Obviously the vectorized loop is executed, but at the
>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>> >
>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>> > Here is the scalar loop after vrp2
>>>>> >
>>>>> > <bb 8>:
>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>> >  if (ivtmp.42_45 != 16)
>>>>> >    goto <bb 10>;
>>>>> >  else
>>>>> >    goto <bb 5>;
>>>>> >
>>>>> > and the load is changed by dom2 to:
>>>>> >
>>>>> > <bb 4>:
>>>>> >  ...
>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>> >   ...
>>>>> >
>>>>> > where vector(4) int * vect_pa.9;
>>>>> >
>>>>> > And the scalar loop has no rotate for that load:
>>>>>
>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>
>>>> Well, the back-end assumes a pointer to vector type is always
>>>> naturally aligned, and therefore the data it points to can be
>>>> accessed via a simple load, with no extra rotate needed.
>>>
>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>> anything about alignment just because of the kind of the pointer
>>> is bogus - the scalar code does a scalar read using that pointer.
>>> So the backend should look at the memory operation, not
>>> at the pointer type.  That said, I can't find any code that looks
>>> suspicious in the spu backend.
>>>
>>>> It seems what happened here is that somehow, a pointer to int
>>>> gets replaced by a pointer to vector, even though their alignment
>>>> properties are different.
>>>
>>> No, they are not.  They get replaced if they are value-equivalent
>>> in which case they are also alignment-equivalent.  But well, the
>>> dump snippet wasn't complete and I don't feel like building a
>>> SPU cross to verify myself.
>>
>> I am attaching the complete file.
>
> The issue seems to be that the IV in question, vect_pa.9_19, is
> defined as
>
>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>
> but ivtmp.53_32 does not have a definition at all.
>

I am sorry, it's my fault, resending the file.

Sorry,
Ira

> Richard.
>
>>
>> Thanks,
>> Ira
>>
>>
>>
>>>
>>>> This vector pointer must originate somehow in the vectorizer,
>>>> however, since the original C source does not contain any
>>>> vector types at all ...
>>>
>>> That's for sure true, it must be the initial pointer we then increment
>>> in the vectorized loop.
>>>
>>> Richard.
>>>
>>>> Bye,
>>>> Ulrich
>>>>
>>>> --
>>>>  Dr. Ulrich Weigand
>>>>  GNU Toolchain for Linux on System z and Cell BE
>>>>  Ulrich.Weigand@de.ibm.com
>>>>
>>>
>>
>

[-- Attachment #2: my--pr49771.c.124t.dom2 --]
[-- Type: application/octet-stream, Size: 19269 bytes --]


;; Function foo (foo, funcdef_no=0, decl_uid=2839, cgraph_uid=0)

;; 3 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 10 5 6 9 7 8
;;
;; Loop 1
;;  header 7, latch 9
;;  depth 1, outer 0
;;  nodes: 7 9 6 5 4 3 10
;;
;; Loop 2
;;  header 4, latch 10
;;  depth 2, outer 1
;;  nodes: 4 10
;; 2 succs { 7 }
;; 3 succs { 4 }
;; 4 succs { 10 6 }
;; 10 succs { 4 }
;; 5 succs { 6 }
;; 6 succs { 9 8 }
;; 9 succs { 7 }
;; 7 succs { 3 5 }
;; 8 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement ivtmp.53_37 = (long unsigned int) &a;
LKUP STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
2>>> STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Optimizing statement a.55_39 = (long unsigned int) &a;
LKUP STMT a.55_39 = nop_expr &a
          a.55_39 = (long unsigned int) &a;
FIND: ivtmp.53_37
  Replaced redundant expr '(long unsigned int) &a' with 'ivtmp.53_37'
==== ASGN a.55_39 = ivtmp.53_37
Optimizing statement D.4591_40 = a.55_39 + 16;
  Replaced 'a.55_39' with variable 'ivtmp.53_37'
LKUP STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
2>>> STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;


Optimizing block #7

Optimizing statement vect_pa.12_24 = &a;
LKUP STMT vect_pa.12_24 = &a
LKUP STMT vect_pa.12_24 = &a
          vect_pa.12_24 = &a;
==== ASGN vect_pa.12_24 = &a
Optimizing statement D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
2>>> STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
Optimizing statement D.4540_11 = (vector(4) int *) D.4589_36;
LKUP STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
2>>> STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
Optimizing statement D.4541_26 = D.4540_11 <= &a;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
2>>> STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
Optimizing statement D.4542_27 = &MEM[(void *)&a + 16B];
LKUP STMT D.4542_27 = &MEM[(void *)&a + 16B]
          D.4542_27 = &MEM[(void *)&a + 16B];
==== ASGN D.4542_27 = &MEM[(void *)&a + 16B]
Optimizing statement vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
2>>> STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
Optimizing statement D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
2>>> STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
Optimizing statement D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
2>>> STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
Optimizing statement if (D.4544_29 != 0)
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)


Optimizing block #3

0>>> COPY D.4544_29 = 0
Optimizing statement ivtmp.42_59 = 0;
LKUP STMT ivtmp.42_59 = 0
          ivtmp.42_59 = 0;
==== ASGN ivtmp.42_59 = 0


Optimizing block #4

Optimizing statement D.4593_42 = (void *) ivtmp.53_32;
LKUP STMT D.4593_42 = nop_expr ivtmp.53_32
          D.4593_42 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4593_42 = vect_pa.9_19
Optimizing statement D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  Replaced 'D.4593_42' with variable 'vect_pa.9_19'
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement D.4521_34 = D.4520_33 + 1;
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
2>>> STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
Optimizing statement MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34
          MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
2>>> STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
Optimizing statement ivtmp.42_45 = ivtmp.42_50 + 4;
LKUP STMT ivtmp.42_45 = ivtmp.42_50 plus_expr 4
          ivtmp.42_45 = ivtmp.42_50 + 4;
Optimizing statement if (ivtmp.42_45 != 16)
LKUP STMT ivtmp.42_45 ne_expr 16
          if (ivtmp.42_45 != 16)


Optimizing block #10

1>>> COND 1 = ivtmp.42_45 ne_expr 16
1>>> COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ne_expr 16
0>>> COPY ivtmp.42_45 = 16
1>>> COND 1 = ivtmp.42_45 le_expr 16
1>>> COND 1 = ivtmp.42_45 ge_expr 16
1>>> COND 1 = ivtmp.42_45 eq_expr 16
1>>> COND 0 = ivtmp.42_45 ne_expr 16
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< COND 0 = ivtmp.42_45 ne_expr 16
<<<< COND 1 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ge_expr 16
<<<< COND 1 = ivtmp.42_45 le_expr 16
<<<< STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
<<<< STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
<<<< STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
<<<< COPY D.4544_29 = 0
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;


Optimizing block #5

0>>> COPY D.4544_29 = 1
Optimizing statement vect_cst_.29_56 = { 1, 1, 1, 1 };
LKUP STMT vect_cst_.29_56 = { 1, 1, 1, 1 }
          vect_cst_.29_56 = { 1, 1, 1, 1 };
==== ASGN vect_cst_.29_56 = { 1, 1, 1, 1 }
Optimizing statement vect_pa.33_58 = &a;
LKUP STMT vect_pa.33_58 = &a
          vect_pa.33_58 = &a;
==== ASGN vect_pa.33_58 = &a
Optimizing statement D.4592_41 = (void *) ivtmp.53_32;
LKUP STMT D.4592_41 = nop_expr ivtmp.53_32
          D.4592_41 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4592_41 = vect_pa.9_19
Optimizing statement D.4520_5 = MEM[base: D.4592_41, offset: 0B];
  Replaced 'D.4592_41' with variable 'vect_pa.9_19'
LKUP STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
LKUP STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
2>>> STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
Optimizing statement vect_var_.28_57 = vect_cst_.27_55 + vect_cst_.29_56;
  Replaced 'vect_cst_.29_56' with constant '{ 1, 1, 1, 1 }'
LKUP STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
2>>> STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
Optimizing statement MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT MEM[(int[4] *)&a] = vect_var_.28_57
          MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
2>>> STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
Optimizing statement vect_pa.30_60 = &MEM[(void *)&a + 16B];
LKUP STMT vect_pa.30_60 = &MEM[(void *)&a + 16B]
          vect_pa.30_60 = &MEM[(void *)&a + 16B];
==== ASGN vect_pa.30_60 = &MEM[(void *)&a + 16B]
Optimizing statement ivtmp.34_62 = 1;
LKUP STMT ivtmp.34_62 = 1
          ivtmp.34_62 = 1;
==== ASGN ivtmp.34_62 = 1
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
<<<< STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
<<<< STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
<<<< STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];


Optimizing block #6

Optimizing statement ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
Optimizing statement if (ivtmp.53_30 != D.4591_40)
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)


Optimizing block #8

0>>> COPY ivtmp.53_30 = D.4591_40
1>>> COND 1 = ivtmp.53_30 le_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 ge_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 eq_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 ne_expr D.4591_40
Optimizing statement D.4522_9 = a[0];
LKUP STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
2>>> STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
Optimizing statement return D.4522_9;
<<<< STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
<<<< COND 0 = ivtmp.53_30 ne_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ge_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 le_expr D.4591_40
<<<< COPY ivtmp.53_30 = D.4591_40


Optimizing block #9

1>>> COND 1 = ivtmp.53_30 ne_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ne_expr D.4591_40
<<<< STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
<<<< STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
<<<< STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
<<<< STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
<<<< STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
<<<< STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_37 plus_expr 16
          D.4589_36 = ivtmp.53_37 + 16;
FIND: D.4591_40
LKUP STMT D.4540_11 = nop_expr D.4591_40
          D.4540_11 = (vector(4) int *) D.4591_40;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_37
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_37;
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)
<<<< STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
<<<< STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Removing basic block 9
;; basic block 9, loop depth 1, count 0
;; prev block 6, next block 7
;; pred:
;; succ:       7 [100.0%]  (fallthru,dfs_back)
<bb 9>:
Invalid sum of incoming frequencies 0, should be 1500


Removing basic block 10
;; basic block 10, loop depth 2, count 0
;; prev block 4, next block 5
;; pred:
;; succ:       4 [100.0%]  (fallthru,dfs_back)
<bb 10>:
Invalid sum of incoming frequencies 0, should be 1200
goto <bb 4>;


Scope blocks after cleanups:

{ Scope block #0

}
foo ()
{
  void * D.4593;
  void * D.4592;
  long unsigned int a.55;
  long unsigned int D.4591;
  long unsigned int D.4589;
  long unsigned int ivtmp.53;
  sizetype ivtmp.42;
  unsigned int ivtmp.34;
  vector(4) int * vect_pa.33;
  vector(4) int * vect_pa.30;
  vector(4) int vect_cst_.29;
  vector(4) int vect_var_.28;
  vector(4) int vect_cst_.27;
  vector(4) int * D.4540;
  _Bool D.4541;
  vector(4) int * D.4542;
  _Bool D.4543;
  _Bool D.4544;
  vector(4) int * vect_pa.12;
  vector(4) int * vect_pa.9;
  int D.4522;
  int D.4521;
  int D.4520;

<bb 2>:
  ivtmp.53_37 = (long unsigned int) &a;
  a.55_39 = ivtmp.53_37;
  D.4591_40 = ivtmp.53_37 + 16;
  goto <bb 7>;

<bb 3>:
  ivtmp.42_59 = 0;

<bb 4>:
  # ivtmp.42_50 = PHI <0(3), ivtmp.42_45(4)>
  D.4593_42 = vect_pa.9_19;
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 4>;
  else
    goto <bb 6>;

<bb 5>:
  vect_cst_.29_56 = { 1, 1, 1, 1 };
  vect_pa.33_58 = &a;
  D.4592_41 = vect_pa.9_19;
  D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
  vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
  vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
  MEM[(int[4] *)&a] = vect_var_.28_57;
  vect_pa.30_60 = &MEM[(void *)&a + 16B];
  ivtmp.34_62 = 1;

<bb 6>:
  ivtmp.53_30 = ivtmp.53_32 + 4;
  if (ivtmp.53_30 != D.4591_40)
    goto <bb 7>;
  else
    goto <bb 8>;

<bb 7>:
  # ivtmp.53_32 = PHI <ivtmp.53_37(2), ivtmp.53_30(6)>
  vect_pa.12_24 = &a;
  D.4589_36 = ivtmp.53_32 + 16;
  D.4540_11 = (vector(4) int *) D.4589_36;
  D.4541_26 = D.4540_11 <= &a;
  D.4542_27 = &MEM[(void *)&a + 16B];
  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
  D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
  D.4544_29 = D.4541_26 & D.4543_28;
  if (D.4544_29 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

<bb 8>:
  D.4522_9 = a[0];
  return D.4522_9;

}



;; Function main (main, funcdef_no=1, decl_uid=2850, cgraph_uid=1) (executed once)

;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5 7 6
;;
;; Loop 1
;;  header 5, latch 7
;;  depth 1, outer 0
;;  nodes: 5 7
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 5 }
;; 5 succs { 7 6 }
;; 7 succs { 5 }
;; 6 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement a[0] = 0;
LKUP STMT a[0] = 0
          a[0] = 0;
LKUP STMT 0 = a[0]
          0 = a[0];
LKUP STMT 0 = a[0]
          0 = a[0];
2>>> STMT 0 = a[0]
          0 = a[0];
Optimizing statement a[1] = 0;
LKUP STMT a[1] = 0
          a[1] = 0;
LKUP STMT 0 = a[1]
          0 = a[1];
LKUP STMT 0 = a[1]
          0 = a[1];
2>>> STMT 0 = a[1]
          0 = a[1];
Optimizing statement a[2] = 0;
LKUP STMT a[2] = 0
          a[2] = 0;
LKUP STMT 0 = a[2]
          0 = a[2];
LKUP STMT 0 = a[2]
          0 = a[2];
2>>> STMT 0 = a[2]
          0 = a[2];
Optimizing statement a[3] = 0;
LKUP STMT a[3] = 0
          a[3] = 0;
LKUP STMT 0 = a[3]
          0 = a[3];
LKUP STMT 0 = a[3]
          0 = a[3];
2>>> STMT 0 = a[3]
          0 = a[3];
Optimizing statement res_5 = foo ();
Optimizing statement if (res_5 != 31)
LKUP STMT res_5 ne_expr 31
          if (res_5 != 31)


Optimizing block #3

1>>> COND 1 = res_5 ne_expr 31
1>>> COND 0 = res_5 eq_expr 31
Optimizing statement printf ("%d\n", res_5);
<<<< COND 0 = res_5 eq_expr 31
<<<< COND 1 = res_5 ne_expr 31


Optimizing block #4



Optimizing block #5

Optimizing statement D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
LKUP STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
2>>> STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
Optimizing statement printf ("%d ", D.4517_7);
Optimizing statement ivtmp.67_12 = ivtmp.67_26 + 4;
LKUP STMT ivtmp.67_12 = ivtmp.67_26 plus_expr 4
          ivtmp.67_12 = ivtmp.67_26 + 4;
Optimizing statement if (ivtmp.67_12 != 16)
LKUP STMT ivtmp.67_12 ne_expr 16
          if (ivtmp.67_12 != 16)


Optimizing block #6

0>>> COPY ivtmp.67_12 = 16
1>>> COND 1 = ivtmp.67_12 le_expr 16
1>>> COND 1 = ivtmp.67_12 ge_expr 16
1>>> COND 1 = ivtmp.67_12 eq_expr 16
1>>> COND 0 = ivtmp.67_12 ne_expr 16
Optimizing statement __builtin_putchar (10);
Optimizing statement return 0;
<<<< COND 0 = ivtmp.67_12 ne_expr 16
<<<< COND 1 = ivtmp.67_12 eq_expr 16
<<<< COND 1 = ivtmp.67_12 ge_expr 16
<<<< COND 1 = ivtmp.67_12 le_expr 16
<<<< COPY ivtmp.67_12 = 16


Optimizing block #7

1>>> COND 1 = ivtmp.67_12 ne_expr 16
1>>> COND 0 = ivtmp.67_12 eq_expr 16
<<<< COND 0 = ivtmp.67_12 eq_expr 16
<<<< COND 1 = ivtmp.67_12 ne_expr 16
<<<< STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
LKUP STMT D.4517_7 = MEM[symbol: a, index: 0, offset: 0B]
          D.4517_7 = MEM[symbol: a, index: 0, offset: 0B];
<<<< STMT 0 = a[3]
          0 = a[3];
<<<< STMT 0 = a[2]
          0 = a[2];
<<<< STMT 0 = a[1]
          0 = a[1];
<<<< STMT 0 = a[0]
          0 = a[0];
Removing basic block 7
;; basic block 7, loop depth 1, count 0
;; prev block 5, next block 6
;; pred:
;; succ:       5 [100.0%]  (fallthru,dfs_back)
<bb 7>:
Invalid sum of incoming frequencies 0, should be 6000
goto <bb 5>;


Scope blocks after cleanups:

{ Scope block #0
  int res;

}
main ()
{
  sizetype ivtmp.67;
  int res;
  int D.4517;

<bb 2>:
  a[0] = 0;
  a[1] = 0;
  a[2] = 0;
  a[3] = 0;
  res_5 = foo ();
  if (res_5 != 31)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  printf ("%d\n", res_5);

<bb 4>:

<bb 5>:
  # ivtmp.67_26 = PHI <ivtmp.67_12(5), 0(4)>
  D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B];
  printf ("%d ", D.4517_7);
  ivtmp.67_12 = ivtmp.67_26 + 4;
  if (ivtmp.67_12 != 16)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 6>:
  __builtin_putchar (10);
  return 0;

}



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 12:33                 ` Ira Rosen
@ 2011-07-25 13:01                   ` Richard Guenther
  2011-07-25 13:07                     ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 13:01 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>>> Richard Guenther wrote:
>>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>> >> I reproduced the failure. It occurs without Richard's
>>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>>> >
>>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>>> > Here is the scalar loop after vrp2
>>>>>> >
>>>>>> > <bb 8>:
>>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>>> >  if (ivtmp.42_45 != 16)
>>>>>> >    goto <bb 10>;
>>>>>> >  else
>>>>>> >    goto <bb 5>;
>>>>>> >
>>>>>> > and the load is changed by dom2 to:
>>>>>> >
>>>>>> > <bb 4>:
>>>>>> >  ...
>>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>>> >   ...
>>>>>> >
>>>>>> > where vector(4) int * vect_pa.9;
>>>>>> >
>>>>>> > And the scalar loop has no rotate for that load:
>>>>>>
>>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>>
>>>>> Well, the back-end assumes a pointer to vector type is always
>>>>> naturally aligned, and therefore the data it points to can be
>>>>> accessed via a simple load, with no extra rotate needed.
>>>>
>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>> anything about alignment just because of the kind of the pointer
>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>> So the backend better should look at the memory operation, not
>>>> at the pointer type.  That said, I can't find any code that looks
>>>> suspicious in the spu backend.
>>>>
>>>>> It seems what happened here is that somehow, a pointer to int
>>>>> gets replaced by a pointer to vector, even though their alignment
>>>>> properties are different.
>>>>
>>>> No, they are not.  They get replaced if they are value-equivalent
>>>> in which case they are also alignment-equivalent.  But well, the
>>>> dump snippet wasn't complete and I don't feel like building a
>>>> SPU cross to verify myself.
>>>
>>> I am attaching the complete file.
>>
>> The issue seems to be that the IV in question, vect_pa.9_19, is
>> defined as
>>
>>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>>
>> but ivtmp.53_32 does not have a definition at all.
>>
>
> I am sorry, it's my fault, resending the file.

Seems perfectly valid to me.  Or well - I suppose we might run into
the issue that the vectorizer sets alignment data at the wrong spot?
You can check alignment info when dumping with the -alias flag.
Building a spu cross now.

Richard.


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 13:01                   ` Richard Guenther
@ 2011-07-25 13:07                     ` Richard Guenther
  2011-07-25 13:47                       ` Ulrich Weigand
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 13:07 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 1:15 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote:
>>> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>>>> Richard Guenther wrote:
>>>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>>>>>> >> I reproduced the failure. It occurs without Richard's
>>>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>>>>>> >> patches too. Obviously the vectorized loop is executed, but at the
>>>>>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>>>>>> >
>>>>>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>>>>>> > gets optimized in a harmful way for SPU, AFAIU.
>>>>>>> > Here is the scalar loop after vrp2
>>>>>>> >
>>>>>>> > <bb 8>:
>>>>>>> >  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>>>>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>>>>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>>>>>> >  D.4521_34 = D.4520_33 + 1;
>>>>>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>>>>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>>>>>> >  if (ivtmp.42_45 != 16)
>>>>>>> >    goto <bb 10>;
>>>>>>> >  else
>>>>>>> >    goto <bb 5>;
>>>>>>> >
>>>>>>> > and the load is changed by dom2 to:
>>>>>>> >
>>>>>>> > <bb 4>:
>>>>>>> >  ...
>>>>>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>>>>>> >   ...
>>>>>>> >
>>>>>>> > where vector(4) int * vect_pa.9;
>>>>>>> >
>>>>>>> > And the scalar loop has no rotate for that load:
>>>>>>>
>>>>>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>>>>>
>>>>>> Well, the back-end assumes a pointer to vector type is always
>>>>>> naturally aligned, and therefore the data it points to can be
>>>>>> accessed via a simple load, with no extra rotate needed.
>>>>>
>>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>>> anything about alignment just because of the kind of the pointer
>>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>>> So the backend better should look at the memory operation, not
>>>>> at the pointer type.  That said, I can't find any code that looks
>>>>> suspicious in the spu backend.
>>>>>
>>>>>> It seems what happened here is that somehow, a pointer to int
>>>>>> gets replaced by a pointer to vector, even though their alignment
>>>>>> properties are different.
>>>>>
>>>>> No, they are not.  They get replaced if they are value-equivalent
>>>>> in which case they are also alignment-equivalent.  But well, the
>>>>> dump snippet wasn't complete and I don't feel like building a
>>>>> SPU cross to verify myself.
>>>>
>>>> I am attaching the complete file.
>>>
>>> The issue seems to be that the IV in question, vect_pa.9_19, is
>>> defined as
>>>
>>>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>>>
>>> but ivtmp.53_32 does not have a definition at all.
>>>
>>
>> I am sorry, it's my fault, resending the file.
>
> Seems perfectly valid to me.  Or well - I suppose we might run into
> the issue that the vectorizer sets alignment data at the wrong spot?
> You can check alignment info when dumping with the -alias flag.
> Building a spu cross now.

Nope, all perfectly valid.

> Richard.
>


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 13:07                     ` Richard Guenther
@ 2011-07-25 13:47                       ` Ulrich Weigand
  2011-07-25 14:01                         ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-25 13:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:

> >>>>>> Well, the back-end assumes a pointer to vector type is always
> >>>>>> naturally aligned, and therefore the data it points to can be
> >>>>>> accessed via a simple load, with no extra rotate needed.
> >>>>>
> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> >>>>> anything about alignment just because of the kind of the pointer
> >>>>> is bogus - the scalar code does a scalar read using that pointer.
> >>>>> So the backend better should look at the memory operation, not
> >>>>> at the pointer type.  That said, I can't find any code that looks
> >>>>> suspicious in the spu backend.
> >>>>>
> >>>>>> It seems what happened here is that somehow, a pointer to int
> >>>>>> gets replaced by a pointer to vector, even though their alignment
> >>>>>> properties are different.
> >>>>>
> >>>>> No, they are not.  They get replaced if they are value-equivalent
> >>>>> in which case they are also alignment-equivalent.  But well, the
> >>>>> dump snippet wasn't complete and I don't feel like building a
> >>>>> SPU cross to verify myself.

> > Seems perfectly valid to me.  Or well - I suppose we might run into
> > the issue that the vectorizer sets alignment data at the wrong spot?
> > You can check alignment info when dumping with the -alias flag.
> > Building a spu cross now.
> 
> Nope, all perfectly valid.

Ah, I guess I see what's happening here.  When the SPU back-end is called
to expand the load, the source operand is passed as:

(mem:SI (reg/f:SI 226 [ vect_pa.9 ])
        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])

Now this does say the MEM is only guaranteed to be aligned to 32 bits.

However, spu_expand_load then goes and looks at the components of the
address in detail, in order to figure out how to best perform the access.
In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
registers involved in the address.

In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
the back-end thinks it can use an aligned access after all.

Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
is the DECL_RTL for the variable vect_pa.9, and that variable has a
pointer-to-vector type (with target alignment 128).

When expanding that variable, expand_one_register_var does:

  if (POINTER_TYPE_P (type))
    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));

All this is normally completely correct -- a variable of type pointer
to vector type *must* hold only properly aligned values.

I guess the vectorizer deliberately loads a (potentially) unaligned
value into a vector pointer variable.  It then generates a check of
whether the value is really aligned, and uses it only if so.

But if that pointer variable "escapes" into the other branch because
DOM thinks it can re-use the value, the REGNO_POINTER_ALIGN value
carried for its DECL_RTL register is now incorrect ...

Maybe the vectorizer ought to declare that variable with a non-default
type alignment setting?  Or else, perform the assignment to the
variable only *inside* the "if" that checks for correct alignment?
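The guarded-cast pattern described above can be illustrated in plain C
(a hedged sketch, not actual vectorizer output; the names are made up):
a possibly-unaligned address is stored into a variable whose type
promises 16-byte alignment, but the stored value only satisfies that
promise on the branch where the runtime check succeeded.  Alignment
derived from the variable's type alone, as expand_one_register_var
does, is therefore unsafe on the other branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the array the loop accesses; force the
   base to 16-byte alignment so the checks below are deterministic.  */
static _Alignas(16) int a[8];

/* The runtime versioning check the vectorizer emits: the vectorized
   (aligned) path may only be taken when the address really is
   16-byte aligned, regardless of what the pointer's type claims.  */
static int vector_path_ok (int i)
{
  uintptr_t addr = (uintptr_t) &a[i];
  return addr % 16 == 0;
}
```

With the forced base alignment, `vector_path_ok (0)` holds while
`vector_path_ok (1)` does not (&a[1] is only 4-byte aligned), which is
exactly the distinction a type-based REGNO_POINTER_ALIGN loses.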

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 13:47                       ` Ulrich Weigand
@ 2011-07-25 14:01                         ` Richard Guenther
  2011-07-25 14:10                           ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 14:01 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>> >>>>>> naturally aligned, and therefore the data it points to can be
>> >>>>>> accessed via a simple load, with no extra rotate needed.
>> >>>>>
>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>> >>>>> anything about alignment just because of the kind of the pointer
>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>> >>>>> So the backend better should look at the memory operation, not
>> >>>>> at the pointer type.  That said, I can't find any code that looks
>> >>>>> suspicious in the spu backend.
>> >>>>>
>> >>>>>> It seems what happened here is that somehow, a pointer to int
>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>> >>>>>> properties are different.
>> >>>>>
>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>> >>>>> in which case they are also alignment-equivalent.  But well, the
>> >>>>> dump snippet wasn't complete and I don't feel like building a
>> >>>>> SPU cross to verify myself.
>
>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>> > the issue that the vectorizer sets alignment data at the wrong spot?
>> > You can check alignment info when dumping with the -alias flag.
>> > Building a spu cross now.
>>
>> Nope, all perfectly valid.
>
> Ah, I guess I see what's happening here.  When the SPU back-end is called
> to expand the load, the source operand is passed as:
>
> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>
> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>
> However, spu_expand_load then goes and looks at the components of the
> address in detail, in order to figure out how to best perform the access.
> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
> registers involved in the address.
>
> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
> the back-end thinks it can use an aligned access after all.
>
> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
> is the DECL_RTL for the variable vect_pa.9, and that variable has a
> pointer-to-vector type (with target alignment 128).
>
> When expanding that variable, expand_one_register_var does:
>
>  if (POINTER_TYPE_P (type))
>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>
> All this is normally completely correct -- a variable of type pointer
> to vector type *must* hold only properly aligned values.

No, this is indeed completely bogus code ;)  it should instead
use get_pointer_alignment.

Richard.


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 14:01                         ` Richard Guenther
@ 2011-07-25 14:10                           ` Richard Guenther
  2011-07-25 14:14                             ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 14:10 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>
>>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>>> >>>>>> naturally aligned, and therefore the data it points to can be
>>> >>>>>> accessed via a simple load, with no extra rotate needed.
>>> >>>>>
>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>> >>>>> anything about alignment just because of the kind of the pointer
>>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>>> >>>>> So the backend better should look at the memory operation, not
>>> >>>>> at the pointer type.  That said, I can't find any code that looks
>>> >>>>> suspicious in the spu backend.
>>> >>>>>
>>> >>>>>> It seems what happened here is that somehow, a pointer to int
>>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>>> >>>>>> properties are different.
>>> >>>>>
>>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>>> >>>>> in which case they are also alignment-equivalent.  But well, the
>>> >>>>> dump snippet wasn't complete and I don't feel like building a
>>> >>>>> SPU cross to verify myself.
>>
>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>> > You can check alignment info when dumping with the -alias flag.
>>> > Building a spu cross now.
>>>
>>> Nope, all perfectly valid.
>>
>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>> to expand the load, the source operand is passed as:
>>
>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>
>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>
>> However, spu_expand_load then goes and looks at the components of the
>> address in detail, in order to figure out how to best perform the access.
>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>> registers involved in the address.
>>
>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>> the back-end thinks it can use an aligned access after all.
>>
>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>> pointer-to-vector type (with target alignment 128).
>>
>> When expanding that variable, expand_one_register_var does:
>>
>>  if (POINTER_TYPE_P (type))
>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>
>> All this is normally completely correct -- a variable of type pointer
>> to vector type *must* hold only properly aligned values.
>
> No, this is indeed completely bogus code ;)  it should instead
> use get_pointer_alignment.

Btw, as pseudos do not have a single def site how can the above
ever be correct in the face of coalescing?  For example on trees we
can have

 p_1 = &a; // align 256
 p_2 = p_1 + 4; // align 32

but we'll coalesce the thing and thus would have to use the weaker
alignment of both SSA names.  expand_one_register_var expands
the decl, not the SSA name, so using get_pointer_alignment on
the decl would probably be fine, though also pointless as it always
will return 8.

At least I don't see any code that would prevent a temporary variable
of type int * of being coalesced with a temporary variable of type vector int *.

Why should REGNO_POINTER_ALIGN be interesting to anyone?
Proper alignment information is (should be) attached to every
MEM already.

Richard.


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 14:10                           ` Richard Guenther
@ 2011-07-25 14:14                             ` Richard Guenther
  2011-07-25 14:54                               ` Ulrich Weigand
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 14:14 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 4:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>> Richard Guenther wrote:
>>>
>>>> >>>>>> Well, the back-end assumes a pointer to vector type is always
>>>> >>>>>> naturally aligned, and therefore the data it points to can be
>>>> >>>>>> accessed via a simple load, with no extra rotate needed.
>>>> >>>>>
>>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>> >>>>> anything about alignment just because of the kind of the pointer
>>>> >>>>> is bogus - the scalar code does a scalar read using that pointer.
>>>> >>>>> So the backend better should look at the memory operation, not
>>>> >>>>> at the pointer type.  That said, I can't find any code that looks
>>>> >>>>> suspicious in the spu backend.
>>>> >>>>>
>>>> >>>>>> It seems what happened here is that somehow, a pointer to int
>>>> >>>>>> gets replaced by a pointer to vector, even though their alignment
>>>> >>>>>> properties are different.
>>>> >>>>>
>>>> >>>>> No, they are not.  They get replaced if they are value-equivalent
>>>> >>>>> in which case they are also alignment-equivalent.  But well, the
>>>> >>>>> dump snippet wasn't complete and I don't feel like building a
>>>> >>>>> SPU cross to verify myself.
>>>
>>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>>> > You can check alignment info when dumping with the -alias flag.
>>>> > Building a spu cross now.
>>>>
>>>> Nope, all perfectly valid.
>>>
>>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>>> to expand the load, the source operand is passed as:
>>>
>>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>>
>>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>>
>>> However, spu_expand_load then goes and looks at the components of the
>>> address in detail, in order to figure out how to best perform the access.
>>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>>> registers involved in the address.
>>>
>>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>>> the back-end thinks it can use an aligned access after all.
>>>
>>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>>> pointer-to-vector type (with target alignment 128).
>>>
>>> When expanding that variable, expand_one_register_var does:
>>>
>>>  if (POINTER_TYPE_P (type))
>>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>>
>>> All this is normally completely correct -- a variable of type pointer
>>> to vector type *must* hold only properly aligned values.
>>
>> No, this is indeed completely bogus code ;)  it should instead
>> use get_pointer_alignment.
>
> Btw, as pseudos do not have a single def site how can the above
> ever be correct in the face of coalescing?  For example on trees we
> can have
>
>  p_1 = &a; // align 256
>  p_2 = p_1 + 4; // align 32
>
> but we'll coalesce the thing and thus would have to use the weaker
> alignment of both SSA names.  expand_one_register_var expands
> the decl, not the SSA name, so using get_pointer_alignment on
> the decl would probably be fine, though also pointless as it always
> will return 8.
>
> At least I don't see any code that would prevent a temporary variable
> of type int * of being coalesced with a temporary variable of type vector int *.
>
> Why should REGNO_POINTER_ALIGN be interesting to anyone?
> Proper alignment information is (should be) attached to every
> MEM already.

nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
apart from maybe alpha.c and spu.c.

We should simply kill REGNO_POINTER_ALIGN IMHO.

Richard.

> Richard.
>


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 14:14                             ` Richard Guenther
@ 2011-07-25 14:54                               ` Ulrich Weigand
  2011-07-25 14:59                                 ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-25 14:54 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:

> > Btw, as pseudos do not have a single def site how can the above
> > ever be correct in the face of coalescing?

I had always understood this to reflect the simple fact that a
pointer to some type must never hold a value that is not properly
aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
targets?)   This has always been an important property to generate
good code on SPU ...

> > For example on trees we can have
> >
> >  p_1 = &a; // align 256
> >  p_2 = p_1 + 4; // align 32
> >
> > but we'll coalesce the thing and thus would have to use the weaker
> > alignment of both SSA names.  expand_one_register_var expands
> > the decl, not the SSA name, so using get_pointer_alignment on
> > the decl would probably be fine, though also pointless as it always
> > will return 8.
> >
> > At least I don't see any code that would prevent a temporary variable
> > of type int * of being coalesced with a temporary variable of type vector
> > int *.

I don't really understand the coalesce code, but in the above sample,
the two variables must have the same type, otherwise there'd have to
be a cast somewhere.  Does coalesce eliminate casts?

> > Why should REGNO_POINTER_ALIGN be interesting to anyone?
> > Proper alignment information is (should be) attached to every
> > MEM already.
> 
> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
> apart from maybe alpha.c and spu.c.
> 
> We should simply kill REGNO_POINTER_ALIGN IMHO.

On the SPU at least, REGNO_POINTER_ALIGN carries additional information
over just the MEM alignment.  Say I'm getting a MEM of the form
(mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.

This means I need to generate a rotate to fix up the value that was
loaded by the (force-aligned) load instruction.  However, the form
of this rotate can be simpler if I know that e.g. reg X is always
guaranteed to be 128-bits aligned and only reg Y introduces the
potential misalignment.  If on the other hand neither of the base
registers is guaranteed to be 128-bit aligned, I need to generate
more complex rotate code ...

I understand this may also be important on other platforms, e.g.
to choose which register to use as base and which as index in a
memory operation ...
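The SPU-style access pattern described above can be emulated in
portable C (a hedged sketch with no SPU intrinsics; names are
invented): hardware loads are always 16-byte aligned, so a 32-bit word
at a merely word-aligned address is fetched by loading the enclosing
quadword and then selecting ("rotating to") the word at the low-order
address bits.  When the base register is known 128-bit aligned, that
selection offset reduces to a compile-time constant and the rotate can
be simplified away, which is the saving being discussed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Example data with a 16-byte-aligned base, so every element sits
   inside a single quadword.  */
static _Alignas(16) uint32_t words[4] = { 11, 22, 33, 44 };

/* Emulated SPU-style scalar load: the only real memory access is one
   aligned 16-byte quadword fetch; the word is then extracted from the
   quadword using the low four address bits (the "rotate").  */
static uint32_t load_word_via_quadword (const uint8_t *p)
{
  uintptr_t addr = (uintptr_t) p;
  assert (addr % 4 == 0);                 /* scalar word: 32-bit aligned */

  uint8_t quad[16];
  memcpy (quad, (const uint8_t *) (addr & ~(uintptr_t) 15), 16);

  uint32_t w;
  memcpy (&w, quad + (addr & 15), sizeof w);
  return w;
}
```

When `p` is provably 16-byte aligned, `addr & 15` is the constant 0 and
the extraction step disappears; with only 32-bit alignment guaranteed,
the offset must be computed at run time, which is the more expensive
rotate sequence the back-end tries to avoid.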

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 14:54                               ` Ulrich Weigand
@ 2011-07-25 14:59                                 ` Richard Guenther
  2011-07-25 16:12                                   ` Ulrich Weigand
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 14:59 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> > Btw, as pseudos do not have a single def site how can the above
>> > ever be correct in the face of coalescing?
>
> I had always understood this to reflect the simple fact that a
> pointer to some type must never hold a value that is not properly
> aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
> targets?)   This has always been an important property to generate
> good code on SPU ...

We do not preserve pointer type casts in the middle-end (anymore).

>> > For example on trees we can have
>> >
>> >  p_1 = &a; // align 256
>> >  p_2 = p_1 + 4; // align 32
>> >
>> > but we'll coalesce the thing and thus would have to use the weaker
>> > alignment of both SSA names.  expand_one_register_var expands
>> > the decl, not the SSA name, so using get_pointer_alignment on
>> > the decl would probably be fine, though also pointless as it always
>> > will return 8.
>> >
>> > At least I don't see any code that would prevent a temporary variable
>> > of type int * of being coalesced with a temporary variable of type vector
>> > int *.
>
> I don't really understand the coalescing code, but in the above sample
> the two variables must have the same type; otherwise there'd have to
> be a cast somewhere.  Does coalescing eliminate casts?

No, there is no cast between different pointer types.  Information is
not attached to types but to real entities.

>> > Why should REGNO_POINTER_ALIGN be interesting to anyone?
>> > Proper alignment information is (should be) attached to every
>> > MEM already.
>>
>> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
>> apart from maybe alpha.c and spu.c.
>>
>> We should simply kill REGNO_POINTER_ALIGN IMHO.
>
> On the SPU at least, REGNO_POINTER_ALIGN carries additional information
> beyond just the MEM alignment.  Say I'm given a MEM of the form
> (mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.
>
> This means I need to generate a rotate to fix up the value that was
> loaded by the (forced aligned) load instruction.  However, the form
> of this rotate can be simpler if I know that e.g. reg X is always
> guaranteed to be 128-bit aligned and only reg Y introduces the
> potential misalignment.  If on the other hand neither of the base
> registers is guaranteed to be 128-bit aligned, I need to generate
> more complex rotate code ...

Because then you need the value of X + Y instead of just picking either?

Why not expand this explicitly when you still have the per-SSA name
alignment information around?

> I understand this may also be important on other platforms, e.g.
> to choose which register to use as base and which as index in a
> memory operation ...

Well, we still have REG_POINTER.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 14:59                                 ` Richard Guenther
@ 2011-07-25 16:12                                   ` Ulrich Weigand
  2011-07-26  8:25                                     ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-25 16:12 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:
> On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > I had always understood this to reflect the simple fact that a
> > pointer to some type must never hold a value that is not properly
> > aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
> > targets?)   This has always been an important property to generate
> > good code on SPU ...
> 
> We do not preserve pointer type casts in the middle-end (anymore).

Huh, OK.  I was not aware of that ...

> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
> >> apart from maybe alpha.c and spu.c.

There's also a use in find_reloads_subreg_address, as well as in the
i386/predicates.md and arm/arm.md files.

> > This means I need to generate a rotate to fix up the value that was
> > loaded by the (forced aligned) load instruction.  However, the form
> > of this rotate can be simpler if I know that e.g. reg X is always
> > guaranteed to be 128-bit aligned and only reg Y introduces the
> > potential misalignment.  If on the other hand neither of the base
> > registers is guaranteed to be 128-bit aligned, I need to generate
> > more complex rotate code ...
> 
> Because then you need the value of X + Y instead of just picking either?

Yes, exactly.

> Why not expand this explicitly when you still have the per-SSA name
> alignment information around?

When would that be?  The expansion does happen in the initial expand
stage, but I'm getting called from the middle-end via emit_move_insn etc.
which already provides me with a MEM ...

Can I use REG_ATTRS->decl to get at the register's DECL and use
get_pointer_alignment on that?  [ On the other hand, don't we have
the same problems with reliability of REG_ATTRS that we have with
REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 16:12                                   ` Ulrich Weigand
@ 2011-07-26  8:25                                     ` Richard Guenther
  2011-07-26  8:59                                       ` Andrew Pinski
  2011-07-26 14:23                                       ` Ulrich Weigand
  0 siblings, 2 replies; 35+ messages in thread
From: Richard Guenther @ 2011-07-26  8:25 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 5:25 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > I had always understood this to reflect the simple fact that a
>> > pointer to some type must never hold a value that is not properly
>> > aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
>> > targets?)   This has always been an important property to generate
>> > good code on SPU ...
>>
>> We do not preserve pointer type casts in the middle-end (anymore).
>
> Huh, OK.  I was not aware of that ...
>
>> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
>> >> apart from maybe alpha.c and spu.c.
>
> There's also a use in find_reloads_subreg_address, as well as in the
> i386/predicates.md and arm/arm.md files.
>
>> > This means I need to generate a rotate to fix up the value that was
>> > loaded by the (forced aligned) load instruction.  However, the form
>> > of this rotate can be simpler if I know that e.g. reg X is always
>> > guaranteed to be 128-bit aligned and only reg Y introduces the
>> > potential misalignment.  If on the other hand neither of the base
>> > registers is guaranteed to be 128-bit aligned, I need to generate
>> > more complex rotate code ...
>>
>> Because then you need the value of X + Y instead of just picking either?
>
> Yes, exactly.
>
>> Why not expand this explicitly when you still have the per-SSA name
>> alignment information around?
>
> When would that be?  The expansion does happen in the initial expand
> stage, but I'm getting called from the middle-end via emit_move_insn etc.
> which already provides me with a MEM ...

Hmm.  I suppose we'd need to see at the initial expand stage that the
move is going to be handled specially.  For other strict-align targets
we end up with store/load-bit-field for unaligned accesses, so I suppose
SPU doesn't want to go down that path (via insv/extv)?

> Can I use REG_ATTRS->decl to get at the register's DECL and use
> get_pointer_alignment on that?  [ On the other hand, don't we have
> the same problems with reliability of REG_ATTRS that we have with
> REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ]

Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
we'd need to pick a conservative REGNO_POINTER_ALIGN during
expansion of the SSA name partition - iterate over all of them in the
partition and pick the lowest alignment.  Or even adjust the partitioning
to avoid losing alignment information that way.
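
The conservative merge suggested here amounts to taking the minimum over
the partition; in illustrative C (stand-in code, not GCC's actual
partition API):

```c
#include <stddef.h>

/* A pseudo shared by several coalesced SSA pointer names may only
   assume the weakest alignment (in bits) guaranteed for any name in
   the partition, so merge by taking the minimum.  */
static unsigned int
merge_partition_alignment (const unsigned int *align_bits, size_t n)
{
  unsigned int merged = align_bits[0];
  for (size_t i = 1; i < n; i++)
    if (align_bits[i] < merged)
      merged = align_bits[i];
  return merged;
}
```

With the earlier example -- p_1 = &a known 256-bit aligned, p_2 = p_1 + 4
only 32-bit aligned -- coalescing p_1 and p_2 leaves the shared pseudo
with just the 32-bit guarantee.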

I suppose the RTL code transforms are careful to update REGNO_POINTER_ALIGN
conservatively.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-26  8:25                                     ` Richard Guenther
@ 2011-07-26  8:59                                       ` Andrew Pinski
  2011-07-26 14:23                                       ` Ulrich Weigand
  1 sibling, 0 replies; 35+ messages in thread
From: Andrew Pinski @ 2011-07-26  8:59 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ulrich Weigand, Ira Rosen, gcc-patches, Patch Tracking

On Tue, Jul 26, 2011 at 12:23 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Hmm.  I suppose we'd need to see at the initial expand stage that the
> move is going to be handled specially.  For other strict-align targets
> we end up with store/load-bit-field for unaligned accesses, so I suppose
> SPU doesn't want to go down that path (via insv/extv)?

The problem is that almost all loads/stores on SPU are unaligned.  So
there would be less optimization done on them if we went down that path
(it was tried at least twice and produced worse code).

Thanks,
Andrew Pinski


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-26  8:25                                     ` Richard Guenther
  2011-07-26  8:59                                       ` Andrew Pinski
@ 2011-07-26 14:23                                       ` Ulrich Weigand
  2011-07-26 14:25                                         ` Michael Matz
  1 sibling, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-26 14:23 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:
> On Mon, Jul 25, 2011 at 5:25 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > When would that be?  The expansion does happen in the initial expand
> > stage, but I'm getting called from the middle-end via emit_move_insn etc.
> > which already provides me with a MEM ...
> 
> Hmm.  I suppose we'd need to see at the initial expand stage that the
> move is going to be handled specially.  For other strict-align targets
> we end up with store/load-bit-field for unaligned accesses, so I suppose
> SPU doesn't want to go down that path (via insv/extv)?

One issue here is that accesses aren't necessarily "unaligned" as far as
the middle-end is concerned:  in the current example, we in fact have an
access to a 32-bit integer that is aligned on a 32-bit boundary (which is
the default alignment for integers).  It's just that even so, the address
is not *128-bit* aligned, and all SPU load instructions require this level
of alignment ...

The other issue is that, as Andrew mentioned, all this means that just about
every single memory access needs to be handled this way, and attempts to
have everything go through insv/extv in the past have resulted in less
efficient code generation.
 
> > Can I use REG_ATTRS->decl to get at the register's DECL and use
> > get_pointer_alignment on that?  [ On the other hand, don't we have
> > the same problems with reliability of REG_ATTRS that we have with
> > REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ]
> 
> Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> we'd need to pick a conservative REGNO_POINTER_ALIGN during
> expansion of the SSA name partition - iterate over all of them in the
> partition and pick the lowest alignment.  Or even adjust the partitioning
> to avoid losing alignment information that way.

That would certainly be helpful.

> I suppose the RTL code transforms are careful to update REGNO_POINTER_ALIGN
> conservatively.

They're supposed to, yes.  In practice, REGNO_POINTER_ALIGN is mostly used
for pseudos allocated to hold pointer types (reflecting the type's alignment
requirement) and for virtual/hard registers pointing into the stack (stack,
frame, virtual args, ...), reflecting the various ABI alignment guarantees
about the stack.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [patch] Fix PR tree-optimization/49771
  2011-07-26 14:23                                       ` Ulrich Weigand
@ 2011-07-26 14:25                                         ` Michael Matz
  2011-07-26 16:18                                           ` Merge alignments from coalesced SSA pointers Michael Matz
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Matz @ 2011-07-26 14:25 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Guenther, Ira Rosen, gcc-patches, Patch Tracking

Hi,

On Tue, 26 Jul 2011, Ulrich Weigand wrote:

> > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> > we'd need to pick a conservative REGNO_POINTER_ALIGN during
> > expansion of the SSA name partition - iterate over all of them in the
> > partition and pick the lowest alignment.  Or even adjust the partitioning
> > to avoid losing alignment information that way.
> 
> That would certainly be helpful.

I'm working on a patch for that, stay tuned.


Ciao,
Michael.


* Merge alignments from coalesced SSA pointers
  2011-07-26 14:25                                         ` Michael Matz
@ 2011-07-26 16:18                                           ` Michael Matz
  2011-07-26 17:23                                             ` Michael Matz
  2011-07-26 17:28                                             ` Merge alignments from coalesced SSA pointers Ulrich Weigand
  0 siblings, 2 replies; 35+ messages in thread
From: Michael Matz @ 2011-07-26 16:18 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches

On Tue, 26 Jul 2011, Michael Matz wrote:

> Hi,
> 
> On Tue, 26 Jul 2011, Ulrich Weigand wrote:
> 
> > > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> > > we'd need to pick a conservative REGNO_POINTER_ALIGN during
> > > expansion of the SSA name partition - iterate over all of them in the
> > > partition and pick the lowest alignment.  Or even adjust the partitioning
> > > to avoid losing alignment information that way.
> > 
> > That would certainly be helpful.
> 
> I'm working on a patch for that, stay tuned.

Like so.  Currently in regstrapping on x86_64-linux.  Could you try if it 
helps spu?

Okay for trunk?

Ciao,
Michael.
	* cfgexpand.c (expand_one_register_var): Use get_pointer_alignment.
	(gimple_expand_cfg): Merge alignment info for coalesced pointer
	SSA names.

Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 176790)
+++ cfgexpand.c	(working copy)
@@ -909,7 +909,7 @@ expand_one_register_var (tree var)
     mark_user_reg (x);
 
   if (POINTER_TYPE_P (type))
-    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
+    mark_reg_pointer (x, get_pointer_alignment (var, BIGGEST_ALIGNMENT));
 }
 
 /* A subroutine of expand_one_var.  Called to assign rtl to a VAR_DECL that
@@ -4265,6 +4265,25 @@ gimple_expand_cfg (void)
 	}
     }
 
+  /* If we have a class containing differently aligned pointers
+     we need to merge those into the corresponding RTL pointer
+     alignment.  */
+  for (i = 1; i < num_ssa_names; i++)
+    {
+      tree name = ssa_name (i);
+      int part;
+      rtx r;
+
+      if (!name || !POINTER_TYPE_P (TREE_TYPE (name)))
+	continue;
+      part = var_to_partition (SA.map, name);
+      if (part == NO_PARTITION)
+	continue;
+      r = SA.partition_to_pseudo[part];
+      if (REG_P (r))
+	mark_reg_pointer (r, get_pointer_alignment (name, BIGGEST_ALIGNMENT));
+    }
+
   /* If this function is `main', emit a call to `__main'
      to run global initializers, etc.  */
   if (DECL_NAME (current_function_decl)


* Re: Merge alignments from coalesced SSA pointers
  2011-07-26 16:18                                           ` Merge alignments from coalesced SSA pointers Michael Matz
@ 2011-07-26 17:23                                             ` Michael Matz
  2011-08-08 16:34                                               ` Ulrich Weigand
  2011-07-26 17:28                                             ` Merge alignments from coalesced SSA pointers Ulrich Weigand
  1 sibling, 1 reply; 35+ messages in thread
From: Michael Matz @ 2011-07-26 17:23 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches

Hi,

On Tue, 26 Jul 2011, Michael Matz wrote:

> On Tue, 26 Jul 2011, Michael Matz wrote:
> 
> > Hi,
> > 
> > On Tue, 26 Jul 2011, Ulrich Weigand wrote:
> > 
> > > > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during
> > > > expansion of the SSA name partition - iterate over all of them in the
> > > > partition and pick the lowest alignment.  Or even adjust the partitioning
> > > > to avoid losing alignment information that way.
> > > 
> > > That would certainly be helpful.
> > 
> > I'm working on a patch for that, stay tuned.
> 
> Like so.  Currently in regstrapping on x86_64-linux.  Could you try if it 
> helps spu?
> 
> Okay for trunk?

This patch exposes a problem in libada.  But I'd still be interested if it 
fixes the spu problem.


Ciao,
Michael.


* Re: Merge alignments from coalesced SSA pointers
  2011-07-26 16:18                                           ` Merge alignments from coalesced SSA pointers Michael Matz
  2011-07-26 17:23                                             ` Michael Matz
@ 2011-07-26 17:28                                             ` Ulrich Weigand
  2011-07-27  9:13                                               ` Richard Guenther
  1 sibling, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-26 17:28 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches

Michael Matz wrote:
> On Tue, 26 Jul 2011, Michael Matz wrote:
> > On Tue, 26 Jul 2011, Ulrich Weigand wrote:
> > 
> > > > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during
> > > > expansion of the SSA name partition - iterate over all of them in the
> > > > partition and pick the lowest alignment.  Or even adjust the partitioning
> > > > to avoid losing alignment information that way.
> > > 
> > > That would certainly be helpful.
> > 
> > I'm working on a patch for that, stay tuned.
> 
> Like so.  Currently in regstrapping on x86_64-linux.  Could you try if it 
> helps spu?

Well, it does help SPU in the sense that the wrong code generation goes away.

However, it does so by setting REGNO_POINTER_ALIGN to the minimum of 8 just
about every time -- not sure what the impact on generated code quality is.

Maybe get_pointer_alignment should default to the type's alignment if
nothing more specific is known, at least on STRICT_ALIGNMENT targets?
Just like MEM_ALIGN defaults to the mode's alignment ...

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: Merge alignments from coalesced SSA pointers
  2011-07-26 17:28                                             ` Merge alignments from coalesced SSA pointers Ulrich Weigand
@ 2011-07-27  9:13                                               ` Richard Guenther
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Guenther @ 2011-07-27  9:13 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Michael Matz, gcc-patches

On Tue, Jul 26, 2011 at 6:56 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Michael Matz wrote:
>> On Tue, 26 Jul 2011, Michael Matz wrote:
>> > On Tue, 26 Jul 2011, Ulrich Weigand wrote:
>> >
>> > > > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
>> > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during
>> > > > expansion of the SSA name partition - iterate over all of them in the
>> > > > partition and pick the lowest alignment.  Or even adjust the partitioning
>> > > > to avoid losing alignment information that way.
>> > >
>> > > That would certainly be helpful.
>> >
>> > I'm working on a patch for that, stay tuned.
>>
>> Like so.  Currently in regstrapping on x86_64-linux.  Could you try if it
>> helps spu?
>
> Well, it does help SPU in the sense that the wrong code generation goes away.
>
> However, it does so by setting REGNO_POINTER_ALIGN to the minimum of 8 just
> about every time -- not sure what the impact on generated code quality is.
>
> Maybe get_pointer_alignment should default to the type's alignment if
> nothing more specific is known, at least on STRICT_ALIGNMENT targets?
> Just like MEM_ALIGN defaults to the mode's alignment ...

Which is bogus ... instead we should improve alignment tracking to
take into account more sources of alignment information (it is very
conservative right now - for a reason, of course, as we get most of
the packed/aligned attribute stuff wrong from the frontend already
as soon as pointers are involved).

Richard.

> Thanks,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>


* Re: Merge alignments from coalesced SSA pointers
  2011-07-26 17:23                                             ` Michael Matz
@ 2011-08-08 16:34                                               ` Ulrich Weigand
  2011-08-09 12:01                                                 ` Michael Matz
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-08-08 16:34 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches

Michael Matz wrote:
> Hi,
> 
> On Tue, 26 Jul 2011, Michael Matz wrote:
> 
> > On Tue, 26 Jul 2011, Michael Matz wrote:
> > 
> > > Hi,
> > > 
> > > On Tue, 26 Jul 2011, Ulrich Weigand wrote:
> > > 
> > > > > Well, REG_ATTRS->decl is again a decl, not an SSA name.  I suppose
> > > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during
> > > > > expansion of the SSA name partition - iterate over all of them in the
> > > > > partition and pick the lowest alignment.  Or even adjust the partitioning
> > > > > to avoid losing alignment information that way.
> > > > 
> > > > That would certainly be helpful.
> > > 
> > > I'm working on a patch for that, stay tuned.
> > 
> > Like so.  Currently in regstrapping on x86_64-linux.  Could you try if it 
> > helps spu?
> > 
> > Okay for trunk?
> 
> This patch exposes a problem in libada.  But I'd still be interested if it 
> fixes the spu problem.

It turns out that the patch not only fixes the pr49771 failure, but in fact
*all* outstanding vectorizer test case failures on SPU in both mainline and
the 4.6 branch:

On mainline:
FAIL: gcc.dg/vect/pr49771.c execution test
FAIL: gcc.dg/vect/vect-outer-4f.c execution test
FAIL: gcc.dg/vect/vect-outer-4g.c execution test
FAIL: gcc.dg/vect/vect-outer-4k.c execution test
FAIL: gcc.dg/vect/vect-outer-4l.c execution test
FAIL: gcc.dg/vect/pr49771.c -flto execution test
FAIL: gcc.dg/vect/vect-outer-4f.c -flto execution test
FAIL: gcc.dg/vect/vect-outer-4g.c -flto execution test
FAIL: gcc.dg/vect/vect-outer-4k.c -flto execution test
FAIL: gcc.dg/vect/vect-outer-4l.c -flto execution test

On 4.6:
FAIL: gcc.dg/vect/vect-double-reduc-5.c execution test
FAIL: gcc.dg/vect/vect-outer-4f.c execution test
FAIL: gcc.dg/vect/vect-outer-4g.c execution test
FAIL: gcc.dg/vect/vect-outer-4k.c execution test
FAIL: gcc.dg/vect/vect-outer-4l.c execution test


While I'm still somewhat concerned about potential performance regressions,
correctness is of course more important, so I'd really like to see your
patch (or something along its lines) go in ...

Are there any updates on the libada problem or other reasons why the patch
cannot go in?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: Merge alignments from coalesced SSA pointers
  2011-08-08 16:34                                               ` Ulrich Weigand
@ 2011-08-09 12:01                                                 ` Michael Matz
  2011-08-12 16:41                                                   ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Matz @ 2011-08-09 12:01 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches

Hi,

On Mon, 8 Aug 2011, Ulrich Weigand wrote:

> > This patch exposes a problem in libada.  But I'd still be interested 
> > if it fixes the spu problem.
> 
> It turns out that the patch not only fixes the pr49771 failure, but in 
> fact *all* outstanding vectorizer test case failures on SPU in both 
> mainline and the 4.6 branch:
> 
> While I'm still somewhat concerned about potential performance 
> regressions, correctness is of course more important, so I'd really like 
> to see your patch (or something along its lines) go in ...
> 
> Are there any updates on the libada problem or other reasons why the 
> patch cannot go in?

Nope, I've solved that one.  Let me update it.


Ciao,
Michael.


* [rfa] Set alignment of pseudos via get_pointer_alignment
  2011-08-09 12:01                                                 ` Michael Matz
@ 2011-08-12 16:41                                                   ` Michael Matz
  2011-08-12 22:53                                                     ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Matz @ 2011-08-12 16:41 UTC (permalink / raw)
  To: Ulrich Weigand, gcc-patches; +Cc: Richard Guenther

Hi,

On Tue, 9 Aug 2011, Michael Matz wrote:

> > Are there any updates on the libada problem or other reasons why the 
> > patch cannot go in?
> 
> Nope, I've solved that one.  Let me update it.

Like so.  Regstrapped on x86_64-linux (all languages + Ada).  Okay for 
trunk?


Ciao,
Michael.
-- 
	* cfgexpand.c (expand_one_register_var): Use get_pointer_alignment.
	(gimple_expand_cfg): Merge alignment info for coalesced pointer
	SSA names.

Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 177696)
+++ cfgexpand.c	(working copy)
@@ -909,7 +909,7 @@ expand_one_register_var (tree var)
     mark_user_reg (x);
 
   if (POINTER_TYPE_P (type))
-    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
+    mark_reg_pointer (x, get_pointer_alignment (var));
 }
 
 /* A subroutine of expand_one_var.  Called to assign rtl to a VAR_DECL that
@@ -4265,6 +4265,31 @@ gimple_expand_cfg (void)
 	}
     }
 
+  /* If we have a class containing differently aligned pointers
+     we need to merge those into the corresponding RTL pointer
+     alignment.  */
+  for (i = 1; i < num_ssa_names; i++)
+    {
+      tree name = ssa_name (i);
+      int part;
+      rtx r;
+
+      if (!name
+	  || !POINTER_TYPE_P (TREE_TYPE (name))
+	  /* We might have generated new SSA names in
+	     update_alias_info_with_stack_vars.  They will have NULL
+	     defining statements, and won't be part of the partitioning,
+	     so ignore those.  */
+	  || !SSA_NAME_DEF_STMT (name))
+	continue;
+      part = var_to_partition (SA.map, name);
+      if (part == NO_PARTITION)
+	continue;
+      r = SA.partition_to_pseudo[part];
+      if (REG_P (r))
+	mark_reg_pointer (r, get_pointer_alignment (name));
+    }
+
   /* If this function is `main', emit a call to `__main'
      to run global initializers, etc.  */
   if (DECL_NAME (current_function_decl)


* Re: [rfa] Set alignment of pseudos via get_pointer_alignment
  2011-08-12 16:41                                                   ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz
@ 2011-08-12 22:53                                                     ` Richard Guenther
  2011-08-23 15:04                                                       ` Michael Matz
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-08-12 22:53 UTC (permalink / raw)
  To: Michael Matz; +Cc: Ulrich Weigand, gcc-patches

On Fri, Aug 12, 2011 at 6:04 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 9 Aug 2011, Michael Matz wrote:
>
>> > Are there any updates on the libada problem or other reasons why the
>> > patch cannot go in?
>>
>> Nope, I've solved that one.  Let me update it.
>
> Like so.  Regstrapped on x86_64-linux (all languages + Ada).  Okay for
> trunk?

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> --
>        * cfgexpand.c (expand_one_register_var): Use get_pointer_alignment.
>        (gimple_expand_cfg): Merge alignment info for coalesced pointer
>        SSA names.
>
> Index: cfgexpand.c
> ===================================================================
> --- cfgexpand.c (revision 177696)
> +++ cfgexpand.c (working copy)
> @@ -909,7 +909,7 @@ expand_one_register_var (tree var)
>     mark_user_reg (x);
>
>   if (POINTER_TYPE_P (type))
> -    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
> +    mark_reg_pointer (x, get_pointer_alignment (var));
>  }
>
>  /* A subroutine of expand_one_var.  Called to assign rtl to a VAR_DECL that
> @@ -4265,6 +4265,31 @@ gimple_expand_cfg (void)
>        }
>     }
>
> +  /* If we have a class containing differently aligned pointers
> +     we need to merge those into the corresponding RTL pointer
> +     alignment.  */
> +  for (i = 1; i < num_ssa_names; i++)
> +    {
> +      tree name = ssa_name (i);
> +      int part;
> +      rtx r;
> +
> +      if (!name
> +         || !POINTER_TYPE_P (TREE_TYPE (name))
> +         /* We might have generated new SSA names in
> +            update_alias_info_with_stack_vars.  They will have NULL
> +            defining statements, and won't be part of the partitioning,
> +            so ignore those.  */
> +         || !SSA_NAME_DEF_STMT (name))
> +       continue;
> +      part = var_to_partition (SA.map, name);
> +      if (part == NO_PARTITION)
> +       continue;
> +      r = SA.partition_to_pseudo[part];
> +      if (REG_P (r))
> +       mark_reg_pointer (r, get_pointer_alignment (name));
> +    }
> +
>   /* If this function is `main', emit a call to `__main'
>      to run global initializers, etc.  */
>   if (DECL_NAME (current_function_decl)
>


* Re: [rfa] Set alignment of pseudos via get_pointer_alignment
  2011-08-12 22:53                                                     ` Richard Guenther
@ 2011-08-23 15:04                                                       ` Michael Matz
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Matz @ 2011-08-23 15:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Guenther, Ulrich Weigand


Hi,

> > Like so.  Regstrapped on x86_64-linux (all languages + Ada).  Okay for
> > trunk?
> 
> Ok.

r177989 (JFYI, since it was committed some time ago, to make searching 
the archives easier).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2011-08-23 14:40 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
2011-07-19  9:49 ` Richard Guenther
2011-07-19 14:01   ` Ira Rosen
2011-07-19 14:03     ` Richard Guenther
2011-07-20 19:47 ` Ulrich Weigand
2011-07-21 12:54   ` Ira Rosen
2011-07-24 14:32     ` Ira Rosen
2011-07-24 18:46       ` Richard Guenther
2011-07-25  9:44         ` Ulrich Weigand
2011-07-25 10:08           ` Richard Guenther
2011-07-25 11:26             ` Ira Rosen
2011-07-25 11:41               ` Richard Guenther
2011-07-25 12:33                 ` Ira Rosen
2011-07-25 13:01                   ` Richard Guenther
2011-07-25 13:07                     ` Richard Guenther
2011-07-25 13:47                       ` Ulrich Weigand
2011-07-25 14:01                         ` Richard Guenther
2011-07-25 14:10                           ` Richard Guenther
2011-07-25 14:14                             ` Richard Guenther
2011-07-25 14:54                               ` Ulrich Weigand
2011-07-25 14:59                                 ` Richard Guenther
2011-07-25 16:12                                   ` Ulrich Weigand
2011-07-26  8:25                                     ` Richard Guenther
2011-07-26  8:59                                       ` Andrew Pinski
2011-07-26 14:23                                       ` Ulrich Weigand
2011-07-26 14:25                                         ` Michael Matz
2011-07-26 16:18                                           ` Merge alignments from coalesced SSA pointers Michael Matz
2011-07-26 17:23                                             ` Michael Matz
2011-08-08 16:34                                               ` Ulrich Weigand
2011-08-09 12:01                                                 ` Michael Matz
2011-08-12 16:41                                                   ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz
2011-08-12 22:53                                                     ` Richard Guenther
2011-08-23 15:04                                                       ` Michael Matz
2011-07-26 17:28                                             ` Merge alignments from coalesced SSA pointers Ulrich Weigand
2011-07-27  9:13                                               ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).