* [patch] Fix PR tree-optimization/49771
@ 2011-07-19  8:24 Ira Rosen
  2011-07-19  9:49 ` Richard Guenther
  2011-07-20 19:47 ` Ulrich Weigand
  0 siblings, 2 replies; 35+ messages in thread
From: Ira Rosen @ 2011-07-19 8:24 UTC (permalink / raw)
To: gcc-patches; +Cc: Patch Tracking

Hi,

The vectorizer performs the following alias checks for data-refs with
unknown dependence:

((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
|| ((load_ptr_0 + load_segment_length_0) <= store_ptr_0)

where segment_length is the data-ref's step in the loop multiplied by the
loop's number of iterations (in the general case). For invariant
data-refs segment_length is 0, since the step is 0. This creates an
incorrect check for:

for (i = 0; i < 1000; i++)
  for (j = 0; j < 1000; j++)
    a[j] = a[i] + 1;

We check:

&a + 4000 <= &a + i*4
|| &a + i*4 <= &a

and the second check is wrong for i=0.

This patch makes segment_length sizeof (data-ref type) in case
of zero step, changing the checks into

&a + 4000 <= &a + i*4
|| &a + i*4 + 4 <= &a

Bootstrapped and tested on powerpc64-suse-linux.
Committed revision 176434.

Ira

ChangeLog:

      PR tree-optimization/49771
      * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
      zero step, set segment length to the size of the data-ref's type.

testsuite/ChangeLog:

      PR tree-optimization/49771
      * gcc.dg/vect/pr49771.c: New test.
Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c      (revision 176433)
+++ tree-vect-loop-manip.c      (working copy)
@@ -2356,9 +2356,14 @@ static tree
 vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
 {
   tree segment_length;
-  segment_length = size_binop (MULT_EXPR,
-                               fold_convert (sizetype, DR_STEP (dr)),
-                               fold_convert (sizetype, length_factor));
+
+  if (!compare_tree_int (DR_STEP (dr), 0))
+    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+  else
+    segment_length = size_binop (MULT_EXPR,
+                                 fold_convert (sizetype, DR_STEP (dr)),
+                                 fold_convert (sizetype, length_factor));
+
   if (vect_supportable_dr_alignment (dr, false)
       == dr_explicit_realign_optimized)
     {
Index: testsuite/gcc.dg/vect/pr49771.c
===================================================================
--- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
+++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
@@ -0,0 +1,26 @@
+#include <stdlib.h>
+#include <stdarg.h>
+
+static int a[1000];
+
+int
+foo (void)
+{
+  int j;
+  int i;
+  for (i = 0; i < 1000; i++)
+    for (j = 0; j < 1000; j++)
+      a[j] = a[i] + 1;
+  return a[0];
+}
+
+int
+main (void)
+{
+  int res = foo ();
+  if (res != 1999)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
@ 2011-07-19  9:49 ` Richard Guenther
  2011-07-19 14:01   ` Ira Rosen
  2011-07-20 19:47 ` Ulrich Weigand
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-19 9:49 UTC (permalink / raw)
To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> Hi,
>
> The vectorizer performs the following alias checks for data-refs with
> unknown dependence:
>
> ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> || ((load_ptr_0 + load_segment_length_0) <= store_ptr_0)
>
> where segment_length is the data-ref's step in the loop multiplied by the
> loop's number of iterations (in the general case). For invariant
> data-refs segment_length is 0, since the step is 0. This creates an
> incorrect check for:
>
> for (i = 0; i < 1000; i++)
>   for (j = 0; j < 1000; j++)
>     a[j] = a[i] + 1;
>
> We check:
>
> &a + 4000 <= &a + i*4
> || &a + i*4 <= &a
>
> and the second check is wrong for i=0.
>
> This patch makes segment_length sizeof (data-ref type) in case
> of zero step, changing the checks into
>
> &a + 4000 <= &a + i*4
> || &a + i*4 + 4 <= &a
>
> Bootstrapped and tested on powerpc64-suse-linux.
> Committed revision 176434.
>
> Ira
>
> ChangeLog:
>
>       PR tree-optimization/49771
>       * tree-vect-loop-manip.c (vect_vfa_segment_size): In case of
>       zero step, set segment length to the size of the data-ref's type.
>
> testsuite/ChangeLog:
>
>       PR tree-optimization/49771
>       * gcc.dg/vect/pr49771.c: New test.
>
> Index: tree-vect-loop-manip.c
> ===================================================================
> --- tree-vect-loop-manip.c      (revision 176433)
> +++ tree-vect-loop-manip.c      (working copy)
> @@ -2356,9 +2356,14 @@ static tree
>  vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>  {
>    tree segment_length;
> -  segment_length = size_binop (MULT_EXPR,
> -                               fold_convert (sizetype, DR_STEP (dr)),
> -                               fold_convert (sizetype, length_factor));
> +
> +  if (!compare_tree_int (DR_STEP (dr), 0))

integer_zerop (DR_STEP (dr))

> +    segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> +  else
> +    segment_length = size_binop (MULT_EXPR,
> +                                 fold_convert (sizetype, DR_STEP (dr)),
> +                                 fold_convert (sizetype, length_factor));
> +
>    if (vect_supportable_dr_alignment (dr, false)
>        == dr_explicit_realign_optimized)
>      {
> Index: testsuite/gcc.dg/vect/pr49771.c
> ===================================================================
> --- testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> +++ testsuite/gcc.dg/vect/pr49771.c     (revision 0)
> @@ -0,0 +1,26 @@
> +#include <stdlib.h>
> +#include <stdarg.h>
> +
> +static int a[1000];
> +
> +int
> +foo (void)
> +{
> +  int j;
> +  int i;
> +  for (i = 0; i < 1000; i++)
> +    for (j = 0; j < 1000; j++)
> +      a[j] = a[i] + 1;
> +  return a[0];
> +}
> +
> +int
> +main (void)
> +{
> +  int res = foo ();
> +  if (res != 1999)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  9:49 ` Richard Guenther
@ 2011-07-19 14:01   ` Ira Rosen
  2011-07-19 14:03     ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-19 14:01 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc-patches, Patch Tracking

On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> +
>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>
> integer_zerop (DR_STEP (dr))
>

Right. I'll change this at some other opportunity, if that's ok.

Thanks,
Ira
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19 14:01   ` Ira Rosen
@ 2011-07-19 14:03     ` Richard Guenther
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Guenther @ 2011-07-19 14:03 UTC (permalink / raw)
To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Tue, Jul 19, 2011 at 3:27 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 19 July 2011 11:50, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 19, 2011 at 8:25 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> +
>>> +  if (!compare_tree_int (DR_STEP (dr), 0))
>>
>> integer_zerop (DR_STEP (dr))
>>
>
> Right. I'll change this at some other opportunity, if that's ok.

Sure.

Thanks,
Richard.

> Thanks,
> Ira
>
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
  2011-07-19  9:49 ` Richard Guenther
@ 2011-07-20 19:47 ` Ulrich Weigand
  2011-07-21 12:54   ` Ira Rosen
  1 sibling, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-20 19:47 UTC (permalink / raw)
To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

Ira Rosen wrote:

>       PR tree-optimization/49771
>       * gcc.dg/vect/pr49771.c: New test.

This test fails (with wrong code) on spu-elf ...

> +int
> +foo (void)
> +{
> +  int j;
> +  int i;
> +  for (i = 0; i < 1000; i++)
> +    for (j = 0; j < 1000; j++)
> +      a[j] = a[i] + 1;
> +  return a[0];
> +}
> +
> +int
> +main (void)
> +{
> +  int res = foo ();
> +  if (res != 1999)
> +    abort ();
> +  return 0;
> +}

The return value of foo with vectorization is 1249 instead
of 1999 for some reason.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-20 19:47 ` Ulrich Weigand
@ 2011-07-21 12:54   ` Ira Rosen
  2011-07-24 14:32     ` Ira Rosen
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-21 12:54 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: gcc-patches, Patch Tracking

On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Ira Rosen wrote:
>
>>       PR tree-optimization/49771
>>       * gcc.dg/vect/pr49771.c: New test.
>
> This test fails (with wrong code) on spu-elf ...
>
>> +int
>> +foo (void)
>> +{
>> +  int j;
>> +  int i;
>> +  for (i = 0; i < 1000; i++)
>> +    for (j = 0; j < 1000; j++)
>> +      a[j] = a[i] + 1;
>> +  return a[0];
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int res = foo ();
>> +  if (res != 1999)
>> +    abort ();
>> +  return 0;
>> +}
>
> The return value of foo with vectorization is 1249 instead
> of 1999 for some reason.

I reproduced the failure. It occurs without Richard's patch
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
this patch too. Obviously the vectorized loop is executed, but at the
moment I don't understand why. I'll have a better look on Sunday.

Ira

>
> Bye,
> Ulrich
>
> --
> Dr. Ulrich Weigand
> GNU Toolchain for Linux on System z and Cell BE
> Ulrich.Weigand@de.ibm.com
>
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-21 12:54   ` Ira Rosen
@ 2011-07-24 14:32     ` Ira Rosen
  2011-07-24 18:46       ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-24 14:32 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: gcc-patches, Patch Tracking

On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It occurs without Richard's patch
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> patch too. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll have a better look on Sunday.

Actually it doesn't choose the vectorized code. But the scalar version
gets optimized in a harmful way for SPU, AFAIU.
Here is the scalar loop after vrp2:

<bb 8>:
  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
  D.4593_42 = (void *) ivtmp.53_32;
  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 10>;
  else
    goto <bb 5>;

and the load is changed by dom2 to:

<bb 4>:
...
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
...

where vector(4) int * vect_pa.9;

And the scalar loop has no rotate for that load:

.L3:
        lqd     $13,0($2)
        lqx     $11,$5,$3
        cwx     $7,$sp,$3
        ai      $12,$13,1
        shufb   $6,$12,$11,$7
        stqx    $6,$5,$3
        ai      $3,$3,4
        ceqi    $4,$3,16

I manually added a rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).

Ira
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-24 14:32     ` Ira Rosen
@ 2011-07-24 18:46       ` Richard Guenther
  2011-07-25  9:44         ` Ulrich Weigand
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-24 18:46 UTC (permalink / raw)
To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> On 20 July 2011 21:35, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>>
>>> The return value of foo with vectorization is 1249 instead
>>> of 1999 for some reason.
>>
>> I reproduced the failure. It occurs without Richard's patch
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> patch too. Obviously the vectorized loop is executed, but at the
>> moment I don't understand why. I'll have a better look on Sunday.
>
> Actually it doesn't choose the vectorized code. But the scalar version
> gets optimized in a harmful way for SPU, AFAIU.
> Here is the scalar loop after vrp2:
>
> <bb 8>:
>   # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>   D.4593_42 = (void *) ivtmp.53_32;
>   D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>   D.4521_34 = D.4520_33 + 1;
>   MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>   ivtmp.42_45 = ivtmp.42_50 + 4;
>   if (ivtmp.42_45 != 16)
>     goto <bb 10>;
>   else
>     goto <bb 5>;
>
> and the load is changed by dom2 to:
>
> <bb 4>:
> ...
>   D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
> ...
>
> where vector(4) int * vect_pa.9;
>
> And the scalar loop has no rotate for that load:

Hum. This smells like we are hiding sth from the tree optimizers?

> .L3:
>         lqd     $13,0($2)
>         lqx     $11,$5,$3
>         cwx     $7,$sp,$3
>         ai      $12,$13,1
>         shufb   $6,$12,$11,$7
>         stqx    $6,$5,$3
>         ai      $3,$3,4
>         ceqi    $4,$3,16
>
>
> I manually added a rotqby for $13 and the result was correct (I changed
> the test to iterate only 4 times to make things easier).
>
> Ira
>
>>
>> Ira
>>
>>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>> Dr. Ulrich Weigand
>>> GNU Toolchain for Linux on System z and Cell BE
>>> Ulrich.Weigand@de.ibm.com
>>>
>>
>
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-24 18:46       ` Richard Guenther
@ 2011-07-25  9:44         ` Ulrich Weigand
  2011-07-25 10:08           ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ulrich Weigand @ 2011-07-25 9:44 UTC (permalink / raw)
To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking

Richard Guenther wrote:
> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
> >> I reproduced the failure. It occurs without Richard's patch
> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> >> patch too. Obviously the vectorized loop is executed, but at the
> >> moment I don't understand why. I'll have a better look on Sunday.
> >
> > Actually it doesn't choose the vectorized code. But the scalar version
> > gets optimized in a harmful way for SPU, AFAIU.
> > Here is the scalar loop after vrp2:
> >
> > <bb 8>:
> >   # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
> >   D.4593_42 = (void *) ivtmp.53_32;
> >   D.4520_33 = MEM[base: D.4593_42, offset: 0B];
> >   D.4521_34 = D.4520_33 + 1;
> >   MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
> >   ivtmp.42_45 = ivtmp.42_50 + 4;
> >   if (ivtmp.42_45 != 16)
> >     goto <bb 10>;
> >   else
> >     goto <bb 5>;
> >
> > and the load is changed by dom2 to:
> >
> > <bb 4>:
> > ...
> >   D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
> > ...
> >
> > where vector(4) int * vect_pa.9;
> >
> > And the scalar loop has no rotate for that load:
>
> Hum. This smells like we are hiding sth from the tree optimizers?

Well, the back-end assumes a pointer to vector type is always
naturally aligned, and therefore the data it points to can be
accessed via a simple load, with no extra rotate needed.

It seems what happened here is that somehow, a pointer to int
gets replaced by a pointer to vector, even though their alignment
properties are different.

This vector pointer must originate somehow in the vectorizer,
however, since the original C source does not contain any
vector types at all ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25  9:44         ` Ulrich Weigand
@ 2011-07-25 10:08           ` Richard Guenther
  2011-07-25 11:26             ` Ira Rosen
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Guenther @ 2011-07-25 10:08 UTC (permalink / raw)
To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking

On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>> >> I reproduced the failure. It occurs without Richard's patch
>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> >> patch too. Obviously the vectorized loop is executed, but at the
>> >> moment I don't understand why. I'll have a better look on Sunday.
>> >
>> > Actually it doesn't choose the vectorized code. But the scalar version
>> > gets optimized in a harmful way for SPU, AFAIU.
>> > Here is the scalar loop after vrp2:
>> >
>> > <bb 8>:
>> >   # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>> >   D.4593_42 = (void *) ivtmp.53_32;
>> >   D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>> >   D.4521_34 = D.4520_33 + 1;
>> >   MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>> >   ivtmp.42_45 = ivtmp.42_50 + 4;
>> >   if (ivtmp.42_45 != 16)
>> >     goto <bb 10>;
>> >   else
>> >     goto <bb 5>;
>> >
>> > and the load is changed by dom2 to:
>> >
>> > <bb 4>:
>> > ...
>> >   D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>> > ...
>> >
>> > where vector(4) int * vect_pa.9;
>> >
>> > And the scalar loop has no rotate for that load:
>>
>> Hum. This smells like we are hiding sth from the tree optimizers?
>
> Well, the back-end assumes a pointer to vector type is always
> naturally aligned, and therefore the data it points to can be
> accessed via a simple load, with no extra rotate needed.

I can't see any use of VECTOR_TYPE in config/spu/, and assuming
anything about alignment just because of the kind of the pointer
is bogus - the scalar code does a scalar read using that pointer.
So the backend had better look at the memory operation, not
at the pointer type. That said, I can't find any code that looks
suspicious in the spu backend.

> It seems what happened here is that somehow, a pointer to int
> gets replaced by a pointer to vector, even though their alignment
> properties are different.

No, they are not. They get replaced if they are value-equivalent,
in which case they are also alignment-equivalent. But well, the
dump snippet wasn't complete and I don't feel like building a
SPU cross to verify myself.

> This vector pointer must originate somehow in the vectorizer,
> however, since the original C source does not contain any
> vector types at all ...

That's for sure true, it must be the initial pointer we then increment
in the vectorized loop.

Richard.

> Bye,
> Ulrich
>
> --
> Dr. Ulrich Weigand
> GNU Toolchain for Linux on System z and Cell BE
> Ulrich.Weigand@de.ibm.com
>
* Re: [patch] Fix PR tree-optimization/49771
  2011-07-25 10:08           ` Richard Guenther
@ 2011-07-25 11:26             ` Ira Rosen
  2011-07-25 11:41               ` Richard Guenther
  0 siblings, 1 reply; 35+ messages in thread
From: Ira Rosen @ 2011-07-25 11:26 UTC (permalink / raw)
To: Richard Guenther; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> >> I reproduced the failure. It occurs without Richard's patch
>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>> >> patch too. Obviously the vectorized loop is executed, but at the
>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>> >
>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>> > gets optimized in a harmful way for SPU, AFAIU.
>>> > Here is the scalar loop after vrp2:
>>> >
>>> > <bb 8>:
>>> >   # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>>> >   D.4593_42 = (void *) ivtmp.53_32;
>>> >   D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>> >   D.4521_34 = D.4520_33 + 1;
>>> >   MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>> >   ivtmp.42_45 = ivtmp.42_50 + 4;
>>> >   if (ivtmp.42_45 != 16)
>>> >     goto <bb 10>;
>>> >   else
>>> >     goto <bb 5>;
>>> >
>>> > and the load is changed by dom2 to:
>>> >
>>> > <bb 4>:
>>> > ...
>>> >   D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>> > ...
>>> >
>>> > where vector(4) int * vect_pa.9;
>>> >
>>> > And the scalar loop has no rotate for that load:
>>>
>>> Hum. This smells like we are hiding sth from the tree optimizers?
>>
>> Well, the back-end assumes a pointer to vector type is always
>> naturally aligned, and therefore the data it points to can be
>> accessed via a simple load, with no extra rotate needed.
>
> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> anything about alignment just because of the kind of the pointer
> is bogus - the scalar code does a scalar read using that pointer.
> So the backend had better look at the memory operation, not
> at the pointer type. That said, I can't find any code that looks
> suspicious in the spu backend.
>
>> It seems what happened here is that somehow, a pointer to int
>> gets replaced by a pointer to vector, even though their alignment
>> properties are different.
>
> No, they are not. They get replaced if they are value-equivalent,
> in which case they are also alignment-equivalent. But well, the
> dump snippet wasn't complete and I don't feel like building a
> SPU cross to verify myself.

I am attaching the complete file.

Thanks,
Ira

>
>> This vector pointer must originate somehow in the vectorizer,
>> however, since the original C source does not contain any
>> vector types at all ...
>
> That's for sure true, it must be the initial pointer we then increment
> in the vectorized loop.
>
> Richard.
>
>> Bye,
>> Ulrich
>>
>> --
>> Dr.
>> Ulrich Weigand
>> GNU Toolchain for Linux on System z and Cell BE
>> Ulrich.Weigand@de.ibm.com
>>
>

[-- Attachment #2: my--pr49771.c.124t.dom2 --]
[-- Type: application/octet-stream, Size: 19190 bytes --]

;; Function foo (foo, funcdef_no=0, decl_uid=2839, cgraph_uid=0)

;; 3 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 10 5 6 9 7 8
;;
;; Loop 1
;;  header 7, latch 9
;;  depth 1, outer 0
;;  nodes: 7 9 6 5 4 3 10
;;
;; Loop 2
;;  header 4, latch 10
;;  depth 2, outer 1
;;  nodes: 4 10
;; 2 succs { 7 }
;; 3 succs { 4 }
;; 4 succs { 10 6 }
;; 10 succs { 4 }
;; 5 succs { 6 }
;; 6 succs { 9 8 }
;; 9 succs { 7 }
;; 7 succs { 3 5 }
;; 8 succs { 1 }

Optimizing block #0

Optimizing block #2

Optimizing statement ivtmp.53_37 = (long unsigned int) &a;
LKUP STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
2>>> STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Optimizing statement a.55_39 = (long unsigned int) &a;
LKUP STMT a.55_39 = nop_expr &a
          a.55_39 = (long unsigned int) &a;
FIND: ivtmp.53_37
  Replaced redundant expr '(long unsigned int) &a' with 'ivtmp.53_37'
==== ASGN a.55_39 = ivtmp.53_37
Optimizing statement D.4591_40 = a.55_39 + 16;
  Replaced 'a.55_39' with variable 'ivtmp.53_37'
LKUP STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
2>>> STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;

Optimizing block #7

Optimizing statement vect_pa.12_24 = &a;
LKUP STMT vect_pa.12_24 = &a
LKUP STMT vect_pa.12_24 = &a
          vect_pa.12_24 = &a;
==== ASGN vect_pa.12_24 = &a
Optimizing statement D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
2>>> STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
Optimizing statement D.4540_11 = (vector(4) int *) D.4589_36;
LKUP STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
2>>> STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
Optimizing statement D.4541_26 = D.4540_11 <= &a;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
2>>> STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
Optimizing statement D.4542_27 = &MEM[(void *)&a + 16B];
LKUP STMT D.4542_27 = &MEM[(void *)&a + 16B]
          D.4542_27 = &MEM[(void *)&a + 16B];
==== ASGN D.4542_27 = &MEM[(void *)&a + 16B]
Optimizing statement vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
2>>> STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
Optimizing statement D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
2>>> STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
Optimizing statement D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
2>>> STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
Optimizing statement if (D.4544_29 != 0)
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)

Optimizing block #3

0>>> COPY D.4544_29 = 0
Optimizing statement ivtmp.42_59 = 0;
LKUP STMT ivtmp.42_59 = 0
          ivtmp.42_59 = 0;
==== ASGN ivtmp.42_59 = 0

Optimizing block #4

Optimizing statement D.4593_42 = (void *) ivtmp.53_32;
LKUP STMT D.4593_42 = nop_expr ivtmp.53_32
          D.4593_42 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4593_42 = vect_pa.9_19
Optimizing statement D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  Replaced 'D.4593_42' with variable 'vect_pa.9_19'
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement D.4521_34 = D.4520_33 + 1;
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
2>>> STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
Optimizing statement MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34
          MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
2>>> STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
Optimizing statement ivtmp.42_45 = ivtmp.42_50 + 4;
LKUP STMT ivtmp.42_45 = ivtmp.42_50 plus_expr 4
          ivtmp.42_45 = ivtmp.42_50 + 4;
Optimizing statement if (ivtmp.42_45 != 16)
LKUP STMT ivtmp.42_45 ne_expr 16
          if (ivtmp.42_45 != 16)

Optimizing block #10

1>>> COND 1 = ivtmp.42_45 ne_expr 16
1>>> COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 0 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ne_expr 16
0>>> COPY ivtmp.42_45 = 16
1>>> COND 1 = ivtmp.42_45 le_expr 16
1>>> COND 1 = ivtmp.42_45 ge_expr 16
1>>> COND 1 = ivtmp.42_45 eq_expr 16
1>>> COND 0 = ivtmp.42_45 ne_expr 16
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< COND 0 = ivtmp.42_45 ne_expr 16
<<<< COND 1 = ivtmp.42_45 eq_expr 16
<<<< COND 1 = ivtmp.42_45 ge_expr 16
<<<< COND 1 = ivtmp.42_45 le_expr 16
<<<< STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]
          D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B];
<<<< STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;
<<<< STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
<<<< COPY D.4544_29 = 0
LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
LKUP STMT D.4521_34 = D.4520_33 plus_expr 1
          D.4521_34 = D.4520_33 + 1;

Optimizing block #5

0>>> COPY D.4544_29 = 1
Optimizing statement vect_cst_.29_56 = { 1, 1, 1, 1 };
LKUP STMT vect_cst_.29_56 = { 1, 1, 1, 1 }
          vect_cst_.29_56 = { 1, 1, 1, 1 };
==== ASGN vect_cst_.29_56 = { 1, 1, 1, 1 }
Optimizing statement vect_pa.33_58 = &a;
LKUP STMT vect_pa.33_58 = &a
          vect_pa.33_58 = &a;
==== ASGN vect_pa.33_58 = &a
Optimizing statement D.4592_41 = (void *) ivtmp.53_32;
LKUP STMT D.4592_41 = nop_expr ivtmp.53_32
          D.4592_41 = (void *) ivtmp.53_32;
FIND: vect_pa.9_19
  Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'
==== ASGN D.4592_41 = vect_pa.9_19
Optimizing statement D.4520_5 = MEM[base: D.4592_41, offset: 0B];
  Replaced 'D.4592_41' with variable 'vect_pa.9_19'
LKUP STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
2>>> STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];
Optimizing statement vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
LKUP STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
2>>> STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
Optimizing statement vect_var_.28_57 = vect_cst_.27_55 + vect_cst_.29_56;
  Replaced 'vect_cst_.29_56' with constant '{ 1, 1, 1, 1 }'
LKUP STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
2>>> STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
Optimizing statement MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT MEM[(int[4] *)&a] = vect_var_.28_57
          MEM[(int[4] *)&a] = vect_var_.28_57;
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
2>>> STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
Optimizing statement vect_pa.30_60 = &MEM[(void *)&a + 16B];
LKUP STMT vect_pa.30_60 = &MEM[(void *)&a + 16B]
          vect_pa.30_60 = &MEM[(void *)&a + 16B];
==== ASGN vect_pa.30_60 = &MEM[(void *)&a + 16B]
Optimizing statement ivtmp.34_62 = 1;
LKUP STMT ivtmp.34_62 = 1
          ivtmp.34_62 = 1;
==== ASGN ivtmp.34_62 = 1
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)
<<<< STMT vect_var_.28_57 = MEM[(int[4] *)&a]
          vect_var_.28_57 = MEM[(int[4] *)&a];
<<<< STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 }
          vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 };
<<<< STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}
          vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5};
<<<< STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]
          D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B];

Optimizing block #6

Optimizing statement ivtmp.53_30 = ivtmp.53_32 + 4;
LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4
          ivtmp.53_30 = ivtmp.53_32 + 4;
Optimizing statement if (ivtmp.53_30 != D.4591_40)
LKUP STMT ivtmp.53_30 ne_expr D.4591_40
          if (ivtmp.53_30 != D.4591_40)

Optimizing block #8

0>>> COPY ivtmp.53_30 = D.4591_40
1>>> COND 1 = ivtmp.53_30 le_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 ge_expr D.4591_40
1>>> COND 1 = ivtmp.53_30 eq_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 ne_expr D.4591_40
Optimizing statement D.4522_9 = a[0];
LKUP STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
2>>> STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
Optimizing statement return D.4522_9;
<<<< STMT D.4522_9 = a[0]
          D.4522_9 = a[0];
<<<< COND 0 = ivtmp.53_30 ne_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ge_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 le_expr D.4591_40
<<<< COPY ivtmp.53_30 = D.4591_40

Optimizing block #9

1>>> COND 1 = ivtmp.53_30 ne_expr D.4591_40
1>>> COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 0 = ivtmp.53_30 eq_expr D.4591_40
<<<< COND 1 = ivtmp.53_30 ne_expr D.4591_40
<<<< STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
<<<< STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
<<<< STMT vect_pa.9_19 = nop_expr ivtmp.53_32
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
<<<< STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
<<<< STMT D.4540_11 = nop_expr D.4589_36
          D.4540_11 = (vector(4) int *) D.4589_36;
<<<< STMT D.4589_36 = ivtmp.53_32 plus_expr 16
          D.4589_36 = ivtmp.53_32 + 16;
LKUP STMT D.4589_36 = ivtmp.53_37 plus_expr 16
          D.4589_36 = ivtmp.53_37 + 16;
FIND: D.4591_40
LKUP STMT D.4540_11 = nop_expr D.4591_40
          D.4540_11 = (vector(4) int *) D.4591_40;
LKUP STMT D.4541_26 = D.4540_11 le_expr &a
          D.4541_26 = D.4540_11 <= &a;
LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_37
          vect_pa.9_19 = (vector(4) int *) ivtmp.53_37;
LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B]
          D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B];
LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28
          D.4544_29 = D.4541_26 & D.4543_28;
LKUP STMT D.4544_29 ne_expr 0
          if (D.4544_29 != 0)
<<<< STMT D.4591_40 = ivtmp.53_37 plus_expr 16
          D.4591_40 = ivtmp.53_37 + 16;
<<<< STMT ivtmp.53_37 = nop_expr &a
          ivtmp.53_37 = (long unsigned int) &a;
Removing basic block 9
;; basic block 9, loop depth 1, count 0
;;  prev block 6, next block 7
;;  pred:
;;  succ:       7 [100.0%]  (fallthru,dfs_back)
<bb 9>:
Invalid sum of incoming frequencies 0, should be 1500
Removing basic block 10
;; basic block 10, loop depth 2, count 0
;;  prev block 4, next block 5
;;  pred:
;;  succ:       4 [100.0%]  (fallthru,dfs_back)
<bb 10>:
Invalid sum of incoming frequencies 0, should be 1200
goto <bb 4>;

Scope blocks after cleanups:

{ Scope block #0
}
foo () { void * D.4593; void * D.4592; long unsigned int a.55; long unsigned int D.4591; long unsigned int D.4589; long unsigned int ivtmp.53; sizetype ivtmp.42; unsigned int ivtmp.34; vector(4) int * vect_pa.33; vector(4) int * vect_pa.30; vector(4) int vect_cst_.29; vector(4) int vect_var_.28; vector(4) int vect_cst_.27; vector(4) int * D.4540; _Bool D.4541; vector(4) int * D.4542; _Bool D.4543; _Bool D.4544; vector(4) int * vect_pa.12; vector(4) int * vect_pa.9; int D.4522; int D.4521; int D.4520; <bb 2>: ivtmp.53_37 = (long unsigned int) &a; a.55_39 = ivtmp.53_37; D.4591_40 = ivtmp.53_37 + 16; goto <bb 7>; <bb 3>: ivtmp.42_59 = 0; <bb 4>: # ivtmp.42_50 = PHI <0(3), ivtmp.42_45(4)> D.4593_42 = vect_pa.9_19; D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; D.4521_34 = D.4520_33 + 1; MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; ivtmp.42_45 = ivtmp.42_50 + 4; if (ivtmp.42_45 != 16) goto <bb 4>; else goto <bb 6>; <bb 5>: vect_cst_.29_56 = { 1, 1, 1, 1 }; vect_pa.33_58 = &a; D.4592_41 = vect_pa.9_19; D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]; vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 }; MEM[(int[4] *)&a] = vect_var_.28_57; vect_pa.30_60 = &MEM[(void *)&a + 16B]; ivtmp.34_62 = 1; <bb 6>: ivtmp.53_30 = ivtmp.53_32 + 4; if (ivtmp.53_30 != D.4591_40) goto <bb 7>; else goto <bb 8>; <bb 7>: D.4589_36 = ivtmp.53_32 + 16; D.4540_11 = (vector(4) int *) D.4589_36; D.4541_26 = D.4540_11 <= &a; D.4542_27 = &MEM[(void *)&a + 16B]; vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; D.4544_29 = D.4541_26 & D.4543_28; if (D.4544_29 != 0) goto <bb 5>; else goto <bb 3>; <bb 8>: D.4522_9 = a[0]; return D.4522_9; } ;; Function main (main, funcdef_no=1, decl_uid=2850, cgraph_uid=1) (executed once) ;; 2 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 3 4 5 7 6 ;; ;; Loop 1 ;; header 5, latch 7 ;; depth 1, outer 0 ;; nodes: 
5 7 ;; 2 succs { 3 4 } ;; 3 succs { 4 } ;; 4 succs { 5 } ;; 5 succs { 7 6 } ;; 7 succs { 5 } ;; 6 succs { 1 } Optimizing block #0 Optimizing block #2 Optimizing statement a[0] = 0; LKUP STMT a[0] = 0 a[0] = 0; LKUP STMT 0 = a[0] 0 = a[0]; LKUP STMT 0 = a[0] 0 = a[0]; 2>>> STMT 0 = a[0] 0 = a[0]; Optimizing statement a[1] = 0; LKUP STMT a[1] = 0 a[1] = 0; LKUP STMT 0 = a[1] 0 = a[1]; LKUP STMT 0 = a[1] 0 = a[1]; 2>>> STMT 0 = a[1] 0 = a[1]; Optimizing statement a[2] = 0; LKUP STMT a[2] = 0 a[2] = 0; LKUP STMT 0 = a[2] 0 = a[2]; LKUP STMT 0 = a[2] 0 = a[2]; 2>>> STMT 0 = a[2] 0 = a[2]; Optimizing statement a[3] = 0; LKUP STMT a[3] = 0 a[3] = 0; LKUP STMT 0 = a[3] 0 = a[3]; LKUP STMT 0 = a[3] 0 = a[3]; 2>>> STMT 0 = a[3] 0 = a[3]; Optimizing statement res_5 = foo (); Optimizing statement if (res_5 != 31) LKUP STMT res_5 ne_expr 31 if (res_5 != 31) Optimizing block #3 1>>> COND 1 = res_5 ne_expr 31 1>>> COND 0 = res_5 eq_expr 31 Optimizing statement printf ("%d\n", res_5); <<<< COND 0 = res_5 eq_expr 31 <<<< COND 1 = res_5 ne_expr 31 Optimizing block #4 Optimizing block #5 Optimizing statement D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; LKUP STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; 2>>> STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; Optimizing statement printf ("%d ", D.4517_7); Optimizing statement ivtmp.67_12 = ivtmp.67_26 + 4; LKUP STMT ivtmp.67_12 = ivtmp.67_26 plus_expr 4 ivtmp.67_12 = ivtmp.67_26 + 4; Optimizing statement if (ivtmp.67_12 != 16) LKUP STMT ivtmp.67_12 ne_expr 16 if (ivtmp.67_12 != 16) Optimizing block #6 0>>> COPY ivtmp.67_12 = 16 1>>> COND 1 = ivtmp.67_12 le_expr 16 1>>> COND 1 = ivtmp.67_12 ge_expr 16 1>>> COND 1 = ivtmp.67_12 eq_expr 16 1>>> COND 0 = ivtmp.67_12 ne_expr 16 Optimizing statement __builtin_putchar (10); Optimizing statement return 0; <<<< COND 0 = ivtmp.67_12 
ne_expr 16 <<<< COND 1 = ivtmp.67_12 eq_expr 16 <<<< COND 1 = ivtmp.67_12 ge_expr 16 <<<< COND 1 = ivtmp.67_12 le_expr 16 <<<< COPY ivtmp.67_12 = 16 Optimizing block #7 1>>> COND 1 = ivtmp.67_12 ne_expr 16 1>>> COND 0 = ivtmp.67_12 eq_expr 16 <<<< COND 0 = ivtmp.67_12 eq_expr 16 <<<< COND 1 = ivtmp.67_12 ne_expr 16 <<<< STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; LKUP STMT D.4517_7 = MEM[symbol: a, index: 0, offset: 0B] D.4517_7 = MEM[symbol: a, index: 0, offset: 0B]; <<<< STMT 0 = a[3] 0 = a[3]; <<<< STMT 0 = a[2] 0 = a[2]; <<<< STMT 0 = a[1] 0 = a[1]; <<<< STMT 0 = a[0] 0 = a[0]; Removing basic block 7 ;; basic block 7, loop depth 1, count 0 ;; prev block 5, next block 6 ;; pred: ;; succ: 5 [100.0%] (fallthru,dfs_back) <bb 7>: Invalid sum of incoming frequencies 0, should be 6000 goto <bb 5>; Scope blocks after cleanups: { Scope block #0 int res; } main () { sizetype ivtmp.67; int res; int D.4517; <bb 2>: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; res_5 = foo (); if (res_5 != 31) goto <bb 3>; else goto <bb 4>; <bb 3>: printf ("%d\n", res_5); <bb 4>: <bb 5>: # ivtmp.67_26 = PHI <ivtmp.67_12(5), 0(4)> D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; printf ("%d ", D.4517_7); ivtmp.67_12 = ivtmp.67_26 + 4; if (ivtmp.67_12 != 16) goto <bb 5>; else goto <bb 6>; <bb 6>: __builtin_putchar (10); return 0; }

[-- Attachment #3: my--pr49771.c --]
[-- Type: text/x-csrc, Size: 528 bytes --]

#include <stdio.h>
#include <stdarg.h>

#define N 4

static int a[N];

__attribute__ ((noinline))
int
foo (void)
{
  int j;
  int i;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      a[j] = a[i] + 1;
  return a[0];
}

int
main (void)
{
  int res, i;

  for (i = 0; i < N; i++)
    a[i] = 0;

  res = foo ();
  if (res != 31)
    printf ("%d\n", res);

  for (i = 0; i < N; i++)
    printf ("%d ", a[i]);
  printf ("\n");

  return 0;
}

/* { dg-final { cleanup-tree-dump "vect" } } */
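[Editorial aside: the scalar semantics of foo()'s nested loop can be cross-checked with a direct simulation. With N = 1000 this reproduces the value 1999 that the testsuite version (gcc.dg/vect/pr49771.c, earlier in the thread) asserts; the helper below is illustrative, not part of the patch.]

```python
# Direct simulation of foo()'s nested loop from the test case.

def foo(n):
    a = [0] * n
    for i in range(n):
        for j in range(n):
            # a[i] is re-read on every inner iteration and is itself
            # overwritten when j == i -- this is the dependence the
            # vectorizer's runtime alias check must not miss.
            a[j] = a[i] + 1
    return a[0]

print(foo(1000))   # 1999, the value the testsuite version checks for
```

After outer iteration i the array holds a[0..i] = 2i+1 and a[j>i] = 2i+2, hence a[0] = 2*(N-1)+1 = 1999 for N = 1000.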
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 11:26 ` Ira Rosen @ 2011-07-25 11:41 ` Richard Guenther 2011-07-25 12:33 ` Ira Rosen 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 11:41 UTC (permalink / raw) To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote: > On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote: >> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >>> Richard Guenther wrote: >>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote: >>>> >> I reproduced the failure. It occurs without Richard's >>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this >>>> >> patches too. Obviously the vectorized loop is executed, but at the >>>> >> moment I don't understand why. I'll have a better look on Sunday. >>>> > >>>> > Actually it doesn't choose the vectorized code. But the scalar version >>>> > gets optimized in a harmful way for SPU, AFAIU. >>>> > Here is the scalar loop after vrp2 >>>> > >>>> > <bb 8>: >>>> > # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)> >>>> > D.4593_42 = (void *) ivtmp.53_32; >>>> > D.4520_33 = MEM[base: D.4593_42, offset: 0B]; >>>> > D.4521_34 = D.4520_33 + 1; >>>> > MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; >>>> > ivtmp.42_45 = ivtmp.42_50 + 4; >>>> > if (ivtmp.42_45 != 16) >>>> > goto <bb 10>; >>>> > else >>>> > goto <bb 5>; >>>> > >>>> > and the load is changed by dom2 to: >>>> > >>>> > <bb 4>: >>>> > ... >>>> > D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; >>>> > ... >>>> > >>>> > where vector(4) int * vect_pa.9; >>>> > >>>> > And the scalar loop has no rotate for that load: >>>> >>>> Hum. This smells like we are hiding sth from the tree optimizers? 
>>> >>> Well, the back-end assumes a pointer to vector type is always >>> naturally aligned, and therefore the data it points to can be >>> accessed via a simple load, with no extra rotate needed. >> >> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >> anything about alignment just because of the kind of the pointer >> is bogus - the scalar code does a scalar read using that pointer. >> So the backend better should look at the memory operation, not >> at the pointer type. That said, I can't find any code that looks >> suspicious in the spu backend. >> >>> It seems what happened here is that somehow, a pointer to int >>> gets replaced by a pointer to vector, even though their alignment >>> properties are different. >> >> No, they are not. They get replaced if they are value-equivalent >> in which case they are also alignment-equivalent. But well, the >> dump snippet wasn't complete and I don't feel like building a >> SPU cross to verify myself. > > I am attaching the complete file. The issue seems to be that the IV in question, vect_pa.9_19, is defined as vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; but ivtmp.53_32 does not have a definition at all. Richard. > > Thanks, > Ira > > > >> >>> This vector pointer must originate somehow in the vectorizer, >>> however, since the original C source does not contain any >>> vector types at all ... >> >> That's for sure true, it must be the initial pointer we then increment >> in the vectorized loop. >> >> Richard. >> >>> Bye, >>> Ulrich >>> >>> -- >>> Dr. Ulrich Weigand >>> GNU Toolchain for Linux on System z and Cell BE >>> Ulrich.Weigand@de.ibm.com >>> >> > ^ permalink raw reply [flat|nested] 35+ messages in thread
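[Editorial aside: the runtime alias check at the root of this PR, described at the top of the thread, can be sketched numerically. The pointer values and byte offsets below are illustrative stand-ins for &a and &a + i*4, not the emitted GIMPLE.]

```python
# The vectorizer's "no overlap" test for two data-refs:
#   (store_ptr + store_seg_len <= load_ptr)
#   || (load_ptr + load_seg_len <= store_ptr)
def disjoint(store_ptr, store_len, load_ptr, load_len):
    return (store_ptr + store_len <= load_ptr
            or load_ptr + load_len <= store_ptr)

i = 0                            # the failing iteration from the PR
store_ptr, store_len = 0, 4000   # store to a[j]: &a, 1000 ints * 4 bytes
load_ptr = i * 4                 # invariant load of a[i]: &a + i*4

# Before the fix: an invariant data-ref has step 0, so its segment
# length was 0 -- the second disjunct wrongly succeeds for i = 0.
print(disjoint(store_ptr, store_len, load_ptr, 0))   # True (incorrect)

# After the fix: zero-step refs use sizeof (data-ref type) = 4 as the
# segment length, so the overlap is detected.
print(disjoint(store_ptr, store_len, load_ptr, 4))   # False (correct)
```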
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 11:41 ` Richard Guenther @ 2011-07-25 12:33 ` Ira Rosen 2011-07-25 13:01 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Ira Rosen @ 2011-07-25 12:33 UTC (permalink / raw) To: Richard Guenther; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking [-- Attachment #1: Type: text/plain, Size: 3634 bytes --] On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote: >>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >>>> Richard Guenther wrote: >>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>> >> I reproduced the failure. It occurs without Richard's >>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this >>>>> >> patches too. Obviously the vectorized loop is executed, but at the >>>>> >> moment I don't understand why. I'll have a better look on Sunday. >>>>> > >>>>> > Actually it doesn't choose the vectorized code. But the scalar version >>>>> > gets optimized in a harmful way for SPU, AFAIU. >>>>> > Here is the scalar loop after vrp2 >>>>> > >>>>> > <bb 8>: >>>>> > # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)> >>>>> > D.4593_42 = (void *) ivtmp.53_32; >>>>> > D.4520_33 = MEM[base: D.4593_42, offset: 0B]; >>>>> > D.4521_34 = D.4520_33 + 1; >>>>> > MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; >>>>> > ivtmp.42_45 = ivtmp.42_50 + 4; >>>>> > if (ivtmp.42_45 != 16) >>>>> > goto <bb 10>; >>>>> > else >>>>> > goto <bb 5>; >>>>> > >>>>> > and the load is changed by dom2 to: >>>>> > >>>>> > <bb 4>: >>>>> > ... >>>>> > D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; >>>>> > ... 
>>>>> > >>>>> > where vector(4) int * vect_pa.9; >>>>> > >>>>> > And the scalar loop has no rotate for that load: >>>>> >>>>> Hum. This smells like we are hiding sth from the tree optimizers? >>>> >>>> Well, the back-end assumes a pointer to vector type is always >>>> naturally aligned, and therefore the data it points to can be >>>> accessed via a simple load, with no extra rotate needed. >>> >>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >>> anything about alignment just because of the kind of the pointer >>> is bogus - the scalar code does a scalar read using that pointer. >>> So the backend better should look at the memory operation, not >>> at the pointer type. That said, I can't find any code that looks >>> suspicious in the spu backend. >>> >>>> It seems what happened here is that somehow, a pointer to int >>>> gets replaced by a pointer to vector, even though their alignment >>>> properties are different. >>> >>> No, they are not. They get replaced if they are value-equivalent >>> in which case they are also alignment-equivalent. But well, the >>> dump snippet wasn't complete and I don't feel like building a >>> SPU cross to verify myself. >> >> I am attaching the complete file. > > The issue seems to be that the IV in question, vect_pa.9_19, is > defined as > > vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; > > but ivtmp.53_32 does not have a definition at all. > I am sorry, it's my fault, resending the file. Sorry, Ira > Richard. > >> >> Thanks, >> Ira >> >> >> >>> >>>> This vector pointer must originate somehow in the vectorizer, >>>> however, since the original C source does not contain any >>>> vector types at all ... >>> >>> That's for sure true, it must be the initial pointer we then increment >>> in the vectorized loop. >>> >>> Richard. >>> >>>> Bye, >>>> Ulrich >>>> >>>> -- >>>> Dr. 
Ulrich Weigand >>>> GNU Toolchain for Linux on System z and Cell BE >>>> Ulrich.Weigand@de.ibm.com >>>> >>> >> > [-- Attachment #2: my--pr49771.c.124t.dom2 --] [-- Type: application/octet-stream, Size: 19269 bytes --] ;; Function foo (foo, funcdef_no=0, decl_uid=2839, cgraph_uid=0) ;; 3 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; nodes: 0 1 2 3 4 10 5 6 9 7 8 ;; ;; Loop 1 ;; header 7, latch 9 ;; depth 1, outer 0 ;; nodes: 7 9 6 5 4 3 10 ;; ;; Loop 2 ;; header 4, latch 10 ;; depth 2, outer 1 ;; nodes: 4 10 ;; 2 succs { 7 } ;; 3 succs { 4 } ;; 4 succs { 10 6 } ;; 10 succs { 4 } ;; 5 succs { 6 } ;; 6 succs { 9 8 } ;; 9 succs { 7 } ;; 7 succs { 3 5 } ;; 8 succs { 1 } Optimizing block #0 Optimizing block #2 Optimizing statement ivtmp.53_37 = (long unsigned int) &a; LKUP STMT ivtmp.53_37 = nop_expr &a ivtmp.53_37 = (long unsigned int) &a; 2>>> STMT ivtmp.53_37 = nop_expr &a ivtmp.53_37 = (long unsigned int) &a; Optimizing statement a.55_39 = (long unsigned int) &a; LKUP STMT a.55_39 = nop_expr &a a.55_39 = (long unsigned int) &a; FIND: ivtmp.53_37 Replaced redundant expr '(long unsigned int) &a' with 'ivtmp.53_37' ==== ASGN a.55_39 = ivtmp.53_37 Optimizing statement D.4591_40 = a.55_39 + 16; Replaced 'a.55_39' with variable 'ivtmp.53_37' LKUP STMT D.4591_40 = ivtmp.53_37 plus_expr 16 D.4591_40 = ivtmp.53_37 + 16; 2>>> STMT D.4591_40 = ivtmp.53_37 plus_expr 16 D.4591_40 = ivtmp.53_37 + 16; Optimizing block #7 Optimizing statement vect_pa.12_24 = &a; LKUP STMT vect_pa.12_24 = &a LKUP STMT vect_pa.12_24 = &a vect_pa.12_24 = &a; ==== ASGN vect_pa.12_24 = &a Optimizing statement D.4589_36 = ivtmp.53_32 + 16; LKUP STMT D.4589_36 = ivtmp.53_32 plus_expr 16 D.4589_36 = ivtmp.53_32 + 16; 2>>> STMT D.4589_36 = ivtmp.53_32 plus_expr 16 D.4589_36 = ivtmp.53_32 + 16; Optimizing statement D.4540_11 = (vector(4) int *) D.4589_36; LKUP STMT D.4540_11 = nop_expr D.4589_36 D.4540_11 = (vector(4) int *) D.4589_36; 2>>> STMT D.4540_11 = nop_expr D.4589_36 D.4540_11 
= (vector(4) int *) D.4589_36; Optimizing statement D.4541_26 = D.4540_11 <= &a; LKUP STMT D.4541_26 = D.4540_11 le_expr &a D.4541_26 = D.4540_11 <= &a; 2>>> STMT D.4541_26 = D.4540_11 le_expr &a D.4541_26 = D.4540_11 <= &a; Optimizing statement D.4542_27 = &MEM[(void *)&a + 16B]; LKUP STMT D.4542_27 = &MEM[(void *)&a + 16B] D.4542_27 = &MEM[(void *)&a + 16B]; ==== ASGN D.4542_27 = &MEM[(void *)&a + 16B] Optimizing statement vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_32 vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; 2>>> STMT vect_pa.9_19 = nop_expr ivtmp.53_32 vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; Optimizing statement D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B] D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; 2>>> STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B] D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; Optimizing statement D.4544_29 = D.4541_26 & D.4543_28; LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28 D.4544_29 = D.4541_26 & D.4543_28; 2>>> STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28 D.4544_29 = D.4541_26 & D.4543_28; Optimizing statement if (D.4544_29 != 0) LKUP STMT D.4544_29 ne_expr 0 if (D.4544_29 != 0) Optimizing block #3 0>>> COPY D.4544_29 = 0 Optimizing statement ivtmp.42_59 = 0; LKUP STMT ivtmp.42_59 = 0 ivtmp.42_59 = 0; ==== ASGN ivtmp.42_59 = 0 Optimizing block #4 Optimizing statement D.4593_42 = (void *) ivtmp.53_32; LKUP STMT D.4593_42 = nop_expr ivtmp.53_32 D.4593_42 = (void *) ivtmp.53_32; FIND: vect_pa.9_19 Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19' ==== ASGN D.4593_42 = vect_pa.9_19 Optimizing statement D.4520_33 = MEM[base: D.4593_42, offset: 0B]; Replaced 'D.4593_42' with variable 'vect_pa.9_19' LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; 2>>> STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 
0B] D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; Optimizing statement D.4521_34 = D.4520_33 + 1; LKUP STMT D.4521_34 = D.4520_33 plus_expr 1 D.4521_34 = D.4520_33 + 1; 2>>> STMT D.4521_34 = D.4520_33 plus_expr 1 D.4521_34 = D.4520_33 + 1; Optimizing statement MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; LKUP STMT MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34 MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B] D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]; LKUP STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B] D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]; 2>>> STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B] D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]; Optimizing statement ivtmp.42_45 = ivtmp.42_50 + 4; LKUP STMT ivtmp.42_45 = ivtmp.42_50 plus_expr 4 ivtmp.42_45 = ivtmp.42_50 + 4; Optimizing statement if (ivtmp.42_45 != 16) LKUP STMT ivtmp.42_45 ne_expr 16 if (ivtmp.42_45 != 16) Optimizing block #10 1>>> COND 1 = ivtmp.42_45 ne_expr 16 1>>> COND 0 = ivtmp.42_45 eq_expr 16 <<<< COND 0 = ivtmp.42_45 eq_expr 16 <<<< COND 1 = ivtmp.42_45 ne_expr 16 0>>> COPY ivtmp.42_45 = 16 1>>> COND 1 = ivtmp.42_45 le_expr 16 1>>> COND 1 = ivtmp.42_45 ge_expr 16 1>>> COND 1 = ivtmp.42_45 eq_expr 16 1>>> COND 0 = ivtmp.42_45 ne_expr 16 LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4 ivtmp.53_30 = ivtmp.53_32 + 4; LKUP STMT ivtmp.53_30 ne_expr D.4591_40 if (ivtmp.53_30 != D.4591_40) <<<< COND 0 = ivtmp.42_45 ne_expr 16 <<<< COND 1 = ivtmp.42_45 eq_expr 16 <<<< COND 1 = ivtmp.42_45 ge_expr 16 <<<< COND 1 = ivtmp.42_45 le_expr 16 <<<< STMT D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B] D.4521_34 = MEM[symbol: a, index: ivtmp.42_50, offset: 0B]; <<<< STMT D.4521_34 = D.4520_33 plus_expr 1 D.4521_34 = D.4520_33 + 1; <<<< STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_33 = MEM[base: 
vect_pa.9_19, offset: 0B]; <<<< COPY D.4544_29 = 0 LKUP STMT D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; LKUP STMT D.4521_34 = D.4520_33 plus_expr 1 D.4521_34 = D.4520_33 + 1; Optimizing block #5 0>>> COPY D.4544_29 = 1 Optimizing statement vect_cst_.29_56 = { 1, 1, 1, 1 }; LKUP STMT vect_cst_.29_56 = { 1, 1, 1, 1 } vect_cst_.29_56 = { 1, 1, 1, 1 }; ==== ASGN vect_cst_.29_56 = { 1, 1, 1, 1 } Optimizing statement vect_pa.33_58 = &a; LKUP STMT vect_pa.33_58 = &a vect_pa.33_58 = &a; ==== ASGN vect_pa.33_58 = &a Optimizing statement D.4592_41 = (void *) ivtmp.53_32; LKUP STMT D.4592_41 = nop_expr ivtmp.53_32 D.4592_41 = (void *) ivtmp.53_32; FIND: vect_pa.9_19 Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19' ==== ASGN D.4592_41 = vect_pa.9_19 Optimizing statement D.4520_5 = MEM[base: D.4592_41, offset: 0B]; Replaced 'D.4592_41' with variable 'vect_pa.9_19' LKUP STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]; 2>>> STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]; Optimizing statement vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; LKUP STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5} vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; 2>>> STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5} vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; Optimizing statement vect_var_.28_57 = vect_cst_.27_55 + vect_cst_.29_56; Replaced 'vect_cst_.29_56' with constant '{ 1, 1, 1, 1 }' LKUP STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 } vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 }; 2>>> STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 } vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 }; Optimizing statement MEM[(int[4] *)&a] = vect_var_.28_57; LKUP STMT MEM[(int[4] *)&a] = vect_var_.28_57 MEM[(int[4] *)&a] = vect_var_.28_57; 
LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a] vect_var_.28_57 = MEM[(int[4] *)&a]; LKUP STMT vect_var_.28_57 = MEM[(int[4] *)&a] vect_var_.28_57 = MEM[(int[4] *)&a]; 2>>> STMT vect_var_.28_57 = MEM[(int[4] *)&a] vect_var_.28_57 = MEM[(int[4] *)&a]; Optimizing statement vect_pa.30_60 = &MEM[(void *)&a + 16B]; LKUP STMT vect_pa.30_60 = &MEM[(void *)&a + 16B] vect_pa.30_60 = &MEM[(void *)&a + 16B]; ==== ASGN vect_pa.30_60 = &MEM[(void *)&a + 16B] Optimizing statement ivtmp.34_62 = 1; LKUP STMT ivtmp.34_62 = 1 ivtmp.34_62 = 1; ==== ASGN ivtmp.34_62 = 1 LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4 ivtmp.53_30 = ivtmp.53_32 + 4; LKUP STMT ivtmp.53_30 ne_expr D.4591_40 if (ivtmp.53_30 != D.4591_40) <<<< STMT vect_var_.28_57 = MEM[(int[4] *)&a] vect_var_.28_57 = MEM[(int[4] *)&a]; <<<< STMT vect_var_.28_57 = vect_cst_.27_55 plus_expr { 1, 1, 1, 1 } vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 }; <<<< STMT vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5} vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; <<<< STMT D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B] D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]; Optimizing block #6 Optimizing statement ivtmp.53_30 = ivtmp.53_32 + 4; LKUP STMT ivtmp.53_30 = ivtmp.53_32 plus_expr 4 ivtmp.53_30 = ivtmp.53_32 + 4; Optimizing statement if (ivtmp.53_30 != D.4591_40) LKUP STMT ivtmp.53_30 ne_expr D.4591_40 if (ivtmp.53_30 != D.4591_40) Optimizing block #8 0>>> COPY ivtmp.53_30 = D.4591_40 1>>> COND 1 = ivtmp.53_30 le_expr D.4591_40 1>>> COND 1 = ivtmp.53_30 ge_expr D.4591_40 1>>> COND 1 = ivtmp.53_30 eq_expr D.4591_40 1>>> COND 0 = ivtmp.53_30 ne_expr D.4591_40 Optimizing statement D.4522_9 = a[0]; LKUP STMT D.4522_9 = a[0] D.4522_9 = a[0]; 2>>> STMT D.4522_9 = a[0] D.4522_9 = a[0]; Optimizing statement return D.4522_9; <<<< STMT D.4522_9 = a[0] D.4522_9 = a[0]; <<<< COND 0 = ivtmp.53_30 ne_expr D.4591_40 <<<< COND 1 = ivtmp.53_30 eq_expr D.4591_40 <<<< COND 1 = ivtmp.53_30 ge_expr D.4591_40 <<<< COND 1 = 
ivtmp.53_30 le_expr D.4591_40 <<<< COPY ivtmp.53_30 = D.4591_40 Optimizing block #9 1>>> COND 1 = ivtmp.53_30 ne_expr D.4591_40 1>>> COND 0 = ivtmp.53_30 eq_expr D.4591_40 <<<< COND 0 = ivtmp.53_30 eq_expr D.4591_40 <<<< COND 1 = ivtmp.53_30 ne_expr D.4591_40 <<<< STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28 D.4544_29 = D.4541_26 & D.4543_28; <<<< STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B] D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; <<<< STMT vect_pa.9_19 = nop_expr ivtmp.53_32 vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; <<<< STMT D.4541_26 = D.4540_11 le_expr &a D.4541_26 = D.4540_11 <= &a; <<<< STMT D.4540_11 = nop_expr D.4589_36 D.4540_11 = (vector(4) int *) D.4589_36; <<<< STMT D.4589_36 = ivtmp.53_32 plus_expr 16 D.4589_36 = ivtmp.53_32 + 16; LKUP STMT D.4589_36 = ivtmp.53_37 plus_expr 16 D.4589_36 = ivtmp.53_37 + 16; FIND: D.4591_40 LKUP STMT D.4540_11 = nop_expr D.4591_40 D.4540_11 = (vector(4) int *) D.4591_40; LKUP STMT D.4541_26 = D.4540_11 le_expr &a D.4541_26 = D.4540_11 <= &a; LKUP STMT vect_pa.9_19 = nop_expr ivtmp.53_37 vect_pa.9_19 = (vector(4) int *) ivtmp.53_37; LKUP STMT D.4543_28 = vect_pa.9_19 ge_expr &MEM[(void *)&a + 16B] D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; LKUP STMT D.4544_29 = D.4541_26 bit_and_expr D.4543_28 D.4544_29 = D.4541_26 & D.4543_28; LKUP STMT D.4544_29 ne_expr 0 if (D.4544_29 != 0) <<<< STMT D.4591_40 = ivtmp.53_37 plus_expr 16 D.4591_40 = ivtmp.53_37 + 16; <<<< STMT ivtmp.53_37 = nop_expr &a ivtmp.53_37 = (long unsigned int) &a; Removing basic block 9 ;; basic block 9, loop depth 1, count 0 ;; prev block 6, next block 7 ;; pred: ;; succ: 7 [100.0%] (fallthru,dfs_back) <bb 9>: Invalid sum of incoming frequencies 0, should be 1500 Removing basic block 10 ;; basic block 10, loop depth 2, count 0 ;; prev block 4, next block 5 ;; pred: ;; succ: 4 [100.0%] (fallthru,dfs_back) <bb 10>: Invalid sum of incoming frequencies 0, should be 1200 goto <bb 4>; Scope blocks after cleanups: { Scope 
block #0 } foo () { void * D.4593; void * D.4592; long unsigned int a.55; long unsigned int D.4591; long unsigned int D.4589; long unsigned int ivtmp.53; sizetype ivtmp.42; unsigned int ivtmp.34; vector(4) int * vect_pa.33; vector(4) int * vect_pa.30; vector(4) int vect_cst_.29; vector(4) int vect_var_.28; vector(4) int vect_cst_.27; vector(4) int * D.4540; _Bool D.4541; vector(4) int * D.4542; _Bool D.4543; _Bool D.4544; vector(4) int * vect_pa.12; vector(4) int * vect_pa.9; int D.4522; int D.4521; int D.4520; <bb 2>: ivtmp.53_37 = (long unsigned int) &a; a.55_39 = ivtmp.53_37; D.4591_40 = ivtmp.53_37 + 16; goto <bb 7>; <bb 3>: ivtmp.42_59 = 0; <bb 4>: # ivtmp.42_50 = PHI <0(3), ivtmp.42_45(4)> D.4593_42 = vect_pa.9_19; D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; D.4521_34 = D.4520_33 + 1; MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; ivtmp.42_45 = ivtmp.42_50 + 4; if (ivtmp.42_45 != 16) goto <bb 4>; else goto <bb 6>; <bb 5>: vect_cst_.29_56 = { 1, 1, 1, 1 }; vect_pa.33_58 = &a; D.4592_41 = vect_pa.9_19; D.4520_5 = MEM[base: vect_pa.9_19, offset: 0B]; vect_cst_.27_55 = {D.4520_5, D.4520_5, D.4520_5, D.4520_5}; vect_var_.28_57 = vect_cst_.27_55 + { 1, 1, 1, 1 }; MEM[(int[4] *)&a] = vect_var_.28_57; vect_pa.30_60 = &MEM[(void *)&a + 16B]; ivtmp.34_62 = 1; <bb 6>: ivtmp.53_30 = ivtmp.53_32 + 4; if (ivtmp.53_30 != D.4591_40) goto <bb 7>; else goto <bb 8>; <bb 7>: # ivtmp.53_32 = PHI <ivtmp.53_37(2), ivtmp.53_30(6)> vect_pa.12_24 = &a; D.4589_36 = ivtmp.53_32 + 16; D.4540_11 = (vector(4) int *) D.4589_36; D.4541_26 = D.4540_11 <= &a; D.4542_27 = &MEM[(void *)&a + 16B]; vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; D.4543_28 = vect_pa.9_19 >= &MEM[(void *)&a + 16B]; D.4544_29 = D.4541_26 & D.4543_28; if (D.4544_29 != 0) goto <bb 5>; else goto <bb 3>; <bb 8>: D.4522_9 = a[0]; return D.4522_9; } ;; Function main (main, funcdef_no=1, decl_uid=2850, cgraph_uid=1) (executed once) ;; 2 loops found ;; ;; Loop 0 ;; header 0, latch 1 ;; depth 0, outer -1 ;; 
nodes: 0 1 2 3 4 5 7 6 ;; ;; Loop 1 ;; header 5, latch 7 ;; depth 1, outer 0 ;; nodes: 5 7 ;; 2 succs { 3 4 } ;; 3 succs { 4 } ;; 4 succs { 5 } ;; 5 succs { 7 6 } ;; 7 succs { 5 } ;; 6 succs { 1 } Optimizing block #0 Optimizing block #2 Optimizing statement a[0] = 0; LKUP STMT a[0] = 0 a[0] = 0; LKUP STMT 0 = a[0] 0 = a[0]; LKUP STMT 0 = a[0] 0 = a[0]; 2>>> STMT 0 = a[0] 0 = a[0]; Optimizing statement a[1] = 0; LKUP STMT a[1] = 0 a[1] = 0; LKUP STMT 0 = a[1] 0 = a[1]; LKUP STMT 0 = a[1] 0 = a[1]; 2>>> STMT 0 = a[1] 0 = a[1]; Optimizing statement a[2] = 0; LKUP STMT a[2] = 0 a[2] = 0; LKUP STMT 0 = a[2] 0 = a[2]; LKUP STMT 0 = a[2] 0 = a[2]; 2>>> STMT 0 = a[2] 0 = a[2]; Optimizing statement a[3] = 0; LKUP STMT a[3] = 0 a[3] = 0; LKUP STMT 0 = a[3] 0 = a[3]; LKUP STMT 0 = a[3] 0 = a[3]; 2>>> STMT 0 = a[3] 0 = a[3]; Optimizing statement res_5 = foo (); Optimizing statement if (res_5 != 31) LKUP STMT res_5 ne_expr 31 if (res_5 != 31) Optimizing block #3 1>>> COND 1 = res_5 ne_expr 31 1>>> COND 0 = res_5 eq_expr 31 Optimizing statement printf ("%d\n", res_5); <<<< COND 0 = res_5 eq_expr 31 <<<< COND 1 = res_5 ne_expr 31 Optimizing block #4 Optimizing block #5 Optimizing statement D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; LKUP STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; 2>>> STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; Optimizing statement printf ("%d ", D.4517_7); Optimizing statement ivtmp.67_12 = ivtmp.67_26 + 4; LKUP STMT ivtmp.67_12 = ivtmp.67_26 plus_expr 4 ivtmp.67_12 = ivtmp.67_26 + 4; Optimizing statement if (ivtmp.67_12 != 16) LKUP STMT ivtmp.67_12 ne_expr 16 if (ivtmp.67_12 != 16) Optimizing block #6 0>>> COPY ivtmp.67_12 = 16 1>>> COND 1 = ivtmp.67_12 le_expr 16 1>>> COND 1 = ivtmp.67_12 ge_expr 16 1>>> COND 1 = ivtmp.67_12 eq_expr 16 1>>> COND 0 = ivtmp.67_12 ne_expr 16 Optimizing 
statement __builtin_putchar (10); Optimizing statement return 0; <<<< COND 0 = ivtmp.67_12 ne_expr 16 <<<< COND 1 = ivtmp.67_12 eq_expr 16 <<<< COND 1 = ivtmp.67_12 ge_expr 16 <<<< COND 1 = ivtmp.67_12 le_expr 16 <<<< COPY ivtmp.67_12 = 16 Optimizing block #7 1>>> COND 1 = ivtmp.67_12 ne_expr 16 1>>> COND 0 = ivtmp.67_12 eq_expr 16 <<<< COND 0 = ivtmp.67_12 eq_expr 16 <<<< COND 1 = ivtmp.67_12 ne_expr 16 <<<< STMT D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B] D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; LKUP STMT D.4517_7 = MEM[symbol: a, index: 0, offset: 0B] D.4517_7 = MEM[symbol: a, index: 0, offset: 0B]; <<<< STMT 0 = a[3] 0 = a[3]; <<<< STMT 0 = a[2] 0 = a[2]; <<<< STMT 0 = a[1] 0 = a[1]; <<<< STMT 0 = a[0] 0 = a[0]; Removing basic block 7 ;; basic block 7, loop depth 1, count 0 ;; prev block 5, next block 6 ;; pred: ;; succ: 5 [100.0%] (fallthru,dfs_back) <bb 7>: Invalid sum of incoming frequencies 0, should be 6000 goto <bb 5>; Scope blocks after cleanups: { Scope block #0 int res; } main () { sizetype ivtmp.67; int res; int D.4517; <bb 2>: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; res_5 = foo (); if (res_5 != 31) goto <bb 3>; else goto <bb 4>; <bb 3>: printf ("%d\n", res_5); <bb 4>: <bb 5>: # ivtmp.67_26 = PHI <ivtmp.67_12(5), 0(4)> D.4517_7 = MEM[symbol: a, index: ivtmp.67_26, offset: 0B]; printf ("%d ", D.4517_7); ivtmp.67_12 = ivtmp.67_26 + 4; if (ivtmp.67_12 != 16) goto <bb 5>; else goto <bb 6>; <bb 6>: __builtin_putchar (10); return 0; } ^ permalink raw reply [flat|nested] 35+ messages in thread
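[Editorial aside: the replacement visible in the dump above ("Replaced redundant expr '(void *) ivtmp.53_32' with 'vect_pa.9_19'") comes from DOM hashing expressions by operation and operands. The toy model below shows why the two differently-typed casts of ivtmp.53_32 unify; it is an illustration of the idea only, not GCC's implementation.]

```python
# Toy redundancy-elimination table in the spirit of the dom2 dump:
# both nop_expr casts of ivtmp.53_32 hash to the same key, so the
# later void* cast is folded into a copy of the earlier vector
# pointer vect_pa.9_19.

table = {}

def value_number(result, op, operand):
    key = (op, operand)        # the result's pointer type is not in the key
    if key in table:
        return table[key]      # redundant: reuse the earlier SSA name
    table[key] = result
    return result

# vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
print(value_number("vect_pa.9_19", "nop_expr", "ivtmp.53_32"))
# D.4593_42 = (void *) ivtmp.53_32;  -- becomes a copy of vect_pa.9_19
print(value_number("D.4593_42", "nop_expr", "ivtmp.53_32"))
```

As Richard notes, this is only valid because the two SSA names are value-equivalent; the question in the thread is whether the SPU backend then reads alignment guarantees out of the surviving pointer's type rather than out of the memory operation itself.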
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 12:33 ` Ira Rosen @ 2011-07-25 13:01 ` Richard Guenther 2011-07-25 13:07 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 13:01 UTC (permalink / raw) To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote: > On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote: >> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote: >>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >>>>> Richard Guenther wrote: >>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>>> >> I reproduced the failure. It occurs without Richard's >>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this >>>>>> >> patches too. Obviously the vectorized loop is executed, but at the >>>>>> >> moment I don't understand why. I'll have a better look on Sunday. >>>>>> > >>>>>> > Actually it doesn't choose the vectorized code. But the scalar version >>>>>> > gets optimized in a harmful way for SPU, AFAIU. >>>>>> > Here is the scalar loop after vrp2 >>>>>> > >>>>>> > <bb 8>: >>>>>> > # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)> >>>>>> > D.4593_42 = (void *) ivtmp.53_32; >>>>>> > D.4520_33 = MEM[base: D.4593_42, offset: 0B]; >>>>>> > D.4521_34 = D.4520_33 + 1; >>>>>> > MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; >>>>>> > ivtmp.42_45 = ivtmp.42_50 + 4; >>>>>> > if (ivtmp.42_45 != 16) >>>>>> > goto <bb 10>; >>>>>> > else >>>>>> > goto <bb 5>; >>>>>> > >>>>>> > and the load is changed by dom2 to: >>>>>> > >>>>>> > <bb 4>: >>>>>> > ... >>>>>> > D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; >>>>>> > ... 
>>>>>> > >>>>>> > where vector(4) int * vect_pa.9; >>>>>> > >>>>>> > And the scalar loop has no rotate for that load: >>>>>> >>>>>> Hum. This smells like we are hiding sth from the tree optimizers? >>>>> >>>>> Well, the back-end assumes a pointer to vector type is always >>>>> naturally aligned, and therefore the data it points to can be >>>>> accessed via a simple load, with no extra rotate needed. >>>> >>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >>>> anything about alignment just because of the kind of the pointer >>>> is bogus - the scalar code does a scalar read using that pointer. >>>> So the backend better should look at the memory operation, not >>>> at the pointer type. That said, I can't find any code that looks >>>> suspicious in the spu backend. >>>> >>>>> It seems what happened here is that somehow, a pointer to int >>>>> gets replaced by a pointer to vector, even though their alignment >>>>> properties are different. >>>> >>>> No, they are not. They get replaced if they are value-equivalent >>>> in which case they are also alignment-equivalent. But well, the >>>> dump snippet wasn't complete and I don't feel like building a >>>> SPU cross to verify myself. >>> >>> I am attaching the complete file. >> >> The issue seems to be that the IV in question, vect_pa.9_19, is >> defined as >> >> vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; >> >> but ivtmp.53_32 does not have a definition at all. >> > > I am sorry, it's my fault, resending the file. Seems perfectly valid to me. Or well - I suppose we might run into the issue that the vectorizer sets alignment data at the wrong spot? You can check alignment info when dumping with the -alias flag. Building a spu cross now. Richard. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 13:01 ` Richard Guenther @ 2011-07-25 13:07 ` Richard Guenther 2011-07-25 13:47 ` Ulrich Weigand 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 13:07 UTC (permalink / raw) To: Ira Rosen; +Cc: Ulrich Weigand, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 1:15 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >> On 25 July 2011 13:57, Richard Guenther <richard.guenther@gmail.com> wrote: >>> On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>>> On 25 July 2011 12:39, Richard Guenther <richard.guenther@gmail.com> wrote: >>>>> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >>>>>> Richard Guenther wrote: >>>>>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>>>> > On 21 July 2011 15:19, Ira Rosen <ira.rosen@linaro.org> wrote: >>>>>>> >> I reproduced the failure. It occurs without Richard's >>>>>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this >>>>>>> >> patches too. Obviously the vectorized loop is executed, but at the >>>>>>> >> moment I don't understand why. I'll have a better look on Sunday. >>>>>>> > >>>>>>> > Actually it doesn't choose the vectorized code. But the scalar version >>>>>>> > gets optimized in a harmful way for SPU, AFAIU. 
>>>>>>> > Here is the scalar loop after vrp2 >>>>>>> > >>>>>>> > <bb 8>: >>>>>>> > # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)> >>>>>>> > D.4593_42 = (void *) ivtmp.53_32; >>>>>>> > D.4520_33 = MEM[base: D.4593_42, offset: 0B]; >>>>>>> > D.4521_34 = D.4520_33 + 1; >>>>>>> > MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34; >>>>>>> > ivtmp.42_45 = ivtmp.42_50 + 4; >>>>>>> > if (ivtmp.42_45 != 16) >>>>>>> > goto <bb 10>; >>>>>>> > else >>>>>>> > goto <bb 5>; >>>>>>> > >>>>>>> > and the load is changed by dom2 to: >>>>>>> > >>>>>>> > <bb 4>: >>>>>>> > ... >>>>>>> > D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B]; >>>>>>> > ... >>>>>>> > >>>>>>> > where vector(4) int * vect_pa.9; >>>>>>> > >>>>>>> > And the scalar loop has no rotate for that load: >>>>>>> >>>>>>> Hum. This smells like we are hiding sth from the tree optimizers? >>>>>> >>>>>> Well, the back-end assumes a pointer to vector type is always >>>>>> naturally aligned, and therefore the data it points to can be >>>>>> accessed via a simple load, with no extra rotate needed. >>>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >>>>> anything about alignment just because of the kind of the pointer >>>>> is bogus - the scalar code does a scalar read using that pointer. >>>>> So the backend better should look at the memory operation, not >>>>> at the pointer type. That said, I can't find any code that looks >>>>> suspicious in the spu backend. >>>>> >>>>>> It seems what happened here is that somehow, a pointer to int >>>>>> gets replaced by a pointer to vector, even though their alignment >>>>>> properties are different. >>>>> >>>>> No, they are not. They get replaced if they are value-equivalent >>>>> in which case they are also alignment-equivalent. But well, the >>>>> dump snippet wasn't complete and I don't feel like building a >>>>> SPU cross to verify myself. >>>> >>>> I am attaching the complete file. 
>>> >>> The issue seems to be that the IV in question, vect_pa.9_19, is >>> defined as >>> >>> vect_pa.9_19 = (vector(4) int *) ivtmp.53_32; >>> >>> but ivtmp.53_32 does not have a definition at all. >>> >> >> I am sorry, it's my fault, resending the file. > > Seems perfectly valid to me. Or well - I suppose we might run into > the issue that the vectorizer sets alignment data at the wrong spot? > You can check alignment info when dumping with the -alias flag. > Building a spu cross now. Nope, all perfectly valid. > Richard. > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 13:07 ` Richard Guenther @ 2011-07-25 13:47 ` Ulrich Weigand 2011-07-25 14:01 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-07-25 13:47 UTC (permalink / raw) To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking Richard Guenther wrote: > >>>>>> Well, the back-end assumes a pointer to vector type is always > >>>>>> naturally aligned, and therefore the data it points to can be > >>>>>> accessed via a simple load, with no extra rotate needed. > >>>>> > >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming > >>>>> anything about alignment just because of the kind of the pointer > >>>>> is bogus - the scalar code does a scalar read using that pointer. > >>>>> So the backend better should look at the memory operation, not > >>>>> at the pointer type. That said, I can't find any code that looks > >>>>> suspicious in the spu backend. > >>>>> > >>>>>> It seems what happened here is that somehow, a pointer to int > >>>>>> gets replaced by a pointer to vector, even though their alignment > >>>>>> properties are different. > >>>>> > >>>>> No, they are not. They get replaced if they are value-equivalent > >>>>> in which case they are also alignment-equivalent. But well, the > >>>>> dump snippet wasn't complete and I don't feel like building a > >>>>> SPU cross to verify myself. > > Seems perfectly valid to me. Or well - I suppose we might run into > > the issue that the vectorizer sets alignment data at the wrong spot? > > You can check alignment info when dumping with the -alias flag. > > Building a spu cross now. > > Nope, all perfectly valid. Ah, I guess I see what's happening here. When the SPU back-end is called to expand the load, the source operand is passed as: (mem:SI (reg/f:SI 226 [ vect_pa.9 ]) [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32]) Now this does say the MEM is only guaranteed to be aligned to 32 bits. 
However, spu_expand_load then goes and looks at the components of the address in detail, in order to figure out how to best perform the access. In doing so, it looks at the REGNO_POINTER_ALIGN values of the base registers involved in the address.

In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore the back-end thinks it can use an aligned access after all.

Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register is the DECL_RTL for the variable vect_pa.9, and that variable has a pointer-to-vector type (with target alignment 128). When expanding that variable, expand_one_register_var does:

  if (POINTER_TYPE_P (type))
    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));

All this is normally completely correct -- a variable of type pointer to vector type *must* hold only properly aligned values.

I guess the vectorizer deliberately loads a (potentially) unaligned value into a vector pointer variable. It then generates a check whether the value is really aligned, and uses it only if so. But if that pointer variable "escapes" into the other branch because DOM thinks it can re-use the value, the REGNO_POINTER_ALIGN value carried for its DECL_RTL register is now incorrect ...

Maybe the vectorizer ought to declare that variable with a non-default type alignment setting? Or else, perform the assignment to the variable only *inside* the "if" that checks for correct alignment?

Bye, Ulrich

--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
Ulrich.Weigand@de.ibm.com

^ permalink raw reply [flat|nested] 35+ messages in thread
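[The pattern Ulrich describes -- loading a possibly-unaligned address into a vector pointer and guarding its use with a runtime check -- can be sketched in plain C. This is a hypothetical illustration of the shape of the versioned code, not the vectorizer's actual output; the function name and the 16-byte check are my own:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a loop versioned for alignment: the "vector"
   path runs only when the pointer passes a runtime 16-byte alignment
   check, yet the pointer value is live on both paths, so a pass like
   DOM may let it "escape" into the unguarded branch.  */
static int
sum (const int *p, int n)
{
  int s = 0, i = 0;

  if (((uintptr_t) p & 15) == 0)
    {
      /* Guarded path: p is known 16-byte aligned only here.  */
      for (; i + 4 <= n; i += 4)
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    }

  /* Scalar fallback and epilogue.  */
  for (; i < n; i++)
    s += p[i];
  return s;
}
```

[Either path computes the same sum; the point is that the alignment property holds only inside the guarded branch, not for the pointer variable as such.]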
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 13:47 ` Ulrich Weigand @ 2011-07-25 14:01 ` Richard Guenther 2011-07-25 14:10 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 14:01 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Richard Guenther wrote: > >> >>>>>> Well, the back-end assumes a pointer to vector type is always >> >>>>>> naturally aligned, and therefore the data it points to can be >> >>>>>> accessed via a simple load, with no extra rotate needed. >> >>>>> >> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >> >>>>> anything about alignment just because of the kind of the pointer >> >>>>> is bogus - the scalar code does a scalar read using that pointer. >> >>>>> So the backend better should look at the memory operation, not >> >>>>> at the pointer type. That said, I can't find any code that looks >> >>>>> suspicious in the spu backend. >> >>>>> >> >>>>>> It seems what happened here is that somehow, a pointer to int >> >>>>>> gets replaced by a pointer to vector, even though their alignment >> >>>>>> properties are different. >> >>>>> >> >>>>> No, they are not. They get replaced if they are value-equivalent >> >>>>> in which case they are also alignment-equivalent. But well, the >> >>>>> dump snippet wasn't complete and I don't feel like building a >> >>>>> SPU cross to verify myself. > >> > Seems perfectly valid to me. Or well - I suppose we might run into >> > the issue that the vectorizer sets alignment data at the wrong spot? >> > You can check alignment info when dumping with the -alias flag. >> > Building a spu cross now. >> >> Nope, all perfectly valid. > > Ah, I guess I see what's happening here. 
When the SPU back-end is called > to expand the load, the source operand is passed as: > > (mem:SI (reg/f:SI 226 [ vect_pa.9 ]) > [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32]) > > Now this does say the MEM is only guaranteed to be aligned to 32 bits. > > However, spu_expand_load then goes and looks at the components of the > address in detail, in order to figure out how to best perform the access. > In doing so, it looks at the REGNO_POINTER_ALIGN values of the base > registers involved in the address. > > In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore > the back-end thinks it can use an aligned access after all. > > Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register > is the DECL_RTL for the variable vect_pa.9, and that variable has a > pointer-to-vector type (with target alignment 128). > > When expanding that variable, expand_one_register_var does: > > if (POINTER_TYPE_P (type)) > mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); > > All this is normally completely correct -- a variable of type pointer > to vector type *must* hold only properly aligned values. No, this is indeed completely bogus code ;) it should instead use get_pointer_alignment. Richard. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 14:01 ` Richard Guenther @ 2011-07-25 14:10 ` Richard Guenther 2011-07-25 14:14 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 14:10 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> Richard Guenther wrote: >> >>> >>>>>> Well, the back-end assumes a pointer to vector type is always >>> >>>>>> naturally aligned, and therefore the data it points to can be >>> >>>>>> accessed via a simple load, with no extra rotate needed. >>> >>>>> >>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >>> >>>>> anything about alignment just because of the kind of the pointer >>> >>>>> is bogus - the scalar code does a scalar read using that pointer. >>> >>>>> So the backend better should look at the memory operation, not >>> >>>>> at the pointer type. That said, I can't find any code that looks >>> >>>>> suspicious in the spu backend. >>> >>>>> >>> >>>>>> It seems what happened here is that somehow, a pointer to int >>> >>>>>> gets replaced by a pointer to vector, even though their alignment >>> >>>>>> properties are different. >>> >>>>> >>> >>>>> No, they are not. They get replaced if they are value-equivalent >>> >>>>> in which case they are also alignment-equivalent. But well, the >>> >>>>> dump snippet wasn't complete and I don't feel like building a >>> >>>>> SPU cross to verify myself. >> >>> > Seems perfectly valid to me. Or well - I suppose we might run into >>> > the issue that the vectorizer sets alignment data at the wrong spot? >>> > You can check alignment info when dumping with the -alias flag. >>> > Building a spu cross now. >>> >>> Nope, all perfectly valid. >> >> Ah, I guess I see what's happening here. 
When the SPU back-end is called >> to expand the load, the source operand is passed as: >> >> (mem:SI (reg/f:SI 226 [ vect_pa.9 ]) >> [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32]) >> >> Now this does say the MEM is only guaranteed to be aligned to 32 bits. >> >> However, spu_expand_load then goes and looks at the components of the >> address in detail, in order to figure out how to best perform the access. >> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base >> registers involved in the address. >> >> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore >> the back-end thinks it can use an aligned access after all. >> >> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register >> is the DECL_RTL for the variable vect_pa.9, and that variable has a >> pointer-to-vector type (with target alignment 128). >> >> When expanding that variable, expand_one_register_var does: >> >> if (POINTER_TYPE_P (type)) >> mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); >> >> All this is normally completely correct -- a variable of type pointer >> to vector type *must* hold only properly aligned values. > > No, this is indeed completely bogus code ;) it should instead > use get_pointer_alignment. Btw, as pseudos do not have a single def site how can the above ever be correct in the face of coalescing? For example on trees we can have p_1 = &a; // align 256 p_2 = p_1 + 4; // align 32 but we'll coalesce the thing and thus would have to use the weaker alignment of both SSA names. expand_one_register_var expands the decl, not the SSA name, so using get_pointer_alignment on the decl would probably be fine, though also pointless as it always will return 8. At least I don't see any code that would prevent a temporary variable of type int * of being coalesced with a temporary variable of type vector int *. Why should REGNO_POINTER_ALIGN be interesting to anyone? Proper alignment information is (should be) attached to every MEM already. 
Richard. ^ permalink raw reply [flat|nested] 35+ messages in thread
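[Richard's p_1/p_2 example can be made concrete: the alignment still known after adding an offset to a pointer is capped by the offset's lowest set bit, so a pseudo coalesced from both names would have to carry the weaker value. A small sketch under my own naming, not GCC code:]

```c
#include <assert.h>

/* Sketch (illustration only, not GCC's implementation): the alignment
   in bytes still guaranteed after adding BYTE_OFFSET to a pointer that
   is ALIGN-byte aligned.  The offset's lowest set bit caps the result:
   a 32-byte-aligned pointer (256 bits) plus 4 is only 4-byte aligned
   (32 bits), matching the p_1/p_2 example.  */
static unsigned int
align_after_offset (unsigned int align, unsigned int byte_offset)
{
  if (byte_offset == 0)
    return align;
  unsigned int off_align = byte_offset & -byte_offset;  /* lowest set bit */
  return off_align < align ? off_align : align;
}
```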
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 14:10 ` Richard Guenther @ 2011-07-25 14:14 ` Richard Guenther 2011-07-25 14:54 ` Ulrich Weigand 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 14:14 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 4:03 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >>> Richard Guenther wrote: >>> >>>> >>>>>> Well, the back-end assumes a pointer to vector type is always >>>> >>>>>> naturally aligned, and therefore the data it points to can be >>>> >>>>>> accessed via a simple load, with no extra rotate needed. >>>> >>>>> >>>> >>>>> I can't see any use of VECTOR_TYPE in config/spu/, and assuming >>>> >>>>> anything about alignment just because of the kind of the pointer >>>> >>>>> is bogus - the scalar code does a scalar read using that pointer. >>>> >>>>> So the backend better should look at the memory operation, not >>>> >>>>> at the pointer type. That said, I can't find any code that looks >>>> >>>>> suspicious in the spu backend. >>>> >>>>> >>>> >>>>>> It seems what happened here is that somehow, a pointer to int >>>> >>>>>> gets replaced by a pointer to vector, even though their alignment >>>> >>>>>> properties are different. >>>> >>>>> >>>> >>>>> No, they are not. They get replaced if they are value-equivalent >>>> >>>>> in which case they are also alignment-equivalent. But well, the >>>> >>>>> dump snippet wasn't complete and I don't feel like building a >>>> >>>>> SPU cross to verify myself. >>> >>>> > Seems perfectly valid to me. Or well - I suppose we might run into >>>> > the issue that the vectorizer sets alignment data at the wrong spot? >>>> > You can check alignment info when dumping with the -alias flag. >>>> > Building a spu cross now. 
>>>> >>>> Nope, all perfectly valid. >>> >>> Ah, I guess I see what's happening here. When the SPU back-end is called >>> to expand the load, the source operand is passed as: >>> >>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ]) >>> [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32]) >>> >>> Now this does say the MEM is only guaranteed to be aligned to 32 bits. >>> >>> However, spu_expand_load then goes and looks at the components of the >>> address in detail, in order to figure out how to best perform the access. >>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base >>> registers involved in the address. >>> >>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore >>> the back-end thinks it can use an aligned access after all. >>> >>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register >>> is the DECL_RTL for the variable vect_pa.9, and that variable has a >>> pointer-to-vector type (with target alignment 128). >>> >>> When expanding that variable, expand_one_register_var does: >>> >>> if (POINTER_TYPE_P (type)) >>> mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); >>> >>> All this is normally completely correct -- a variable of type pointer >>> to vector type *must* hold only properly aligned values. >> >> No, this is indeed completely bogus code ;) it should instead >> use get_pointer_alignment. > > Btw, as pseudos do not have a single def site how can the above > ever be correct in the face of coalescing? For example on trees we > can have > > p_1 = &a; // align 256 > p_2 = p_1 + 4; // align 32 > > but we'll coalesce the thing and thus would have to use the weaker > alignment of both SSA names. expand_one_register_var expands > the decl, not the SSA name, so using get_pointer_alignment on > the decl would probably be fine, though also pointless as it always > will return 8. 
> > At least I don't see any code that would prevent a temporary variable > of type int * of being coalesced with a temporary variable of type vector int *. > > Why should REGNO_POINTER_ALIGN be interesting to anyone? > Proper alignment information is (should be) attached to every > MEM already. nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN apart from maybe alpha.c and spu.c. We should simply kill REGNO_POINTER_ALIGN IMHO. Richard. > Richard. > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 14:14 ` Richard Guenther @ 2011-07-25 14:54 ` Ulrich Weigand 2011-07-25 14:59 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-07-25 14:54 UTC (permalink / raw) To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking Richard Guenther wrote: > > Btw, as pseudos do not have a single def site how can the above > > ever be correct in the face of coalescing? I had always understood this to reflect the simple fact that a pointer to some type must never hold a value that is not properly aligned for that type. (Maybe this is only true on STRICT_ALIGNMENT targets?) This has always been an important property to generate good code on SPU ... > > For example on trees we can have > > > > p_1 = &a; // align 256 > > p_2 = p_1 + 4; // align 32 > > > > but we'll coalesce the thing and thus would have to use the weaker > > alignment of both SSA names. expand_one_register_var expands > > the decl, not the SSA name, so using get_pointer_alignment on > > the decl would probably be fine, though also pointless as it always > > will return 8. > > > > At least I don't see any code that would prevent a temporary variable > > of type int * of being coalesced with a temporary variable of type vector > > int *. I don't really understand the coalesce code, but in the above sample, the two variables must have the same type, otherwise there'd have to be a cast somewhere. Does coalesce eliminate casts? > > Why should REGNO_POINTER_ALIGN be interesting to anyone? > > Proper alignment information is (should be) attached to every > > MEM already. > > nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN > apart from maybe alpha.c and spu.c. > > We should simply kill REGNO_POINTER_ALIGN IMHO. On the SPU at least, REGNO_POINTER_ALIGN carries additional information over just the MEM alignment. 
Say, I'm getting a MEM of the form (mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.

This means I need to generate a rotate to fix up the value that was loaded by the (forced aligned) load instruction. However, the form of this rotate can be simpler if I know that e.g. reg X is always guaranteed to be 128-bit aligned and only reg Y introduces the potential misalignment. If on the other hand neither of the base registers is guaranteed to be 128-bit aligned, I need to generate more complex rotate code ...

I understand this may also be important on other platforms, e.g. to choose which register to use as base and which as index in a memory operation ...

Bye, Ulrich

--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
Ulrich.Weigand@de.ibm.com

^ permalink raw reply [flat|nested] 35+ messages in thread
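[The aligned-load-plus-rotate idiom Ulrich describes can be modeled in portable C: the hardware fetches only 16-byte-aligned quadwords, and the low address bits select how far to rotate the fetched bytes. This is a rough functional model of the semantics, not the actual SPU instruction sequence, and the helper name is my own:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Functional model of a load on a strict-align target: only
   16-byte-aligned addresses are ever dereferenced, so a 4-byte value
   at an arbitrary offset is recovered by fetching the enclosing
   aligned bytes and "rotating" (here: shifting the extraction point)
   by the low 4 address bits.  BUF must have at least 32 bytes
   available past the rounded-down offset.  */
static uint32_t
load_u32_strict_align (const unsigned char *buf, size_t byte_off)
{
  unsigned char quad[32];                  /* two adjacent 16-byte lines */
  size_t line = byte_off & ~(size_t) 15;   /* aligned load address */
  size_t rot = byte_off & 15;              /* rotate amount */
  uint32_t v;

  memcpy (quad, buf + line, 32);           /* the only (aligned) fetches */
  memcpy (&v, quad + rot, sizeof v);       /* extract after the rotate */
  return v;
}
```

[In this model, when one address operand is known 128-bit aligned, `rot` depends only on the other operand -- which is exactly why the back-end can emit a simpler rotate in that case.]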
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 14:54 ` Ulrich Weigand @ 2011-07-25 14:59 ` Richard Guenther 2011-07-25 16:12 ` Ulrich Weigand 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-07-25 14:59 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Richard Guenther wrote: > >> > Btw, as pseudos do not have a single def site how can the above >> > ever be correct in the face of coalescing? > > I had always understood this to reflect the simple fact that a > pointer to some type must never hold a value that is not properly > aligned for that type. (Maybe this is only true on STRICT_ALIGNMENT > targets?) This has always been an important property to generate > good code on SPU ... We do not preserve pointer type casts in the middle-end (anymore). >> > For example on trees we can have >> > >> > p_1 = &a; // align 256 >> > p_2 = p_1 + 4; // align 32 >> > >> > but we'll coalesce the thing and thus would have to use the weaker >> > alignment of both SSA names. expand_one_register_var expands >> > the decl, not the SSA name, so using get_pointer_alignment on >> > the decl would probably be fine, though also pointless as it always >> > will return 8. >> > >> > At least I don't see any code that would prevent a temporary variable >> > of type int * of being coalesced with a temporary variable of type vector >> > int *. > > I don't really understand the coalesce code, but in the above sample, > the two variables must have the same type, otherwise there'd have to > be a cast somewhere. Does coalesce eliminate casts? No, there is no cast between different pointer types. Information is not attached to types but to real entities. >> > Why should REGNO_POINTER_ALIGN be interesting to anyone? >> > Proper alignment information is (should be) attached to every >> > MEM already. 
>> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN >> apart from maybe alpha.c and spu.c. >> >> We should simply kill REGNO_POINTER_ALIGN IMHO. > > On the SPU at least, REGNO_POINTER_ALIGN carries additional information > over just the MEM alignment. Say, I'm getting a MEM the form > (mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits. > > This means I need to generate a rotate to fix up the value that was > loaded by the (forced aligned) load instruction. However, the form > of this rotate can be simpler if I know that e.g. reg X is always > guaranteed to be 128-bits aligned and only reg Y introduces the > potential misalignment. If on the other hand neither of the base > registers is guaranteed to be 128-bit aligned, I need to generate > more complex rotate code ... Because then you need the value of X + Y instead of just picking either? Why not expand this explicitly when you still have the per-SSA name alignment information around? > I understand this may also be important on other platforms, e.g. > to choose which register to use as base and which as index in a > memory operation ... Well, we still have REG_POINTER. Richard. > Bye, > Ulrich > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 14:59 ` Richard Guenther @ 2011-07-25 16:12 ` Ulrich Weigand 2011-07-26 8:25 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-07-25 16:12 UTC (permalink / raw) To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking Richard Guenther wrote: > On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > I had always understood this to reflect the simple fact that a > > pointer to some type must never hold a value that is not properly > > aligned for that type. (Maybe this is only true on STRICT_ALIGNMENT > > targets?) This has always been an important property to generate > > good code on SPU ... > > We do not preserve pointer type casts in the middle-end (anymore). Huh, OK. I was not aware of that ... > >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN > >> apart from maybe alpha.c and spu.c. There's also a use in find_reloads_subreg_address, as well as in the i386/predicates.md and arm/arm.md files. > > This means I need to generate a rotate to fix up the value that was > > loaded by the (forced aligned) load instruction. However, the form > > of this rotate can be simpler if I know that e.g. reg X is always > > guaranteed to be 128-bits aligned and only reg Y introduces the > > potential misalignment. If on the other hand neither of the base > > registers is guaranteed to be 128-bit aligned, I need to generate > > more complex rotate code ... > > Because then you need the value of X + Y instead of just picking either? Yes, exactly. > Why not expand this explicitly when you still have the per-SSA name > alignment information around? When would that be? The expansion does happen in the initial expand stage, but I'm getting called from the middle-end via emit_move_insn etc. which already provides me with a MEM ... Can I use REG_ATTRS->decl to get at the register's DECL and use get_pointer_alignment on that? 
[ On the other hand, don't we have the same problems with reliability of REG_ATTRS that we have with REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ] Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-25 16:12 ` Ulrich Weigand @ 2011-07-26 8:25 ` Richard Guenther 2011-07-26 8:59 ` Andrew Pinski 2011-07-26 14:23 ` Ulrich Weigand 0 siblings, 2 replies; 35+ messages in thread From: Richard Guenther @ 2011-07-26 8:25 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Ira Rosen, gcc-patches, Patch Tracking On Mon, Jul 25, 2011 at 5:25 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Richard Guenther wrote: >> On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> > I had always understood this to reflect the simple fact that a >> > pointer to some type must never hold a value that is not properly >> > aligned for that type. (Maybe this is only true on STRICT_ALIGNMENT >> > targets?) This has always been an important property to generate >> > good code on SPU ... >> >> We do not preserve pointer type casts in the middle-end (anymore). > > Huh, OK. I was not aware of that ... > >> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN >> >> apart from maybe alpha.c and spu.c. > > There's also a use in find_reloads_subreg_address, as well as in the > i386/predicates.md and arm/arm.md files. > >> > This means I need to generate a rotate to fix up the value that was >> > loaded by the (forced aligned) load instruction. However, the form >> > of this rotate can be simpler if I know that e.g. reg X is always >> > guaranteed to be 128-bits aligned and only reg Y introduces the >> > potential misalignment. If on the other hand neither of the base >> > registers is guaranteed to be 128-bit aligned, I need to generate >> > more complex rotate code ... >> >> Because then you need the value of X + Y instead of just picking either? > > Yes, exactly. > >> Why not expand this explicitly when you still have the per-SSA name >> alignment information around? > > When would that be? 
The expansion does happen in the initial expand > stage, but I'm getting called from the middle-end via emit_move_insn etc. > which already provides me with a MEM ... Hmm. I suppose we'd need to see at the initial expand stage that the move is going to be handled specially. For other strict-align targets we end up with store/load-bit-field for unaligned accesses, so I suppose SPU doesn't want to go down that path (via insv/extv)? > Can I use REG_ATTRS->decl to get at the register's DECL and use > get_pointer_alignment on that? [ On the other hand, don't we have > the same problems with reliability of REG_ATTRS that we have with > REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ] Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose we'd need to pick a conservative REGNO_POINTER_ALIGN during expansion of the SSA name partition - iterate over all of them in the partition and pick the lowest alignment. Or even adjust the partitioning to avoid losing alignment information that way. I suppose the RTL code transforms are careful to update REGNO_POINTER_ALIGN conservatively. Richard. > Bye, > Ulrich > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com > ^ permalink raw reply [flat|nested] 35+ messages in thread
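Richard's proposal above (iterate over the SSA names in a partition and pick the lowest alignment) can be sketched as follows. This is an illustrative model with invented names, not GCC source: the coalesced pseudo may only be given the smallest alignment guaranteed by any member of its partition, since any larger value could be wrong for at least one of them.

```c
/* Conservative REGNO_POINTER_ALIGN for a coalesced partition: the
   minimum over the known alignments (in bits) of its member SSA
   names.  Illustrative only; not a GCC function.  */
static unsigned int
partition_pointer_align (const unsigned int *member_align, int n)
{
  unsigned int align = member_align[0];
  for (int i = 1; i < n; i++)
    if (member_align[i] < align)
      align = member_align[i];
  return align;
}
```

For example, coalescing a 128-bit-aligned name with a 32-bit-aligned one leaves the pseudo with only a 32-bit guarantee.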
* Re: [patch] Fix PR tree-optimization/49771 2011-07-26 8:25 ` Richard Guenther @ 2011-07-26 8:59 ` Andrew Pinski 2011-07-26 14:23 ` Ulrich Weigand 1 sibling, 0 replies; 35+ messages in thread From: Andrew Pinski @ 2011-07-26 8:59 UTC (permalink / raw) To: Richard Guenther; +Cc: Ulrich Weigand, Ira Rosen, gcc-patches, Patch Tracking On Tue, Jul 26, 2011 at 12:23 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > Hmm. I suppose we'd need to see at the initial expand stage that the > move is going to be handled specially. For other strict-align targets > we end up with store/load-bit-field for unaligned accesses, so I suppose > SPU doesn't want to go down that path (via insv/extv)? The problem is that almost all load/stores on spu are unaligned. So there will be less optimized done on them if we go down that path (it was tried at least twice and it produced worse code). Thanks, Andrew Pinski ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [patch] Fix PR tree-optimization/49771 2011-07-26 8:25 ` Richard Guenther 2011-07-26 8:59 ` Andrew Pinski @ 2011-07-26 14:23 ` Ulrich Weigand 2011-07-26 14:25 ` Michael Matz 1 sibling, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-07-26 14:23 UTC (permalink / raw) To: Richard Guenther; +Cc: Ira Rosen, gcc-patches, Patch Tracking Richard Guenther wrote: > On Mon, Jul 25, 2011 at 5:25 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > When would that be? The expansion does happen in the initial expand > > stage, but I'm getting called from the middle-end via emit_move_insn etc. > > which already provides me with a MEM ... > > Hmm. I suppose we'd need to see at the initial expand stage that the > move is going to be handled specially. For other strict-align targets > we end up with store/load-bit-field for unaligned accesses, so I suppose > SPU doesn't want to go down that path (via insv/extv)? One issue here is that accesses aren't necessarily "unaligned" as far as the middle-end is concerned: in the current example, we in fact have an access to a 32-bit integer that is aligned on a 32-bit boundary (which is the default alignment for integers). It's just that even so, the address is not *128-bit* aligned, and all SPU load instructions require this level of alignment ... The other issue is that as Andrew mentioned, all this means that just about every single memory access needs to be handled this way, and attempts to have everything go through insv/extv in the past have resulted in less efficient code generation. > > Can I use REG_ATTRS->decl to get at the register's DECL and use > > get_pointer_alignment on that? [ On the other hand, don't we have > > the same problems with reliability of REG_ATTRS that we have with > > REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ] > > Well, REG_ATTRS->decl is again a decl, not an SSA name. 
I suppose > we'd need to pick a conservative REGNO_POINTER_ALIGN during > expansion of the SSA name partition - iterate over all of them in the > partition and pick the lowest alignment. Or even adjust the partitioning > to avoid losing alignment information that way. That would certainly be helpful. > I suppose the RTL code transforms are careful to update REGNO_POINTER_ALIGN > conservatively. They're supposed to, yes. In practice, REGNO_POINTER_ALIGN is mostly used for pseudos allocated to hold pointer types (reflecting the type's alignment requirement) and for virtual/hard registers pointing into the stack (stack, frame, virtual args, ...), reflecting the various ABI alignment guarantees about the stack. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 35+ messages in thread
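The rotate fix-up Ulrich describes can be sketched like this. The code below is purely illustrative (the helper names are invented; this is not the SPU back-end): every SPU load fetches an aligned 16-byte quadword, so a narrower value has to be rotated into position by its byte offset within that quadword. When the base register is known to be 128-bit aligned, the offset depends on the index term alone, which is the simpler rotate he mentions.

```c
#include <stdint.h>

/* Byte offset of ADDR inside its 16-byte quadword; this is the rotate
   amount needed to move the loaded value into place.  */
static unsigned int
quadword_offset (uintptr_t addr)
{
  return (unsigned int) (addr & 15);
}

/* If BASE is guaranteed 128-bit aligned, (base + index) & 15 reduces
   to index & 15, so the rotate can be computed from the index alone
   without forming the full effective address.  */
static unsigned int
quadword_offset_aligned_base (uintptr_t base, uintptr_t index)
{
  (void) base;                  /* base & 15 == 0 by assumption */
  return (unsigned int) (index & 15);
}
```

If neither base register carries an alignment guarantee, the back-end must compute the offset from the full sum, which is the more complex rotate code Ulrich wants to avoid.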
* Re: [patch] Fix PR tree-optimization/49771 2011-07-26 14:23 ` Ulrich Weigand @ 2011-07-26 14:25 ` Michael Matz 2011-07-26 16:18 ` Merge alignments from coalesced SSA pointers Michael Matz 0 siblings, 1 reply; 35+ messages in thread From: Michael Matz @ 2011-07-26 14:25 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Richard Guenther, Ira Rosen, gcc-patches, Patch Tracking Hi, On Tue, 26 Jul 2011, Ulrich Weigand wrote: > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose > > we'd need to pick a conservative REGNO_POINTER_ALIGN during > > expansion of the SSA name partition - iterate over all of them in the > > partition and pick the lowest alignment. Or even adjust the partitioning > > to avoid losing alignment information that way. > > That would certainly be helpful. I'm working on a patch for that, stay tuned. Ciao, Michael. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Merge alignments from coalesced SSA pointers 2011-07-26 14:25 ` Michael Matz @ 2011-07-26 16:18 ` Michael Matz 2011-07-26 17:23 ` Michael Matz 2011-07-26 17:28 ` Merge alignments from coalesced SSA pointers Ulrich Weigand 0 siblings, 2 replies; 35+ messages in thread From: Michael Matz @ 2011-07-26 16:18 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches On Tue, 26 Jul 2011, Michael Matz wrote: > Hi, > > On Tue, 26 Jul 2011, Ulrich Weigand wrote: > > > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during > > > expansion of the SSA name partition - iterate over all of them in the > > > partition and pick the lowest alignment. Or even adjust the partitioning > > > to avoid losing alignment information that way. > > > > That would certainly be helpful. > > I'm working on a patch for that, stay tuned. Like so. Currently in regstrapping on x86_64-linux. Could you try if it helps spu? Okay for trunk? Ciao, Michael. * cfgexpand.c (expand_one_register_var): Use get_pointer_alignment. (gimple_expand_cfg): Merge alignment info for coalesced pointer SSA names. Index: cfgexpand.c =================================================================== --- cfgexpand.c (revision 176790) +++ cfgexpand.c (working copy) @@ -909,7 +909,7 @@ expand_one_register_var (tree var) mark_user_reg (x); if (POINTER_TYPE_P (type)) - mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); + mark_reg_pointer (x, get_pointer_alignment (var, BIGGEST_ALIGNMENT)); } /* A subroutine of expand_one_var. Called to assign rtl to a VAR_DECL that @@ -4265,6 +4265,25 @@ gimple_expand_cfg (void) } } + /* If we have a class containing differently aligned pointers + we need to merge those into the corresponding RTL pointer + alignment. 
*/ + for (i = 1; i < num_ssa_names; i++) + { + tree name = ssa_name (i); + int part; + rtx r; + + if (!name || !POINTER_TYPE_P (TREE_TYPE (name))) + continue; + part = var_to_partition (SA.map, name); + if (part == NO_PARTITION) + continue; + r = SA.partition_to_pseudo[part]; + if (REG_P (r)) + mark_reg_pointer (r, get_pointer_alignment (name, BIGGEST_ALIGNMENT)); + } + /* If this function is `main', emit a call to `__main' to run global initializers, etc. */ if (DECL_NAME (current_function_decl) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Merge alignments from coalesced SSA pointers 2011-07-26 16:18 ` Merge alignments from coalesced SSA pointers Michael Matz @ 2011-07-26 17:23 ` Michael Matz 2011-08-08 16:34 ` Ulrich Weigand 2011-07-26 17:28 ` Merge alignments from coalesced SSA pointers Ulrich Weigand 1 sibling, 1 reply; 35+ messages in thread From: Michael Matz @ 2011-07-26 17:23 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches Hi, On Tue, 26 Jul 2011, Michael Matz wrote: > On Tue, 26 Jul 2011, Michael Matz wrote: > > > Hi, > > > > On Tue, 26 Jul 2011, Ulrich Weigand wrote: > > > > > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose > > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during > > > > expansion of the SSA name partition - iterate over all of them in the > > > > partition and pick the lowest alignment. Or even adjust the partitioning > > > > to avoid losing alignment information that way. > > > > > > That would certainly be helpful. > > > > I'm working on a patch for that, stay tuned. > > Like so. Currently in regstrapping on x86_64-linux. Could you try if it > helps spu? > > Okay for trunk? This patch exposes a problem in libada. But I'd still be interested if it fixes the spu problem. Ciao, Michael. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Merge alignments from coalesced SSA pointers 2011-07-26 17:23 ` Michael Matz @ 2011-08-08 16:34 ` Ulrich Weigand 2011-08-09 12:01 ` Michael Matz 0 siblings, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-08-08 16:34 UTC (permalink / raw) To: Michael Matz; +Cc: Richard Guenther, gcc-patches Michael Matz wrote: > Hi, > > On Tue, 26 Jul 2011, Michael Matz wrote: > > > On Tue, 26 Jul 2011, Michael Matz wrote: > > > > > Hi, > > > > > > On Tue, 26 Jul 2011, Ulrich Weigand wrote: > > > > > > > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose > > > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during > > > > > expansion of the SSA name partition - iterate over all of them in the > > > > > partition and pick the lowest alignment. Or even adjust the partitioning > > > > > to avoid losing alignment information that way. > > > > > > > > That would certainly be helpful. > > > > > > I'm working on a patch for that, stay tuned. > > > > Like so. Currently in regstrapping on x86_64-linux. Could you try if it > > helps spu? > > > > Okay for trunk? > > This patch exposes a problem in libada. But I'd still be interested if it > fixes the spu problem. 
It turns out that the patch not only fixes the pr49771 failure, but in fact *all* outstanding vectorizer test case failures on SPU in both mainline and the 4.6 branch: On mainline: FAIL: gcc.dg/vect/pr49771.c execution test FAIL: gcc.dg/vect/vect-outer-4f.c execution test FAIL: gcc.dg/vect/vect-outer-4g.c execution test FAIL: gcc.dg/vect/vect-outer-4k.c execution test FAIL: gcc.dg/vect/vect-outer-4l.c execution test FAIL: gcc.dg/vect/pr49771.c -flto execution test FAIL: gcc.dg/vect/vect-outer-4f.c -flto execution test FAIL: gcc.dg/vect/vect-outer-4g.c -flto execution test FAIL: gcc.dg/vect/vect-outer-4k.c -flto execution test FAIL: gcc.dg/vect/vect-outer-4l.c -flto execution test On 4.6: FAIL: gcc.dg/vect/vect-double-reduc-5.c execution test FAIL: gcc.dg/vect/vect-outer-4f.c execution test FAIL: gcc.dg/vect/vect-outer-4g.c execution test FAIL: gcc.dg/vect/vect-outer-4k.c execution test FAIL: gcc.dg/vect/vect-outer-4l.c execution test While I'm still somewhat concerned about potential performance regressions, correctness is of course more important, so I'd really like to see your patch (or something along its lines) go in ... Are there any updates on the libada problem or other reasons why the patch cannot go in? Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Merge alignments from coalesced SSA pointers 2011-08-08 16:34 ` Ulrich Weigand @ 2011-08-09 12:01 ` Michael Matz 2011-08-12 16:41 ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz 0 siblings, 1 reply; 35+ messages in thread From: Michael Matz @ 2011-08-09 12:01 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Richard Guenther, gcc-patches Hi, On Mon, 8 Aug 2011, Ulrich Weigand wrote: > > This patch exposes a problem in libada. But I'd still be interested > > if it fixes the spu problem. > > It turns out that the patch not only fixes the pr49771 failure, but in > fact *all* outstanding vectorizer test case failures on SPU in both > mainline and the 4.6 branch: > > While I'm still somewhat concerned about potential performance > regressions, correctness is of course more important, so I'd really like > to see your patch (or something along its lines) go in ... > > Are there any updates on the libada problem or other reasons why the > patch cannot go in? Nope, I've solved that one. Let me update it. Ciao, Michael. ^ permalink raw reply [flat|nested] 35+ messages in thread
* [rfa] Set alignment of pseudos via get_pointer_alignment 2011-08-09 12:01 ` Michael Matz @ 2011-08-12 16:41 ` Michael Matz 2011-08-12 22:53 ` Richard Guenther 0 siblings, 1 reply; 35+ messages in thread From: Michael Matz @ 2011-08-12 16:41 UTC (permalink / raw) To: Ulrich Weigand, gcc-patches; +Cc: Richard Guenther Hi, On Tue, 9 Aug 2011, Michael Matz wrote: > > Are there any updates on the libada problem or other reasons why the > > patch cannot go in? > > Nope, I've solved that one. Letme update it. Like so. Regstrapped on x86_64-linux (all languages + Ada). Okay for trunk? Ciao, Michael. -- * cfgexpand.c (expand_one_register_var): Use get_pointer_alignment. (gimple_expand_cfg): Merge alignment info for coalesced pointer SSA names. Index: cfgexpand.c =================================================================== --- cfgexpand.c (revision 177696) +++ cfgexpand.c (working copy) @@ -909,7 +909,7 @@ expand_one_register_var (tree var) mark_user_reg (x); if (POINTER_TYPE_P (type)) - mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); + mark_reg_pointer (x, get_pointer_alignment (var)); } /* A subroutine of expand_one_var. Called to assign rtl to a VAR_DECL that @@ -4265,6 +4265,31 @@ gimple_expand_cfg (void) } } + /* If we have a class containing differently aligned pointers + we need to merge those into the corresponding RTL pointer + alignment. */ + for (i = 1; i < num_ssa_names; i++) + { + tree name = ssa_name (i); + int part; + rtx r; + + if (!name + || !POINTER_TYPE_P (TREE_TYPE (name)) + /* We might have generated new SSA names in + update_alias_info_with_stack_vars. They will have a NULL + defining statements, and won't be part of the partitioning, + so ignore those. 
*/ + || !SSA_NAME_DEF_STMT (name)) + continue; + part = var_to_partition (SA.map, name); + if (part == NO_PARTITION) + continue; + r = SA.partition_to_pseudo[part]; + if (REG_P (r)) + mark_reg_pointer (r, get_pointer_alignment (name)); + } + /* If this function is `main', emit a call to `__main' to run global initializers, etc. */ if (DECL_NAME (current_function_decl) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [rfa] Set alignment of pseudos via get_pointer_alignment 2011-08-12 16:41 ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz @ 2011-08-12 22:53 ` Richard Guenther 2011-08-23 15:04 ` Michael Matz 0 siblings, 1 reply; 35+ messages in thread From: Richard Guenther @ 2011-08-12 22:53 UTC (permalink / raw) To: Michael Matz; +Cc: Ulrich Weigand, gcc-patches On Fri, Aug 12, 2011 at 6:04 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Tue, 9 Aug 2011, Michael Matz wrote: > >> > Are there any updates on the libada problem or other reasons why the >> > patch cannot go in? >> >> Nope, I've solved that one. Letme update it. > > Like so. Regstrapped on x86_64-linux (all languages + Ada). Okay for > trunk? Ok. Thanks, Richard. > > Ciao, > Michael. > -- > * cfgexpand.c (expand_one_register_var): Use get_pointer_alignment. > (gimple_expand_cfg): Merge alignment info for coalesced pointer > SSA names. > > Index: cfgexpand.c > =================================================================== > --- cfgexpand.c (revision 177696) > +++ cfgexpand.c (working copy) > @@ -909,7 +909,7 @@ expand_one_register_var (tree var) > mark_user_reg (x); > > if (POINTER_TYPE_P (type)) > - mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type))); > + mark_reg_pointer (x, get_pointer_alignment (var)); > } > > /* A subroutine of expand_one_var. Called to assign rtl to a VAR_DECL that > @@ -4265,6 +4265,31 @@ gimple_expand_cfg (void) > } > } > > + /* If we have a class containing differently aligned pointers > + we need to merge those into the corresponding RTL pointer > + alignment. */ > + for (i = 1; i < num_ssa_names; i++) > + { > + tree name = ssa_name (i); > + int part; > + rtx r; > + > + if (!name > + || !POINTER_TYPE_P (TREE_TYPE (name)) > + /* We might have generated new SSA names in > + update_alias_info_with_stack_vars. They will have a NULL > + defining statements, and won't be part of the partitioning, > + so ignore those. 
*/ > + || !SSA_NAME_DEF_STMT (name)) > + continue; > + part = var_to_partition (SA.map, name); > + if (part == NO_PARTITION) > + continue; > + r = SA.partition_to_pseudo[part]; > + if (REG_P (r)) > + mark_reg_pointer (r, get_pointer_alignment (name)); > + } > + > /* If this function is `main', emit a call to `__main' > to run global initializers, etc. */ > if (DECL_NAME (current_function_decl) > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [rfa] Set alignment of pseudos via get_pointer_alignment 2011-08-12 22:53 ` Richard Guenther @ 2011-08-23 15:04 ` Michael Matz 0 siblings, 0 replies; 35+ messages in thread From: Michael Matz @ 2011-08-23 15:04 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Guenther, Ulrich Weigand Hi, > > Like so. Regstrapped on x86_64-linux (all languages + Ada). Okay for > > trunk? > > Ok. r177989 (JFYI because it's some time ago to make searching the archives easier). Ciao, Michael. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Merge alignments from coalesced SSA pointers 2011-07-26 16:18 ` Merge alignments from coalesced SSA pointers Michael Matz 2011-07-26 17:23 ` Michael Matz @ 2011-07-26 17:28 ` Ulrich Weigand 2011-07-27 9:13 ` Richard Guenther 1 sibling, 1 reply; 35+ messages in thread From: Ulrich Weigand @ 2011-07-26 17:28 UTC (permalink / raw) To: Michael Matz; +Cc: Richard Guenther, gcc-patches Michael Matz wrote: > On Tue, 26 Jul 2011, Michael Matz wrote: > > On Tue, 26 Jul 2011, Ulrich Weigand wrote: > > > > > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose > > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during > > > > expansion of the SSA name partition - iterate over all of them in the > > > > partition and pick the lowest alignment. Or even adjust the partitioning > > > > to avoid losing alignment information that way. > > > > > > That would certainly be helpful. > > > > I'm working on a patch for that, stay tuned. > > Like so. Currently in regstrapping on x86_64-linux. Could you try if it > helps spu? Well, it does help SPU in the sense that the wrong code generation goes away. However, it does so by setting REGNO_POINTER_ALIGN to the minimum of 8 just about every time -- not sure what the impact on generated code quality is. Maybe get_pointer_alignment should default to the type's alignment if nothing more specific is known, at least on STRICT_ALIGNMENT targets? Just like MEM_ALIGN defaults to the mode's alignment ... Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE Ulrich.Weigand@de.ibm.com ^ permalink raw reply [flat|nested] 35+ messages in thread
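Ulrich's suggested fallback can be modeled as below. This is an illustrative sketch, not actual GCC code: the function name is invented, and it assumes BITS_PER_UNIT is 8 (the "knows nothing" answer he observes get_pointer_alignment returning almost every time).

```c
#define BITS_PER_UNIT 8

/* Model of the proposed fallback: if alignment tracking only knows
   the trivial BITS_PER_UNIT answer, a STRICT_ALIGNMENT target could
   assume the pointed-to type's alignment instead, mirroring how
   MEM_ALIGN defaults to the mode's alignment.  Illustrative only.  */
static unsigned int
pointer_align_with_type_fallback (unsigned int known_align,
                                  unsigned int type_align,
                                  int strict_alignment)
{
  if (known_align > BITS_PER_UNIT)
    return known_align;         /* specific knowledge wins */
  return strict_alignment ? type_align : BITS_PER_UNIT;
}
```

On a strict-alignment target this would turn the pervasive 8-bit answer into the type's natural alignment, at the cost of trusting type information that the front-ends do not always get right for pointers.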
* Re: Merge alignments from coalesced SSA pointers 2011-07-26 17:28 ` Merge alignments from coalesced SSA pointers Ulrich Weigand @ 2011-07-27 9:13 ` Richard Guenther 0 siblings, 0 replies; 35+ messages in thread From: Richard Guenther @ 2011-07-27 9:13 UTC (permalink / raw) To: Ulrich Weigand; +Cc: Michael Matz, gcc-patches On Tue, Jul 26, 2011 at 6:56 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Michael Matz wrote: >> On Tue, 26 Jul 2011, Michael Matz wrote: >> > On Tue, 26 Jul 2011, Ulrich Weigand wrote: >> > >> > > > Well, REG_ATTRS->decl is again a decl, not an SSA name. I suppose >> > > > we'd need to pick a conservative REGNO_POINTER_ALIGN during >> > > > expansion of the SSA name partition - iterate over all of them in the >> > > > partition and pick the lowest alignment. Or even adjust the partitioning >> > > > to avoid losing alignment information that way. >> > > >> > > That would certainly be helpful. >> > >> > I'm working on a patch for that, stay tuned. >> >> Like so. Currently in regstrapping on x86_64-linux. Could you try if it >> helps spu? > > Well, it does help SPU in the sense that the wrong code generation goes away. > > However, it does so by setting REGNO_POINTER_ALIGN to the minimum of 8 just > about every time -- not sure what the impact on generated code quality is. > > Maybe get_pointer_alignment should default to the type's alignment if > nothing more specific is known, at least on STRICT_ALIGNMENT targets? > Just like MEM_ALIGN defaults to the mode's alignment ... Which is bogus ... instead we should improve alignment tracking to take into account more sources of alignment information (it is very conservative right now - for a reason, of course, as we get most of the packed/aligned attribute stuff wrong from the frontend already as soon as pointers are involved). Richard. > Thanks, > Ulrich > > -- > Dr. 
Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com > ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2011-08-23 14:40 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-19  8:24 [patch] Fix PR tree-optimization/49771 Ira Rosen
2011-07-19  9:49 ` Richard Guenther
2011-07-19 14:01 ` Ira Rosen
2011-07-19 14:03 ` Richard Guenther
2011-07-20 19:47 ` Ulrich Weigand
2011-07-21 12:54 ` Ira Rosen
2011-07-24 14:32 ` Ira Rosen
2011-07-24 18:46 ` Richard Guenther
2011-07-25  9:44 ` Ulrich Weigand
2011-07-25 10:08 ` Richard Guenther
2011-07-25 11:26 ` Ira Rosen
2011-07-25 11:41 ` Richard Guenther
2011-07-25 12:33 ` Ira Rosen
2011-07-25 13:01 ` Richard Guenther
2011-07-25 13:07 ` Richard Guenther
2011-07-25 13:47 ` Ulrich Weigand
2011-07-25 14:01 ` Richard Guenther
2011-07-25 14:10 ` Richard Guenther
2011-07-25 14:14 ` Richard Guenther
2011-07-25 14:54 ` Ulrich Weigand
2011-07-25 14:59 ` Richard Guenther
2011-07-25 16:12 ` Ulrich Weigand
2011-07-26  8:25 ` Richard Guenther
2011-07-26  8:59 ` Andrew Pinski
2011-07-26 14:23 ` Ulrich Weigand
2011-07-26 14:25 ` Michael Matz
2011-07-26 16:18 ` Merge alignments from coalesced SSA pointers Michael Matz
2011-07-26 17:23 ` Michael Matz
2011-08-08 16:34 ` Ulrich Weigand
2011-08-09 12:01 ` Michael Matz
2011-08-12 16:41 ` [rfa] Set alignment of pseudos via get_pointer_alignment Michael Matz
2011-08-12 22:53 ` Richard Guenther
2011-08-23 15:04 ` Michael Matz
2011-07-26 17:28 ` Merge alignments from coalesced SSA pointers Ulrich Weigand
2011-07-27  9:13 ` Richard Guenther