public inbox for gcc@gcc.gnu.org
* RE: Tree-SSA and POST_INC address mode incompatible in GCC4?
@ 2007-11-03 14:47 J.C. Pizarro
  2007-11-03 14:55 ` Kenneth Zadeck
  2007-11-03 14:55 ` J.C. Pizarro
  0 siblings, 2 replies; 16+ messages in thread
From: J.C. Pizarro @ 2007-11-03 14:47 UTC (permalink / raw)
  To: Kenneth Zadeck, gcc

2007/11/3, Kenneth Zadeck wrote:
> I believe that this is something new and is most likely fallout from
> Diego's reworking of the tree-to-RTL converter.
>
> Fixing this will require a round of copy propagation, most likely in
> concert with some induction variable detection, since the most
> profitable place for this will be in loops.
>
> I wonder if any of this affects the RTL-level induction variable
> discovery?
>
> > Hi, Ramana,
> > I tried the trunk version  with/without your patch. It still produces
> > the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say
> >
> >          *a
> >            ...
> >            a <- a + c
> >
> >         becomes
> >
> >            *(a += c) post
> >
> > But the problem is that after the Tree-SSA passes there is no
> >            a <- a + c
> > but something like
> >            a_1 <- a + c
> >
> > Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
> > don't see how this transformation is applicable in most scenarios.
> > Any comments?
> >
> > Cheers,
> > Bingfeng

They need to add a post-SSA algorithm that makes the code reuse
variables, converting a_j <- phi(a_i,...) to a_k <- phi(a_k,...).

The POST_INC and POST_DEC algorithms are very specific, so the general
algorithm above is sufficient.

   J.C. Pizarro


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 14:47 Tree-SSA and POST_INC address mode incompatible in GCC4? J.C. Pizarro
@ 2007-11-03 14:55 ` Kenneth Zadeck
  2007-11-03 14:59   ` J.C. Pizarro
  2007-11-03 14:55 ` J.C. Pizarro
  1 sibling, 1 reply; 16+ messages in thread
From: Kenneth Zadeck @ 2007-11-03 14:55 UTC (permalink / raw)
  To: J.C. Pizarro; +Cc: Kenneth Zadeck, gcc

J.C. Pizarro wrote:
> 2007/11/3, Kenneth Zadeck wrote:
>   
>> I believe that this is something new and is most likely fallout from
>> Diego's reworking of the tree-to-RTL converter.
>>
>> Fixing this will require a round of copy propagation, most likely in
>> concert with some induction variable detection, since the most
>> profitable place for this will be in loops.
>>
>> I wonder if any of this affects the RTL-level induction variable
>> discovery?
>>
>>     
>>> Hi, Ramana,
>>> I tried the trunk version  with/without your patch. It still produces
>>> the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say
>>>
>>>          *a
>>>            ...
>>>            a <- a + c
>>>
>>>         becomes
>>>
>>>            *(a += c) post
>>>
>>> But the problem is that after the Tree-SSA passes there is no
>>>            a <- a + c
>>> but something like
>>>            a_1 <- a + c
>>>
>>> Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
>>> don't see how this transformation is applicable in most scenarios.
>>> Any comments?
>>>
>>> Cheers,
>>> Bingfeng
>>>       
>
> They need to add a post-SSA algorithm that makes the code reuse
> variables, converting a_j <- phi(a_i,...) to a_k <- phi(a_k,...).
>
> The POST_INC and POST_DEC algorithms are very specific, so the general
> algorithm above is sufficient.
>
>    J.C. Pizarro
>   
This is a little too simple.  It assumes that the tree passes did
nothing.  In fact you want to try to discover what the induction
variables are and rewrite them into a canonical form that auto-inc-dec
can then pick up on.


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 14:47 Tree-SSA and POST_INC address mode incompatible in GCC4? J.C. Pizarro
  2007-11-03 14:55 ` Kenneth Zadeck
@ 2007-11-03 14:55 ` J.C. Pizarro
  1 sibling, 0 replies; 16+ messages in thread
From: J.C. Pizarro @ 2007-11-03 14:55 UTC (permalink / raw)
  To: gcc

2007/11/3, J.C. Pizarro <jcpiza@gmail.com> wrote:
> They need to add a post-SSA algorithm that makes the code reuse
> variables, converting a_j <- phi(a_i,...) to a_k <- phi(a_k,...).

I'm sorry, "a_k <- phi(a_k,...)" is invalid under the definition of SSA
form, but this algorithm needs some form in which to represent it, e.g.
an extended SSA.

   J.C. Pizarro


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 14:55 ` Kenneth Zadeck
@ 2007-11-03 14:59   ` J.C. Pizarro
  0 siblings, 0 replies; 16+ messages in thread
From: J.C. Pizarro @ 2007-11-03 14:59 UTC (permalink / raw)
  To: Kenneth Zadeck, gcc

2007/11/3, Kenneth Zadeck <zadeck@naturalbridge.com> wrote:
> J.C. Pizarro wrote:
> > They need to add a post-SSA algorithm that makes the code reuse
> > variables, converting a_j <- phi(a_i,...) to a_k <- phi(a_k,...).
> >
> > The POST_INC and POST_DEC algorithms are very specific, so the general
> > algorithm above is sufficient.
>
> This is a little too simple.  It assumes that the tree passes did
> nothing.  In fact you want to try to discover what the induction
> variables are and rewrite them into a canonical form that auto-inc-dec
> can then pick up on.

The general algorithm is not simple: it has to explore and reorganize
a complex graph, and it can run into conflicts.

   J.C. Pizarro


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-04 23:51       ` Mark Mitchell
@ 2007-11-05 19:30         ` Paul Brook
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Brook @ 2007-11-05 19:30 UTC (permalink / raw)
  To: gcc; +Cc: Mark Mitchell, Kenneth.Zadeck, zadeck, rakdver, dnovillo

On Sunday 04 November 2007, Mark Mitchell wrote:
> Kenneth Zadeck wrote:
> > To fix this will require a round of copy propagation, most likely in
> > concert with some induction variable detection, since the most
> > profitable place for this will be in loops.
>
> For code size, it will be profitable everywhere.  On ARM

Not technically true. On Thumb-2 we have variable-length instruction encoding.
For small offsets (< 16 bytes IIRC) a pair of post-increment loads is larger
than a pair of offset loads and an add.

However using postincrement everywhere is a good start (probably better than 
what we have) and I'd guess is a prerequisite for adding more advanced cost 
heuristics.
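
A rough way to see the size argument, assuming (this is not stated in the thread) that a small-offset load and a small add each have a 16-bit Thumb-2 encoding while a post-indexed load needs a 32-bit one. The constants and function names below are illustrative only:

```c
/* Assumed encoding sizes in bytes; illustrative, not taken from the thread. */
enum {
    NARROW_BYTES = 2,  /* assumed: load with a small immediate offset, or a small add */
    WIDE_BYTES   = 4   /* assumed: load with post-increment writeback */
};

/* Code size of n loads that each post-increment the base register. */
unsigned size_post_inc(unsigned n)
{
    return n * WIDE_BYTES;
}

/* Code size of n offset loads followed by one add that advances the base. */
unsigned size_offset_plus_add(unsigned n)
{
    return n * NARROW_BYTES + NARROW_BYTES;
}
```

Under these assumptions a pair of loads costs 8 bytes with post-increment and 6 bytes with offsets plus an add, which is the direction of the claim above. A real cost heuristic would also have to account for register choice and offset range.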

Paul


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 13:52     ` Kenneth Zadeck
  2007-11-03 14:25       ` Zdenek Dvorak
@ 2007-11-04 23:51       ` Mark Mitchell
  2007-11-05 19:30         ` Paul Brook
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Mitchell @ 2007-11-04 23:51 UTC (permalink / raw)
  To: Kenneth.Zadeck; +Cc: gcc, zadeck, rakdver, dnovillo

Kenneth Zadeck wrote:

> To fix this will require a round of copy propagation, most likely in
> concert with some induction variable detection, since the most
> profitable place for this will be in loops.  

For code size, it will be profitable everywhere.  On ARM, aggressive use
of post-increment is an important code-size optimization and we are not
presently doing a very good job of taking advantage of it.  So, whatever
solution we settle on should not be dependent on being in a loop.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 16:37             ` Richard Guenther
@ 2007-11-04  3:13               ` Daniel Berlin
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Berlin @ 2007-11-04  3:13 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Zdenek Dvorak, Kenneth Zadeck, gcc

On 11/3/07, Richard Guenther <richard.guenther@gmail.com> wrote:
> On 11/3/07, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> > Hi,
> >
> > > > >> I believe that this is something new and is most likely fallout from
> > > > >> Diego's reworking of the tree-to-RTL converter.
> > > > >>
> > > > >> Fixing this will require a round of copy propagation, most likely in
> > > > >> concert with some induction variable detection, since the most
> > > > >> profitable place for this will be in loops.
> > > > >>
> > > > >> I wonder if any of this affects the RTL-level induction variable
> > > > >> discovery?
> > > >>
> > > >
> > > > it should not (iv analysis is able to deal with this kind of ivs).
> > > >
> > > does the iv analysis canonicalize them in a way that we should perhaps
> > > consider moving the auto-inc detection after the iv analysis?
> >
> > no, iv analysis does not change the program; also, since the code in
> > this particular example is not in any loop, iv analysis is somewhat
> > irrelevant for it.
> >
> > Btw.  I would have actually expected this code to be folded to
> >
> >   *a_3(D) = D.1543_2;
> >   a_4 = a_3(D) + 1;
> >   b_5 = b_1(D) + 1;
> >   D.1543_6 = *b_5;
> >   *a_4 = D.1543_6;
> >   a_7 = a_3 + 2;
> >   b_8 = b_1 + 2;
> >   D.1543_9 = *b_8;
> >   *a_7 = D.1543_9;
> >   a_10 = a_3 + 3;
> >   b_11 = b_1 + 3;
> >   D.1543_12 = *b_11;
> >   *a_10 = D.1543_12;
> >   a_13 = a_3 + 4;
> >   b_14 = b_1 + 4;
> >   D.1543_15 = *b_14;
> >   *a_13 = D.1543_15;
> >
> > etc.; I am fairly sure we used to do this.
>
> I guess FRE did this in former times.  While current VN figures this out,
> it doesn't do the replacement:
>
> Value numbering a_10 stmt = a_10 = a_7 + 1;
> RHS a_7 + 1 simplified to a_3(D) + 3 has constants 0
> Setting value number of a_10 to a_10
>
> Danny, is this an oversight?

FRE currently will only try to replace with an existing SSA_NAME, not
with an expression.

It probably should, since it will help eliminate dead code if it does.


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 15:27           ` Zdenek Dvorak
  2007-11-03 16:23             ` Bingfeng Mei
@ 2007-11-03 16:37             ` Richard Guenther
  2007-11-04  3:13               ` Daniel Berlin
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2007-11-03 16:37 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Kenneth Zadeck, gcc, Daniel Berlin

On 11/3/07, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
> > > >> I believe that this is something new and is most likely fallout from
> > > >> Diego's reworking of the tree-to-RTL converter.
> > > >>
> > > >> Fixing this will require a round of copy propagation, most likely in
> > > >> concert with some induction variable detection, since the most
> > > >> profitable place for this will be in loops.
> > > >>
> > > >> I wonder if any of this affects the RTL-level induction variable
> > > >> discovery?
> > >>
> > >
> > > it should not (iv analysis is able to deal with this kind of ivs).
> > >
> > does the iv analysis canonicalize them in a way that we should perhaps
> > consider moving the auto-inc detection after the iv analysis?
>
> no, iv analysis does not change the program; also, since the code in
> this particular example is not in any loop, iv analysis is somewhat
> irrelevant for it.
>
> Btw.  I would have actually expected this code to be folded to
>
>   *a_3(D) = D.1543_2;
>   a_4 = a_3(D) + 1;
>   b_5 = b_1(D) + 1;
>   D.1543_6 = *b_5;
>   *a_4 = D.1543_6;
>   a_7 = a_3 + 2;
>   b_8 = b_1 + 2;
>   D.1543_9 = *b_8;
>   *a_7 = D.1543_9;
>   a_10 = a_3 + 3;
>   b_11 = b_1 + 3;
>   D.1543_12 = *b_11;
>   *a_10 = D.1543_12;
>   a_13 = a_3 + 4;
>   b_14 = b_1 + 4;
>   D.1543_15 = *b_14;
>   *a_13 = D.1543_15;
>
> etc.; I am fairly sure we used to do this.

I guess FRE did this in former times.  While current VN figures this out,
it doesn't do the replacement:

Value numbering a_10 stmt = a_10 = a_7 + 1;
RHS a_7 + 1 simplified to a_3(D) + 3 has constants 0
Setting value number of a_10 to a_10

Danny, is this an oversight?

Richard.


* RE: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 15:27           ` Zdenek Dvorak
@ 2007-11-03 16:23             ` Bingfeng Mei
  2007-11-03 16:37             ` Richard Guenther
  1 sibling, 0 replies; 16+ messages in thread
From: Bingfeng Mei @ 2007-11-03 16:23 UTC (permalink / raw)
  To: Zdenek Dvorak, Kenneth Zadeck; +Cc: gcc

Yes, that is a better way to generate code than using POST_INC, since
it eliminates an unnecessary dependency. Which version used to do this?
Where should it be done? I am thinking of one of the copy propagation
passes. Am I right?

Cheers,
Bingfeng

-----Original Message-----
From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
Zdenek Dvorak
Sent: 03 November 2007 15:27
To: Kenneth Zadeck
Cc: gcc@gcc.gnu.org
Subject: Re: Tree-SSA and POST_INC address mode incompatible in GCC4?

Hi,

> >> I believe that this is something new and is most likely fallout from
> >> Diego's reworking of the tree-to-RTL converter.
> >>
> >> Fixing this will require a round of copy propagation, most likely in
> >> concert with some induction variable detection, since the most
> >> profitable place for this will be in loops.
> >>
> >> I wonder if any of this affects the RTL-level induction variable
> >> discovery?
> >>     
> >
> > it should not (iv analysis is able to deal with this kind of ivs).
> >
> does the iv analysis canonicalize them in a way that we should perhaps
> consider moving the auto-inc detection after the iv analysis?

no, iv analysis does not change the program; also, since the code in
this particular example is not in any loop, iv analysis is somewhat
irrelevant for it.

Btw.  I would have actually expected this code to be folded to

  *a_3(D) = D.1543_2;
  a_4 = a_3(D) + 1;
  b_5 = b_1(D) + 1;
  D.1543_6 = *b_5;
  *a_4 = D.1543_6;
  a_7 = a_3 + 2;
  b_8 = b_1 + 2;
  D.1543_9 = *b_8;
  *a_7 = D.1543_9;
  a_10 = a_3 + 3;
  b_11 = b_1 + 3;
  D.1543_12 = *b_11;
  *a_10 = D.1543_12;
  a_13 = a_3 + 4;
  b_14 = b_1 + 4;
  D.1543_15 = *b_14;
  *a_13 = D.1543_15;

etc.; I am fairly sure we used to do this.

Zdenek



* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 14:52         ` Kenneth Zadeck
@ 2007-11-03 15:27           ` Zdenek Dvorak
  2007-11-03 16:23             ` Bingfeng Mei
  2007-11-03 16:37             ` Richard Guenther
  0 siblings, 2 replies; 16+ messages in thread
From: Zdenek Dvorak @ 2007-11-03 15:27 UTC (permalink / raw)
  To: Kenneth Zadeck; +Cc: gcc

Hi,

> >> I believe that this is something new and is most likely fallout from
> >> Diego's reworking of the tree-to-RTL converter.
> >>
> >> Fixing this will require a round of copy propagation, most likely in
> >> concert with some induction variable detection, since the most
> >> profitable place for this will be in loops.
> >>
> >> I wonder if any of this affects the RTL-level induction variable
> >> discovery?
> >>     
> >
> > it should not (iv analysis is able to deal with this kind of ivs).
> >
> does the iv analysis canonicalize them in a way that we should perhaps
> consider moving the auto-inc detection after the iv analysis?

no, iv analysis does not change the program; also, since the code in
this particular example is not in any loop, iv analysis is somewhat
irrelevant for it.

Btw.  I would have actually expected this code to be folded to

  *a_3(D) = D.1543_2;
  a_4 = a_3(D) + 1;
  b_5 = b_1(D) + 1;
  D.1543_6 = *b_5;
  *a_4 = D.1543_6;
  a_7 = a_3 + 2;
  b_8 = b_1 + 2;
  D.1543_9 = *b_8;
  *a_7 = D.1543_9;
  a_10 = a_3 + 3;
  b_11 = b_1 + 3;
  D.1543_12 = *b_11;
  *a_10 = D.1543_12;
  a_13 = a_3 + 4;
  b_14 = b_1 + 4;
  D.1543_15 = *b_14;
  *a_13 = D.1543_15;

etc.; I am fairly sure we used to do this.
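
The folding expected above amounts to rewriting each chained increment relative to the start of its chain. A minimal sketch on a toy representation (the `struct def` layout and `fold_to_base` are hypothetical, not GCC internals), assuming definitions appear in program order and each name is defined exactly once:

```c
#include <stddef.h>

/* One toy SSA definition of the form: name = src + inc. */
struct def {
    int  name;  /* SSA version being defined, e.g. 7 for a_7 */
    int  src;   /* SSA version being read, e.g. 4 for a_4 */
    long inc;   /* constant added to src */
};

/* Rewrite every definition relative to the root of its chain, so that
   a_7 = a_4 + 1 (with a_4 = a_3 + 1) becomes a_7 = a_3 + 2.
   Earlier entries are already folded when a later entry looks them up,
   so one backward search per definition is enough. */
void fold_to_base(struct def *defs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (defs[j].name == defs[i].src) {
                defs[i].src = defs[j].src;
                defs[i].inc += defs[j].inc;
                break;
            }
        }
    }
}
```

After folding, no address computation depends on the previous one, which is the property the expected dump shows.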

Zdenek


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 14:25       ` Zdenek Dvorak
@ 2007-11-03 14:52         ` Kenneth Zadeck
  2007-11-03 15:27           ` Zdenek Dvorak
  0 siblings, 1 reply; 16+ messages in thread
From: Kenneth Zadeck @ 2007-11-03 14:52 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Kenneth Zadeck, gcc, rakdver, dnovillo

Zdenek Dvorak wrote:
> Hi,
>
>   
>> I believe that this is something new and is most likely fallout from
>> Diego's reworking of the tree-to-RTL converter.
>>
>> Fixing this will require a round of copy propagation, most likely in
>> concert with some induction variable detection, since the most
>> profitable place for this will be in loops.
>>
>> I wonder if any of this affects the RTL-level induction variable
>> discovery?
>>     
>
> it should not (iv analysis is able to deal with this kind of ivs).
>
> Zdenek
>
>   
does the iv analysis canonicalize them in a way that we should perhaps
consider moving the auto-inc detection after the iv analysis?
>>> Hi, Ramana,
>>> I tried the trunk version  with/without your patch. It still produces
>>> the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say 
>>>
>>>          *a
>>>            ...
>>>            a <- a + c
>>>
>>>         becomes
>>>
>>>            *(a += c) post
>>>
>>> But the problem is that after the Tree-SSA passes there is no
>>>            a <- a + c
>>> but something like
>>>            a_1 <- a + c
>>>
>>> Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
>>> don't see how this transformation is applicable in most scenarios.
>>> Any comments?
>>>
>>> Cheers,
>>> Bingfeng
>>>
>>>       


* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-03 13:52     ` Kenneth Zadeck
@ 2007-11-03 14:25       ` Zdenek Dvorak
  2007-11-03 14:52         ` Kenneth Zadeck
  2007-11-04 23:51       ` Mark Mitchell
  1 sibling, 1 reply; 16+ messages in thread
From: Zdenek Dvorak @ 2007-11-03 14:25 UTC (permalink / raw)
  To: Kenneth Zadeck; +Cc: gcc, zadeck, rakdver, dnovillo

Hi,

> I believe that this is something new and is most likely fallout from
> Diego's reworking of the tree-to-RTL converter.
>
> Fixing this will require a round of copy propagation, most likely in
> concert with some induction variable detection, since the most
> profitable place for this will be in loops.
>
> I wonder if any of this affects the RTL-level induction variable
> discovery?

it should not (iv analysis is able to deal with this kind of ivs).

Zdenek

> > Hi, Ramana,
> > I tried the trunk version  with/without your patch. It still produces
> > the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say 
> > 
> >          *a
> >            ...
> >            a <- a + c
> > 
> >         becomes
> > 
> >            *(a += c) post
> > 
> > But the problem is that after the Tree-SSA passes there is no
> >            a <- a + c
> > but something like
> >            a_1 <- a + c
> >
> > Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
> > don't see how this transformation is applicable in most scenarios.
> > Any comments?
> > 
> > Cheers,
> > Bingfeng
> > 


* RE: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-02 14:34   ` Bingfeng Mei
@ 2007-11-03 13:52     ` Kenneth Zadeck
  2007-11-03 14:25       ` Zdenek Dvorak
  2007-11-04 23:51       ` Mark Mitchell
  0 siblings, 2 replies; 16+ messages in thread
From: Kenneth Zadeck @ 2007-11-03 13:52 UTC (permalink / raw)
  To: gcc; +Cc: zadeck, rakdver, dnovillo

I believe that this is something new and is most likely fallout from
Diego's reworking of the tree-to-RTL converter.

Fixing this will require a round of copy propagation, most likely in
concert with some induction variable detection, since the most
profitable place for this will be in loops.

I wonder if any of this affects the RTL-level induction variable
discovery?

> Hi, Ramana,
> I tried the trunk version  with/without your patch. It still produces
> the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say 
> 
>          *a
>            ...
>            a <- a + c
> 
>         becomes
> 
>            *(a += c) post
> 
> But the problem is that after the Tree-SSA passes there is no
>            a <- a + c
> but something like
>            a_1 <- a + c
>
> Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
> don't see how this transformation is applicable in most scenarios.
> Any comments?
> 
> Cheers,
> Bingfeng
> 


* RE: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-02 12:38 ` Ramana Radhakrishnan
@ 2007-11-02 14:34   ` Bingfeng Mei
  2007-11-03 13:52     ` Kenneth Zadeck
  0 siblings, 1 reply; 16+ messages in thread
From: Bingfeng Mei @ 2007-11-02 14:34 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc

Hi, Ramana,
I tried the trunk version  with/without your patch. It still produces
the same code as gcc4.2.2 does. In auto-inc-dec.c, the comments say 

         *a
           ...
           a <- a + c

        becomes

           *(a += c) post

But the problem is that after the Tree-SSA passes there is no
           a <- a + c
but something like
           a_1 <- a + c

Unless auto-inc-dec.c can reverse a_1 <- a + c to a <- a + c, I
don't see how this transformation is applicable in most scenarios.
Any comments?
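
To make the mismatch concrete, here is a toy model (not GCC's data structures; every name in it is hypothetical) of the shape auto-inc-dec looks for, plus a naive renaming step that turns a_1 <- a + c back into a <- a + c when the old name is dead afterwards. It assumes straight-line code with no control flow, which is the easy case:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: a memory access through `base`, or `dest = src + inc`. */
enum kind { MEM_ACCESS, ADD_CONST };

struct insn {
    enum kind kind;
    int base;  /* MEM_ACCESS: register holding the address */
    int dest;  /* ADD_CONST: register written */
    int src;   /* ADD_CONST: register read */
    int inc;   /* ADD_CONST: constant added */
};

/* True when insns[i] accesses memory through r and insns[i + 1] is
   r = r + c, i.e. the "*a ... a <- a + c" shape with nothing in between. */
bool is_post_inc_candidate(const struct insn *insns, size_t n, size_t i)
{
    if (i + 1 >= n)
        return false;
    const struct insn *mem = &insns[i];
    const struct insn *add = &insns[i + 1];
    return mem->kind == MEM_ACCESS && add->kind == ADD_CONST
        && add->src == mem->base && add->dest == add->src;
}

/* Is register r read by any instruction after index i? */
static bool used_after(const struct insn *insns, size_t n, size_t i, int r)
{
    for (size_t j = i + 1; j < n; j++) {
        if (insns[j].kind == MEM_ACCESS && insns[j].base == r)
            return true;
        if (insns[j].kind == ADD_CONST && insns[j].src == r)
            return true;
    }
    return false;
}

/* For each "d = s + c" whose source s is dead afterwards, rename d to s in
   the add and in all later instructions. Returns the number of renames. */
size_t coalesce_adds(struct insn *insns, size_t n)
{
    size_t renamed = 0;
    for (size_t i = 0; i < n; i++) {
        struct insn *add = &insns[i];
        if (add->kind != ADD_CONST || add->dest == add->src)
            continue;
        if (used_after(insns, n, i, add->src))
            continue;  /* old value still needed: cannot reuse the register */
        int from = add->dest, to = add->src;
        add->dest = to;
        for (size_t j = i + 1; j < n; j++) {
            if (insns[j].kind == MEM_ACCESS && insns[j].base == from)
                insns[j].base = to;
            if (insns[j].kind == ADD_CONST && insns[j].src == from)
                insns[j].src = to;
        }
        renamed++;
    }
    return renamed;
}
```

The `used_after` check is where this stops being applicable in general: if the old pointer value is still read later, the names cannot simply be merged.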

Cheers,
Bingfeng


-----Original Message-----
From: Ramana Radhakrishnan [mailto:ramana.r@gmail.com] 
Sent: 02 November 2007 12:39
To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Tree-SSA and POST_INC address mode incompatible in GCC4?

Hi Bingfeng,


On 11/2/07, Bingfeng Mei <bmei@broadcom.com> wrote:
> Hello,
>
> I looked at the following code to see the difference between GCC4
> and GCC3 in their use of the POST_INC address mode (or other similar modes).
>
> void tst(char * __restrict__ a, char * __restrict__ b){
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a = *b;
> }


We have seen this in a number of other ports as well - I had hacked up
a patch to sort this precise problem out but that was for trunk / 4.3
and is not applicable for 4.2.x since the autoincrement detector was
rewritten post 4.2.


http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01060.html

I haven't yet had time to rework this based on the comments but it
surely is on my radar of things to do.

cheers
Ramana


>
>
> Using ARM processor as a target, GCC4.2.2 generates the following
> assembly:
> tst:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         mov     r2, r1
>         ldrb    ip, [r2], #1    @ zero_extendqisi2
>         mov     r3, r0
>         strb    ip, [r3], #1
>         ldrb    r1, [r1, #1]    @ zero_extendqisi2
>         strb    r1, [r0, #1]
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         ldrb    r2, [r2, #2]    @ zero_extendqisi2
>         @ lr needed for prologue
>         strb    r2, [r3, #2]
>         bx      lr
>         .size   tst, .-tst
>         .ident  "GCC: (GNU) 4.2.2"
>
> And GCC3.4.6 generates much better code by using the POST_INC address
> mode extensively:
>
> tst:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1, #0]    @ zero_extendqisi2
>         @ lr needed for prologue
>         strb    r3, [r0, #0]
>         mov     pc, lr
>         .size   tst, .-tst
>         .ident  "GCC: (GNU) 3.4.6"
>
> I looked at the dumped tst.c.102t.final_cleanup:
> tst (a, b)
> {
>   char * restrict a.54;
>   char * restrict a.53;
>   char * restrict a.52;
>   char * restrict a.51;
>   char * restrict a.50;
>   char * restrict b.48;
>   char * restrict b.47;
>   char * restrict b.46;
>   char * restrict b.45;
>   char * restrict b.44;
>
> <bb 2>:
>   *a = *b;
>   a.50 = a + 1B;
>   b.44 = b + 1B;
>   *a.50 = *b.44;
>   a.51 = a.50 + 1B;
>   b.45 = b.44 + 1B;
>   *a.51 = *b.45;
>   a.52 = a.51 + 1B;
>   b.46 = b.45 + 1B;
>   *a.52 = *b.46;
>   a.53 = a.52 + 1B;
>   b.47 = b.46 + 1B;
>   *a.53 = *b.47;
>   a.54 = a.53 + 1B;
>   b.48 = b.47 + 1B;
>   *a.54 = *b.48;
>   *(a.54 + 1B) = *(b.48 + 1B);
>   return;
>
> }
> I believe it is a fundamental issue for the Tree-SSA IR. The POST_INC
> address mode requires a pattern in which the same variable is used for
> incrementing (both USE and DEF), while the SSA form produces a different
> variable for each DEF. Therefore, GCC4 cannot efficiently use POST_INC and
> other similar address modes. Is there any solution to overcome this problem?
> Any suggestion is greatly appreciated.
>
>
> Bingfeng Mei
> Broadcom UK
>
>


-- 
Ramana Radhakrishnan
GNU Tools
Celunite Inc.



* Re: Tree-SSA and POST_INC address mode incompatible in GCC4?
  2007-11-02 12:24 Bingfeng Mei
@ 2007-11-02 12:38 ` Ramana Radhakrishnan
  2007-11-02 14:34   ` Bingfeng Mei
  0 siblings, 1 reply; 16+ messages in thread
From: Ramana Radhakrishnan @ 2007-11-02 12:38 UTC (permalink / raw)
  To: Bingfeng Mei; +Cc: gcc

Hi Bingfeng,


On 11/2/07, Bingfeng Mei <bmei@broadcom.com> wrote:
> Hello,
>
> I looked at the following code to see the difference between GCC4
> and GCC3 in their use of the POST_INC address mode (or other similar modes).
>
> void tst(char * __restrict__ a, char * __restrict__ b){
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a++ = *b++;
>   *a = *b;
> }


We have seen this in a number of other ports as well - I had hacked up
a patch to sort this precise problem out but that was for trunk / 4.3
and is not applicable for 4.2.x since the autoincrement detector was
rewritten post 4.2.


http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01060.html

I haven't yet had time to rework this based on the comments but it
surely is on my radar of things to do.

cheers
Ramana


>
>
> Using ARM processor as a target, GCC4.2.2 generates the following
> assembly:
> tst:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         mov     r2, r1
>         ldrb    ip, [r2], #1    @ zero_extendqisi2
>         mov     r3, r0
>         strb    ip, [r3], #1
>         ldrb    r1, [r1, #1]    @ zero_extendqisi2
>         strb    r1, [r0, #1]
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         add     r2, r2, #1
>         ldrb    r1, [r2, #1]    @ zero_extendqisi2
>         add     r3, r3, #1
>         strb    r1, [r3, #1]
>         ldrb    r2, [r2, #2]    @ zero_extendqisi2
>         @ lr needed for prologue
>         strb    r2, [r3, #2]
>         bx      lr
>         .size   tst, .-tst
>         .ident  "GCC: (GNU) 4.2.2"
>
> And GCC3.4.6 generates much better code by using the POST_INC address
> mode extensively:
>
> tst:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1], #1    @ zero_extendqisi2
>         strb    r3, [r0], #1
>         ldrb    r3, [r1, #0]    @ zero_extendqisi2
>         @ lr needed for prologue
>         strb    r3, [r0, #0]
>         mov     pc, lr
>         .size   tst, .-tst
>         .ident  "GCC: (GNU) 3.4.6"
>
> I looked at the dumped tst.c.102t.final_cleanup:
> tst (a, b)
> {
>   char * restrict a.54;
>   char * restrict a.53;
>   char * restrict a.52;
>   char * restrict a.51;
>   char * restrict a.50;
>   char * restrict b.48;
>   char * restrict b.47;
>   char * restrict b.46;
>   char * restrict b.45;
>   char * restrict b.44;
>
> <bb 2>:
>   *a = *b;
>   a.50 = a + 1B;
>   b.44 = b + 1B;
>   *a.50 = *b.44;
>   a.51 = a.50 + 1B;
>   b.45 = b.44 + 1B;
>   *a.51 = *b.45;
>   a.52 = a.51 + 1B;
>   b.46 = b.45 + 1B;
>   *a.52 = *b.46;
>   a.53 = a.52 + 1B;
>   b.47 = b.46 + 1B;
>   *a.53 = *b.47;
>   a.54 = a.53 + 1B;
>   b.48 = b.47 + 1B;
>   *a.54 = *b.48;
>   *(a.54 + 1B) = *(b.48 + 1B);
>   return;
>
> }
> I believe it is a fundamental issue for the Tree-SSA IR. The POST_INC
> address mode requires a pattern in which the same variable is used for
> incrementing (both USE and DEF), while the SSA form produces a different
> variable for each DEF. Therefore, GCC4 cannot efficiently use POST_INC and
> other similar address modes. Is there any solution to overcome this problem?
> Any suggestion is greatly appreciated.
>
>
> Bingfeng Mei
> Broadcom UK
>
>


-- 
Ramana Radhakrishnan
GNU Tools
Celunite Inc.


* Tree-SSA and POST_INC address mode incompatible in GCC4?
@ 2007-11-02 12:24 Bingfeng Mei
  2007-11-02 12:38 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 16+ messages in thread
From: Bingfeng Mei @ 2007-11-02 12:24 UTC (permalink / raw)
  To: gcc

Hello,

I looked at the following code to see the difference between GCC4
and GCC3 in their use of the POST_INC address mode (or other similar modes).

void tst(char * __restrict__ a, char * __restrict__ b){
  *a++ = *b++;
  *a++ = *b++;
  *a++ = *b++;
  *a++ = *b++;
  *a++ = *b++;
  *a++ = *b++;
  *a = *b;
}


Using ARM processor as a target, GCC4.2.2 generates the following
assembly:
tst:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r2, r1
	ldrb	ip, [r2], #1	@ zero_extendqisi2
	mov	r3, r0
	strb	ip, [r3], #1
	ldrb	r1, [r1, #1]	@ zero_extendqisi2
	strb	r1, [r0, #1]
	ldrb	r1, [r2, #1]	@ zero_extendqisi2
	strb	r1, [r3, #1]
	add	r2, r2, #1
	ldrb	r1, [r2, #1]	@ zero_extendqisi2
	add	r3, r3, #1
	strb	r1, [r3, #1]
	add	r2, r2, #1
	ldrb	r1, [r2, #1]	@ zero_extendqisi2
	add	r3, r3, #1
	strb	r1, [r3, #1]
	add	r2, r2, #1
	ldrb	r1, [r2, #1]	@ zero_extendqisi2
	add	r3, r3, #1
	strb	r1, [r3, #1]
	ldrb	r2, [r2, #2]	@ zero_extendqisi2
	@ lr needed for prologue
	strb	r2, [r3, #2]
	bx	lr
	.size	tst, .-tst
	.ident	"GCC: (GNU) 4.2.2"

And GCC3.4.6 generates much better code by using the POST_INC address
mode extensively:

tst:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1], #1	@ zero_extendqisi2
	strb	r3, [r0], #1
	ldrb	r3, [r1, #0]	@ zero_extendqisi2
	@ lr needed for prologue
	strb	r3, [r0, #0]
	mov	pc, lr
	.size	tst, .-tst
	.ident	"GCC: (GNU) 3.4.6"

I looked at the dumped tst.c.102t.final_cleanup:
tst (a, b)
{
  char * restrict a.54;
  char * restrict a.53;
  char * restrict a.52;
  char * restrict a.51;
  char * restrict a.50;
  char * restrict b.48;
  char * restrict b.47;
  char * restrict b.46;
  char * restrict b.45;
  char * restrict b.44;

<bb 2>:
  *a = *b;
  a.50 = a + 1B;
  b.44 = b + 1B;
  *a.50 = *b.44;
  a.51 = a.50 + 1B;
  b.45 = b.44 + 1B;
  *a.51 = *b.45;
  a.52 = a.51 + 1B;
  b.46 = b.45 + 1B;
  *a.52 = *b.46;
  a.53 = a.52 + 1B;
  b.47 = b.46 + 1B;
  *a.53 = *b.47;
  a.54 = a.53 + 1B;
  b.48 = b.47 + 1B;
  *a.54 = *b.48;
  *(a.54 + 1B) = *(b.48 + 1B);
  return;

}
I believe it is a fundamental issue for the Tree-SSA IR. The POST_INC
address mode requires a pattern in which the same variable is used for
incrementing (both USE and DEF), while the SSA form produces a different
variable for each DEF. Therefore, GCC4 cannot efficiently use POST_INC and
other similar address modes. Is there any solution to overcome this problem?
Any suggestion is greatly appreciated.
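
For reference, the test case computes the same thing whether it is written with post-increments or with constant offsets from the incoming pointers. A small sketch (the names `tst_postinc` and `tst_offset` are illustrative only) that makes the two shapes explicit:

```c
/* Post-increment form, as in the test case: each access uses the
   pointer and then advances it, so one variable is both USE and DEF. */
void tst_postinc(char *__restrict__ a, char *__restrict__ b)
{
    *a++ = *b++;
    *a++ = *b++;
    *a++ = *b++;
    *a++ = *b++;
    *a++ = *b++;
    *a++ = *b++;
    *a = *b;
}

/* Offset form: every access is the original pointer plus a constant,
   so no access depends on an earlier pointer update. */
void tst_offset(char *__restrict__ a, char *__restrict__ b)
{
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
    a[3] = b[3];
    a[4] = b[4];
    a[5] = b[5];
    a[6] = b[6];
}
```

Both functions copy the same seven bytes; which shape yields better code depends on the target's addressing modes and on how the compiler chooses between them.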


Bingfeng Mei
Broadcom UK

