public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* GCSE/PRE problem
@ 2003-04-28  7:26 Mostafa Hagog
  2003-04-28  9:35 ` Jan Hubicka
  2003-04-28 13:44 ` Zdenek Dvorak
  0 siblings, 2 replies; 7+ messages in thread
From: Mostafa Hagog @ 2003-04-28  7:26 UTC (permalink / raw)
  To: gcc

There was a discussion on a problem in GCSE/PRE, I should have posted this
in gcc@gcc.gnu.org in the first place.
I am posting all the previous discussion now along with my response.

PREVIOUS DISCUSSION
> > I am trying to use this pass to implement an optimization of redundant
> > loads due to stores.
> > I gave it several tries and found out that we have a problem due to the
way
> > global variables are accessed - in such an access there will be one add
and
> > two loads. Thus in order to eliminate the load GCSE must eliminate the
add
> > in the first try , then the first load and finally the last load - a
total
> > of 3 GCSE passes.
> > Bottom line , we need at least three iterations of "one_pre_gcse_pass".
> > I made some tries to see if the GCSE is capable of doing so in general
and
> > wrote the below example.
> > It has a simple addition and multiplication expressions that are
dependent;
> > GCC couldn't make the second pass elimination as it should (according
to my
> > understanding) , it didn't remove the multiplication.

> the main reason (*) is that gcc's gcse is too stupid to handle
> multiplication. The reason for this is that multiplication instruction
> on x86 clobbers flags and we are unable to handle it (see
> want_to_gcse_p).

>
> Honza Hubicka has a patch for this on cfg branch (he planned to push it
> to mainline but probably had too much other work).

We've recently chattend about these patches with Richard.  I would like
to re/continue pushing them to manline but they need trorough testing as
the concept is pretty different.
It is top on my TODO, but recently I don=t even have time to look into
my TODO..

             Honza
>
             > Zdenek
>
> (*) There is one more minor issue I've found while examining your
> problem that I think also might also cause problems. Try the cfg branch
> first, if it won't help I will have a look at that.
>
> > I have also looked at the RTL dump after gcse to confirm this.
> > Is this is a known problem? Am I doing some things wrong or
misunderstood
> > what PRE should do?
> >
> > I have compiled with gcc -O3 -S -dG --param max-gcse-passes=3
> >
> > The code:
> > void g (int a , int b , int c);
> > void f (int x , int y , int z)
> > {
> >     int a = 0, b = 0 , c= 0,t1,t2,t3;
> >     if (z > 0 )
> >     {
> >        t1 = x + y;
> >        a = t1 * z;
> >     }
> >     else
> >     {
> >         t2 = x + y;
> >         b = t2 *z;
> >
> >     }
> >
> >     t3 = x + y;  /* Redundant , should be removed by GCSE first try*/
> >     c = t3 * z;  /* Redundant , should be removed by GCSE second try*/
> >     g(a,b,c);
> > }
> > Mustafa

MY RESPONSE:

I have tested the above example on a PowerPC machine, which doesn't have
this clobber problem, and I still have the same result.
Moreover, I have additional example with load and addition (without any
multiplication), which also suffer from the same problem.
I have continued looking at the GCSE code, and found out that the problem
is due to the fact that the "reaching pseudo register" that holds the
redundant add value has different number from the one that holds the
addition value in the other two additions.
Thus, the redundant multiplication is not recognized as the same expression
as the other two multiplications.
In order to be sure that this is indeed the problem, I took the early
coalesce patch (from cgf branch cfg-coalesce.c) and called  coalesce()
after each iteration of one_pre_gcse_pass. And it did removed the redundant
multiplication!
What happened is that the coalesce() rearranged the pseudo register usage
and put the addition result in one pseudo. Now the GCSE recognizes the
three multiplications as the same expression.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-04-28  7:26 GCSE/PRE problem Mostafa Hagog
@ 2003-04-28  9:35 ` Jan Hubicka
  2003-04-28 13:44 ` Zdenek Dvorak
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2003-04-28  9:35 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: gcc

> 
> MY RESPONSE:
> 
> I have tested the above example on a PowerPC machine, which doesn't have
> this clobber problem, and I still have the same result.
> Moreover, I have additional example with load and addition (without any
> multiplication), which also suffer from the same problem.
> I have continued looking at the GCSE code, and found out that the problem
> is due to the fact that the "reaching pseudo register" that holds the
> redundant add value has different number from the one that holds the
> addition value in the other two additions.
> Thus, the redundant multiplication is not recognized as the same expression
> as the other two multiplications.
> In order to be sure that this is indeed the problem, I took the early
> coalesce patch (from cgf branch cfg-coalesce.c) and called  coalesce()
> after each iteration of one_pre_gcse_pass. And it did removed the redundant
> multiplication!
> What happened is that the coalesce() rearranged the pseudo register usage
> and put the addition result in one pseudo. Now the GCSE recognizes the
> three multiplications as the same expression.

This is interesting.  I am still considering on what to do with the
register coalescing pass.  Is there any particular reason why the copy
propagation didn't unified the operands too? Or is that just lazyness in
the implementation?

Honza

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-04-28  7:26 GCSE/PRE problem Mostafa Hagog
  2003-04-28  9:35 ` Jan Hubicka
@ 2003-04-28 13:44 ` Zdenek Dvorak
  2003-04-28 23:46   ` Mostafa Hagog
  1 sibling, 1 reply; 7+ messages in thread
From: Zdenek Dvorak @ 2003-04-28 13:44 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: gcc

Hello,

> There was a discussion on a problem in GCSE/PRE, I should have posted this
> in gcc@gcc.gnu.org in the first place.
> I am posting all the previous discussion now along with my response.
> 
> PREVIOUS DISCUSSION
> > > I am trying to use this pass to implement an optimization of redundant
> > > loads due to stores.
> > > I gave it several tries and found out that we have a problem due to the
> way
> > > global variables are accessed - in such an access there will be one add
> and
> > > two loads. Thus in order to eliminate the load GCSE must eliminate the
> add
> > > in the first try , then the first load and finally the last load - a
> total
> > > of 3 GCSE passes.
> > > Bottom line , we need at least three iterations of "one_pre_gcse_pass".
> > > I made some tries to see if the GCSE is capable of doing so in general
> and
> > > wrote the below example.
> > > It has a simple addition and multiplication expressions that are
> dependent;
> > > GCC couldn't make the second pass elimination as it should (according
> to my
> > > understanding) , it didn't remove the multiplication.
> 
> > the main reason (*) is that gcc's gcse is too stupid to handle
> > multiplication. The reason for this is that multiplication instruction
> > on x86 clobbers flags and we are unable to handle it (see
> > want_to_gcse_p).
> 
> >
> > Honza Hubicka has a patch for this on cfg branch (he planned to push it
> > to mainline but probably had too much other work).
> 
> We've recently chattend about these patches with Richard.  I would like
> to re/continue pushing them to manline but they need trorough testing as
> the concept is pretty different.
> It is top on my TODO, but recently I don=t even have time to look into
> my TODO..
> 
>              Honza
> >
>              > Zdenek
> >
> > (*) There is one more minor issue I've found while examining your
> > problem that I think also might also cause problems. Try the cfg branch
> > first, if it won't help I will have a look at that.
> >
> > > I have also looked at the RTL dump after gcse to confirm this.
> > > Is this is a known problem? Am I doing some things wrong or
> misunderstood
> > > what PRE should do?
> > >
> > > I have compiled with gcc -O3 -S -dG --param max-gcse-passes=3
> > >
> > > The code:
> > > void g (int a , int b , int c);
> > > void f (int x , int y , int z)
> > > {
> > >     int a = 0, b = 0 , c= 0,t1,t2,t3;
> > >     if (z > 0 )
> > >     {
> > >        t1 = x + y;
> > >        a = t1 * z;
> > >     }
> > >     else
> > >     {
> > >         t2 = x + y;
> > >         b = t2 *z;
> > >
> > >     }
> > >
> > >     t3 = x + y;  /* Redundant , should be removed by GCSE first try*/
> > >     c = t3 * z;  /* Redundant , should be removed by GCSE second try*/
> > >     g(a,b,c);
> > > }
> > > Mustafa
> 
> MY RESPONSE:
> 
> I have tested the above example on a PowerPC machine, which doesn't have
> this clobber problem, and I still have the same result.
> Moreover, I have additional example with load and addition (without any
> multiplication), which also suffer from the same problem.
> I have continued looking at the GCSE code, and found out that the problem
> is due to the fact that the "reaching pseudo register" that holds the
> redundant add value has different number from the one that holds the
> addition value in the other two additions.
> Thus, the redundant multiplication is not recognized as the same expression
> as the other two multiplications.
> In order to be sure that this is indeed the problem, I took the early
> coalesce patch (from cgf branch cfg-coalesce.c) and called  coalesce()
> after each iteration of one_pre_gcse_pass. And it did removed the redundant
> multiplication!
> What happened is that the coalesce() rearranged the pseudo register usage
> and put the addition result in one pseudo. Now the GCSE recognizes the
> three multiplications as the same expression.

yes, this is the additional problem I have spoken about. In fact the
solution is easier -- in pre_insert_copy_insn we replace

x = something

with

x = something
temp = x.

Instead we should do

temp = something
x = temp

and copy propagation would take care of the rest.

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-04-28 13:44 ` Zdenek Dvorak
@ 2003-04-28 23:46   ` Mostafa Hagog
  2003-05-05 18:46     ` Zdenek Dvorak
  0 siblings, 1 reply; 7+ messages in thread
From: Mostafa Hagog @ 2003-04-28 23:46 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc

Hi ,
  You are correct , I have changed the implementation of  "pre_ins
ert_copy_insn" function and it did worked.

Regards,
  Mostafa Hagog
 Systems and Software  Department
 IBM Research Lab in Haifa
 e-mail: mustafa@il.ibm.com
 Phone: +972 4 829-6518   Fax: +972 4 829-6112



                                                                                                                                             
                      Zdenek Dvorak                                                                                                          
                      <rakdver@atrey.karlin.m        To:       Mostafa Hagog/Haifa/IBM@IBMIL                                                 
                      ff.cuni.cz>                    cc:       gcc@gcc.gnu.org                                                               
                                                     Subject:  Re: GCSE/PRE problem                                                          
                      28/04/2003 12:29                                                                                                       
                                                                                                                                             




Hello,

> There was a discussion on a problem in GCSE/PRE, I should have posted
this
> in gcc@gcc.gnu.org in the first place.
> I am posting all the previous discussion now along with my response.
>
> PREVIOUS DISCUSSION
> > > I am trying to use this pass to implement an optimization of
redundant
> > > loads due to stores.
> > > I gave it several tries and found out that we have a problem due to
the
> way
> > > global variables are accessed - in such an access there will be one
add
> and
> > > two loads. Thus in order to eliminate the load GCSE must eliminate
the
> add
> > > in the first try , then the first load and finally the last load - a
> total
> > > of 3 GCSE passes.
> > > Bottom line , we need at least three iterations of
"one_pre_gcse_pass".
> > > I made some tries to see if the GCSE is capable of doing so in
general
> and
> > > wrote the below example.
> > > It has a simple addition and multiplication expressions that are
> dependent;
> > > GCC couldn't make the second pass elimination as it should (according
> to my
> > > understanding) , it didn't remove the multiplication.
>
> > the main reason (*) is that gcc's gcse is too stupid to handle
> > multiplication. The reason for this is that multiplication instruction
> > on x86 clobbers flags and we are unable to handle it (see
> > want_to_gcse_p).
>
> >
> > Honza Hubicka has a patch for this on cfg branch (he planned to push it
> > to mainline but probably had too much other work).
>
> We've recently chattend about these patches with Richard.  I would like
> to re/continue pushing them to manline but they need trorough testing as
> the concept is pretty different.
> It is top on my TODO, but recently I don=t even have time to look into
> my TODO..
>
>              Honza
> >
>              > Zdenek
> >
> > (*) There is one more minor issue I've found while examining your
> > problem that I think also might also cause problems. Try the cfg branch
> > first, if it won't help I will have a look at that.
> >
> > > I have also looked at the RTL dump after gcse to confirm this.
> > > Is this is a known problem? Am I doing some things wrong or
> misunderstood
> > > what PRE should do?
> > >
> > > I have compiled with gcc -O3 -S -dG --param max-gcse-passes=3
> > >
> > > The code:
> > > void g (int a , int b , int c);
> > > void f (int x , int y , int z)
> > > {
> > >     int a = 0, b = 0 , c= 0,t1,t2,t3;
> > >     if (z > 0 )
> > >     {
> > >        t1 = x + y;
> > >        a = t1 * z;
> > >     }
> > >     else
> > >     {
> > >         t2 = x + y;
> > >         b = t2 *z;
> > >
> > >     }
> > >
> > >     t3 = x + y;  /* Redundant , should be removed by GCSE first try*/
> > >     c = t3 * z;  /* Redundant , should be removed by GCSE second
try*/
> > >     g(a,b,c);
> > > }
> > > Mustafa
>
> MY RESPONSE:
>
> I have tested the above example on a PowerPC machine, which doesn't have
> this clobber problem, and I still have the same result.
> Moreover, I have additional example with load and addition (without any
> multiplication), which also suffer from the same problem.
> I have continued looking at the GCSE code, and found out that the problem
> is due to the fact that the "reaching pseudo register" that holds the
> redundant add value has different number from the one that holds the
> addition value in the other two additions.
> Thus, the redundant multiplication is not recognized as the same
expression
> as the other two multiplications.
> In order to be sure that this is indeed the problem, I took the early
> coalesce patch (from cgf branch cfg-coalesce.c) and called  coalesce()
> after each iteration of one_pre_gcse_pass. And it did removed the
redundant
> multiplication!
> What happened is that the coalesce() rearranged the pseudo register usage
> and put the addition result in one pseudo. Now the GCSE recognizes the
> three multiplications as the same expression.

yes, this is the additional problem I have spoken about. In fact the
solution is easier -- in pre_insert_copy_insn we replace

x = something

with

x = something
temp = x.

Instead we should do

temp = something
x = temp

and copy propagation would take care of the rest.

Zdenek



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-04-28 23:46   ` Mostafa Hagog
@ 2003-05-05 18:46     ` Zdenek Dvorak
  2003-05-08 16:46       ` Mostafa Hagog
  2003-05-27  5:55       ` Mostafa Hagog
  0 siblings, 2 replies; 7+ messages in thread
From: Zdenek Dvorak @ 2003-05-05 18:46 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: gcc

Hello,

>   You are correct , I have changed the implementation of  "pre_ins
> ert_copy_insn" function and it did worked.

btw if you did not already, could you please try to push this patch to
mainline? I believe we may be missing important share of optimization
opportunities due to this.

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-05-05 18:46     ` Zdenek Dvorak
@ 2003-05-08 16:46       ` Mostafa Hagog
  2003-05-27  5:55       ` Mostafa Hagog
  1 sibling, 0 replies; 7+ messages in thread
From: Mostafa Hagog @ 2003-05-08 16:46 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc


Yes,  I have implemented it. I am trying to make it pass the testsuite.

Regards,
  Mostafa Hagog
 Systems and Software  Department
 IBM Research Lab in Haifa
 e-mail: mustafa@il.ibm.com
 Phone: +972 4 829-6518   Fax: +972 4 829-6116



                                                                                                                                             
                      Zdenek Dvorak                                                                                                          
                      <rakdver@atrey.karlin.m        To:       Mostafa Hagog/Haifa/IBM@IBMIL                                                 
                      ff.cuni.cz>                    cc:       gcc@gcc.gnu.org                                                               
                                                     Subject:  Re: GCSE/PRE problem                                                          
                      05/05/2003 20:46                                                                                                       
                                                                                                                                             




Hello,

>   You are correct , I have changed the implementation of  "pre_ins
> ert_copy_insn" function and it did worked.

btw if you did not already, could you please try to push this patch to
mainline? I believe we may be missing important share of optimization
opportunities due to this.

Zdenek



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: GCSE/PRE problem
  2003-05-05 18:46     ` Zdenek Dvorak
  2003-05-08 16:46       ` Mostafa Hagog
@ 2003-05-27  5:55       ` Mostafa Hagog
  1 sibling, 0 replies; 7+ messages in thread
From: Mostafa Hagog @ 2003-05-27  5:55 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc

Hi,
      I have noticed that in some spec benchmarks store motion has degraded
the performance.
by enabling/disabling store motion in gcse , I noticed that twolf , gap ,
and gzip performance was degraded.
Have you seen the same results? is there a known problem which causes this?

Regards,
  Mostafa Hagog
 Systems and Software  Department
 IBM Research Lab in Haifa
 e-mail: mustafa@il.ibm.com
 Phone: +972 4 829-6518   Fax: +972 4 829-6116



                                                                                                                                             
                      Zdenek Dvorak                                                                                                          
                      <rakdver@atrey.karlin.m        To:       Mostafa Hagog/Haifa/IBM@IBMIL                                                 
                      ff.cuni.cz>                    cc:       gcc@gcc.gnu.org                                                               
                                                     Subject:  Re: GCSE/PRE problem                                                          
                      05/05/2003 21:46                                                                                                       
                                                                                                                                             




Hello,

>   You are correct , I have changed the implementation of  "pre_ins
> ert_copy_insn" function and it did worked.

btw if you did not already, could you please try to push this patch to
mainline? I believe we may be missing important share of optimization
opportunities due to this.

Zdenek



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-05-27  5:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-28  7:26 GCSE/PRE problem Mostafa Hagog
2003-04-28  9:35 ` Jan Hubicka
2003-04-28 13:44 ` Zdenek Dvorak
2003-04-28 23:46   ` Mostafa Hagog
2003-05-05 18:46     ` Zdenek Dvorak
2003-05-08 16:46       ` Mostafa Hagog
2003-05-27  5:55       ` Mostafa Hagog

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).