Re: Transformations to increase parallelism (peepholes?)

public inbox for gcc@gcc.gnu.org

* Re: Transformations to increase parallelism (peepholes?)
@ 2003-07-23 16:03 Jan Hoogerbrugge
  2003-07-24 21:12 ` tm_gccmail
  2003-08-01 20:46 ` tm_gccmail
  0 siblings, 2 replies; 3+ messages in thread
From: Jan Hoogerbrugge @ 2003-07-23 16:03 UTC (permalink / raw)
  To: tm_gccmail; +Cc: gcc

> > Hi,
> >
> > Are there optimizations in gcc that increase instruction-level parallelism?
> > For example:
> >
> > void foo(int *p, int a)
> > {
> >         a += 2;
> >         p[0] = a;
> >         a += 2;
> >         p[1] = a;
> > }
> >
> > compiles to
> >
> >         add a, 2 -> tmp1
> >         store p[0], tmp1
> >         add tmp1, 2 -> tmp2
> >         store p[1], tmp2
> >
> > However, a more parallel translation would be:
> >
> >         add a, 2 -> tmp1
> >         store p[0], tmp1
> >         add a, 4 -> tmp2
> >         store p[1], tmp2
> >
> > In this case the two adds and the two stores can be executed in parallel.
> >
> > Jan
>
>No. I've mentioned similar problems before on this list, though.
>
>GCC really needs a pass to preprocess the instructions before sched1 to
>give the scheduler more scheduling freedom, especially on many-issue
>processors.
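
To make the quoted example concrete, here is a minimal C sketch of the two forms written out by hand. The names foo_serial, foo_parallel and same_result are hypothetical and only for illustration; this shows the source-level equivalence, not what any particular GCC version emits.

```c
#include <assert.h>

/* Original form: the second add depends on the result of the first. */
static void foo_serial(int *p, int a)
{
    int tmp1 = a + 2;
    p[0] = tmp1;
    int tmp2 = tmp1 + 2;   /* must wait for tmp1 */
    p[1] = tmp2;
}

/* Rewritten form: both adds read only the incoming value of a. */
static void foo_parallel(int *p, int a)
{
    int tmp1 = a + 2;
    int tmp2 = a + 4;      /* independent of tmp1 */
    p[0] = tmp1;
    p[1] = tmp2;
}

/* Returns 1 when both versions store the same two values for input a. */
static int same_result(int a)
{
    int x[2], y[2];
    foo_serial(x, a);
    foo_parallel(y, a);
    return x[0] == y[0] && x[1] == y[1];
}
```

In foo_parallel neither add waits on the other, so a machine that can issue two adds per cycle could in principle start both at once.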


Would it be possible to do (some of) these transformations by means of 
peepholes in the .md file? If so, could somebody tell me what a peephole 
should look like for

   reg1 = reg2 + const1
   reg3 = reg1 + const2

to

   reg1 = reg2 + const1
   reg3 = reg2 + (const1 + const2)

where const1 + const2 has to be within certain bounds. I tried to write a 
peephole for this but without success. Can anybody help?
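
The legality conditions for this rewrite can be sketched in plain C over a toy instruction record. This is not GCC code and not .md syntax: struct add_insn, combine_adds and the signed 16-bit immediate range are all assumptions made up for the illustration. A real pattern would have to express the same checks in its condition string.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-address instruction: dst = src + imm. */
struct add_insn {
    int dst;   /* destination register number */
    int src;   /* source register number */
    long imm;  /* constant addend, assumed to be in range already */
};

/* Assumed immediate range of the target (a signed 16-bit field here). */
#define IMM_MIN (-32768L)
#define IMM_MAX 32767L

/*
 * Try to rewrite
 *     first:  reg1 = reg2 + const1
 *     second: reg3 = reg1 + const2
 * so that second becomes
 *     second: reg3 = reg2 + (const1 + const2)
 * Returns true and updates *second when the rewrite is legal.
 */
static bool combine_adds(const struct add_insn *first, struct add_insn *second)
{
    /* second must consume the result of first. */
    if (second->src != first->dst)
        return false;

    /* If first overwrites its own source (reg1 == reg2), the original
       value of reg2 is gone by the time second executes. */
    if (first->dst == first->src)
        return false;

    /* The combined constant must still fit the immediate field. */
    long sum = first->imm + second->imm;
    if (sum < IMM_MIN || sum > IMM_MAX)
        return false;

    second->src = first->src;
    second->imm = sum;
    return true;
}
```

The reg1 == reg2 check is the easy one to forget: "r1 = r1 + 2; r3 = r1 + 2" cannot be rewritten this way because the old value of r1 no longer exists.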

Cheers,
Jan



* Re: Transformations to increase parallelism (peepholes?)
  2003-07-23 16:03 Transformations to increase parallelism (peepholes?) Jan Hoogerbrugge
@ 2003-07-24 21:12 ` tm_gccmail
  2003-08-01 20:46 ` tm_gccmail
  1 sibling, 0 replies; 3+ messages in thread
From: tm_gccmail @ 2003-07-24 21:12 UTC (permalink / raw)
  To: Jan Hoogerbrugge; +Cc: gcc

On Wed, 23 Jul 2003, Jan Hoogerbrugge wrote:
> Would it be possible to do (some of) these transformations by means of 
> peepholes in the .md file? If so, could somebody tell me what a peephole 
> should look like for
> 
>    reg1 = reg2 + const1
>    reg3 = reg1 + const2
> 
> to
> 
>    reg1 = reg2 + const1
>    reg3 = reg2 + (const1 + const2)
> 
> where const1 + const2 has to be within certain bounds. I tried to write a 
> peephole for this but without success. Can anybody help?
> 
> Cheers,
> Jan

You can't treat GCC like a black box and expect this to work.

You need to run GCC under GDB, debug the peephole optimizer, and figure out
why your pattern isn't matching. From GDB, you can use the "call" command
with debug_rtx() to print the insns the optimizer is looking at.

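You probably don't want to do this with the old peephole optimizer
(define_peephole), since it runs after both instruction scheduling passes.
You probably want to use peephole2 (define_peephole2) instead, which runs
after register allocation but before the second scheduling pass.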

 Toshi


* Re: Transformations to increase parallelism (peepholes?)
  2003-07-23 16:03 Transformations to increase parallelism (peepholes?) Jan Hoogerbrugge
  2003-07-24 21:12 ` tm_gccmail
@ 2003-08-01 20:46 ` tm_gccmail
  1 sibling, 0 replies; 3+ messages in thread
From: tm_gccmail @ 2003-08-01 20:46 UTC (permalink / raw)
  To: Jan Hoogerbrugge; +Cc: gcc

On Wed, 23 Jul 2003, Jan Hoogerbrugge wrote:

> >GCC really needs a pass to preprocess the instructions before sched1 to
> >give the scheduler more scheduling freedom, especially on many-issue
> >processors.
> 
> 
> Would it be possible to do (some of) these transformations by means of 
> peepholes in the .md file? If so, could somebody tell me what a peephole 
> should look like for
> 
>    reg1 = reg2 + const1
>    reg3 = reg1 + const2
> 
> to
> 
>    reg1 = reg2 + const1
>    reg3 = reg2 + (const1 + const2)
> 
> where const1 + const2 has to be within certain bounds. I tried to write a 
> peephole for this but without success. Can anybody help?
> 
> Cheers,
> Jan

I was thinking about both this and my idea for a prescheduler to remove
autodec/autoinc, and realized it's not a similar transformation; it's the
same transformation. They are both transformations which reduce the
height of a dependency tree by increasing the width to improve ILP
opportunities.
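
The height-versus-width trade can be measured on a toy model. The sketch below is an illustration only: struct dep_insn, dependency_height and MAX_REGS are hypothetical names, it tracks true (read-after-write) dependencies only, and it assumes every register number is below MAX_REGS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction: reads register src, writes register dst. */
struct dep_insn {
    int dst;
    int src;
};

#define MAX_REGS 16

/* Length of the longest chain of dependent instructions (the dependency
   height). Registers not written by an earlier instruction have depth 0. */
static int dependency_height(const struct dep_insn *insns, size_t n)
{
    int depth[MAX_REGS] = {0};
    int height = 0;

    for (size_t i = 0; i < n; i++) {
        /* An instruction sits one level below the producer of its source. */
        int d = depth[insns[i].src] + 1;
        depth[insns[i].dst] = d;
        if (d > height)
            height = d;
    }
    return height;
}
```

For a chain r1 = r0 + c, r2 = r1 + c, r3 = r2 + c, r4 = r3 + c the height is 4. After rewriting every add to read r0 with an accumulated constant, all four instructions sit at depth 1, so the height drops to 1 while the number of instructions that can issue together rises to 4.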

Are there good papers on this subject?

Toshi


