public inbox for gcc@gcc.gnu.org
* Re: my EGCS status (really Fortran patches)
       [not found] <9710211752.AA19479@moene.indiv.nluug.nl>
@ 1997-10-22 21:33 ` Jeffrey A Law
  1997-10-22 22:55   ` Toon Moene
  1997-10-22 22:33 ` my EGCS status (really Fortran patches) Jeffrey A Law
  1997-10-26  9:18 ` Jeffrey A Law
  2 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-22 21:33 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jim Wilson, egcs

Note I've moved this discussion to the egcs list.

  In message <9710211752.AA19479@moene.indiv.nluug.nl>you write:
  > >  The simplify_giv_expr change makes sense.  I haven't
  > >  tried building a PA compiler to see why it sometimes
  > >  gives worse code.
  > 
  > I must confess that I only tested code with perfectly nested loops  
  > (i.e. no code in the outer loop, like above).  I don't know if that  
  > would make a difference.
It might.  I really don't know.

Here's a relatively small example of the code explosion problem I
mentioned for the PA.

With the USE patch we eliminate 4 integer insns in the inner loop, but
we end up with 15 more integer insns in the outer loop.

Worse yet, the inner loop is so FP dominated that removing the
4 integer insns probably doesn't actually make the inner loop go any
faster since they're probably just filling bubbles in the pipeline
anyway.


      PARAMETER ( N = 257 )
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION AA(N,N), RX(N,N),RY(N,N),D(N,N)
      DO    501   J = 2,M
      K = M-J+1
      DO    501   I = I1P,I2M
      RX(I,K) = (RX(I,K)-AA(I,K)*RX(I,K+1))*D(I,K)
      RY(I,K) = (RY(I,K)-AA(I,K)*RY(I,K+1))*D(I,K)
  501 CONTINUE
      END


Without the use patch the loops look like:

L$0005
        sub %r4,%r31,%r21
        copy %r5,%r24
        sub %r6,%r24,%r23
        comib,> 0,%r23,L$0004
        ldo 1(%r21),%r22
        zdep %r21,23,24,%r19
        addl %r19,%r21,%r19
        zdep %r22,23,24,%r20
        zdep %r19,28,29,%r28
        addl %r20,%r22,%r20
        zdep %r24,28,29,%r19
        addil LR'ry___2-$global$,%r27
        zdep %r20,28,29,%r26
        ldo -8(%r19),%r22
        ldo RR'ry___2-$global$(%r1),%r24
L$0009
        addl %r22,%r28,%r20
        addl %r22,%r26,%r19
        flddx %r19(0,%r24),%fr25
        flddx %r20(0,%r3),%fr24
        flddx %r19(0,%r29),%fr22
        fmpy,dbl %fr24,%fr25,%fr25
        addl %r20,%r29,%r19
        fmpy,dbl %fr24,%fr22,%fr24
        addl %r20,%r24,%r21
        fldds 0(0,%r19),%fr22
        fldds 0(0,%r21),%fr23
        fsub,dbl %fr22,%fr24,%fr22
        ldo 8(%r22),%r22
        flddx %r20(0,%r2),%fr24
        fmpysub,dbl %fr22,%fr24,%fr22,%fr25,%fr23
        fmpy,dbl %fr23,%fr24,%fr23
        fstds %fr22,0(0,%r19)
        addib,>= -1,%r23,L$0009
        fstds %fr23,0(0,%r21)
L$0004
        addib,>= -1,%r25,L$0005
        ldo 1(%r31),%r31


With the use patch the loops look like:


L$0005
        copy %r6,%r23
        sub %r7,%r23,%r28
        comib,> 0,%r28,L$0004
        zdep %r31,23,24,%r21
        addl %r21,%r31,%r21
        addil LR'd___3-$global$,%r27
        zdep %r2,23,24,%r22
        copy %r1,%r3
        zdep %r21,28,29,%r21
        ldo -8(%r4),%r19
        addl %r22,%r2,%r22
        addil LR'ry___2-$global$,%r27
        addl %r21,%r19,%r19
        zdep %r23,28,29,%r23
        copy %r1,%r8
        zdep %r22,28,29,%r22
        ldo -8(%r5),%r20
        addl %r23,%r19,%r26
        addl %r22,%r20,%r24
        ldo RR'ry___2-$global$-8(%r8),%r19
        addl %r21,%r20,%r20
        addl %r22,%r19,%r22
        addl %r23,%r20,%r25
        addl %r21,%r19,%r19
        addl %r23,%r19,%r20
        ldo RR'd___3-$global$-8(%r3),%r19
        addl %r23,%r24,%r24
        addl %r23,%r22,%r22
        addl %r21,%r19,%r21
        addl %r23,%r21,%r23
L$0009
        fldds,ma 8(0,%r26),%fr24
        fldds,ma 8(0,%r24),%fr23
        fldds 0(0,%r25),%fr22
        fmpy,dbl %fr24,%fr23,%fr23
        fldds,ma 8(0,%r23),%fr25
        fsub,dbl %fr22,%fr23,%fr22
        fmpy,dbl %fr22,%fr25,%fr22
        fstds,ma %fr22,8(0,%r25)
        fldds,ma 8(0,%r22),%fr23
        fldds 0(0,%r20),%fr22
        fmpy,dbl %fr24,%fr23,%fr24
        fsub,dbl %fr22,%fr24,%fr22
        fmpy,dbl %fr22,%fr25,%fr22
        addib,>= -1,%r28,L$0009
        fstds,ma %fr22,8(0,%r20)
L$0004
        ldo -1(%r31),%r31
        addib,>= -1,%r29,L$0005
        ldo -1(%r2),%r2



Now, I don't actually know if one version executes any faster than
the other -- this is just something I noticed when looking for why
tomcatv ran 10% slower with the USE patch.

jeff


* Re: my EGCS status (really Fortran patches)
       [not found] <9710211752.AA19479@moene.indiv.nluug.nl>
  1997-10-22 21:33 ` my EGCS status (really Fortran patches) Jeffrey A Law
@ 1997-10-22 22:33 ` Jeffrey A Law
  1997-10-26  9:18 ` Jeffrey A Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-22 22:33 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jim Wilson, egcs

  In message <9710211752.AA19479@moene.indiv.nluug.nl>you write:
  > >  The simplify_giv_expr change makes sense.  I haven't
  > >  tried building a PA compiler to see why it sometimes
  > >  gives worse code.
  > 
  > I must confess that I only tested code with perfectly nested loops  
  > (i.e. no code in the outer loop, like above).  I don't know if that  
  > would make a difference.
For reference, I believe part of the problem is that loop doesn't
recognize that some of the more complex expressions that the
new USE patch recognizes as GIVs are related.

Given the example fortran code in my last message, we find the
following in the .loop file:

Reg 101: biv verified
Reg 96: biv verified
Biv 101 initialized at insn 272: initial value (reg:SI 102)
Biv 96 initialized at insn 24: initial value (reg/v:SI 97)
Insn 38: giv reg 105 src reg 96 benefit 2 used 1 lifetime 1 replaceable mult 1 add -1
Insn 42: giv reg 107 src reg 96 benefit 4 used 1 lifetime 13 replaceable mult 8 add -8
Insn 51: giv reg 113 src reg 96 benefit 6 used 1 lifetime 23 replaceable mult 8 add (plus:SI (reg:SI 112)
    (const_int -8))
Insn 52: giv reg 114 src reg 96 benefit 8 used 1 lifetime 19 replaceable mult 8 add (plus:SI (reg:SI 103)
    (plus:SI (reg:SI 112)
        (const_int -8)))
Insn 86: giv reg 138 src reg 96 benefit 8 used 1 lifetime 6 replaceable mult 8 add (plus:SI (reg:SI 127)
    (plus:SI (reg:SI 112)
        (const_int -8)))
Insn 101: giv reg 148 src reg 96 benefit 6 used 1 lifetime 16 replaceable mult 8 add (plus:SI (reg:SI 147)
    (const_int -8))
Insn 102: giv reg 149 src reg 96 benefit 8 used 1 lifetime 2 replaceable mult 8 add (plus:SI (reg:SI 103)
    (plus:SI (reg:SI 147)
        (const_int -8)))
Insn 104: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 127))
Insn 106: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 147)
        (const_int -8))
    (reg:SI 103))
Insn 109: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 103))
Insn 127: giv reg 166 src reg 96 benefit 8 used 1 lifetime 1 replaceable mult 8 add (plus:SI (reg:SI 155)
    (plus:SI (reg:SI 112)
        (const_int -8)))
Insn 129: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 155))
Insn 132: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 103))
Insn 150: giv reg 180 src reg 96 benefit 8 used 1 lifetime 7 replaceable mult 8 add (plus:SI (reg:SI 169)
    (plus:SI (reg:SI 112)
        (const_int -8)))
Insn 200: giv reg 215 src reg 96 benefit 8 used 1 lifetime 1 replaceable mult 8 add (plus:SI (reg:SI 169)
    (plus:SI (reg:SI 147)
        (const_int -8)))
Insn 204: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 147)
        (const_int -8))
    (reg:SI 169))
Insn 207: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 169))
Insn 230: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 169))
Loop unrolling: Initial value not constant.
Cannot eliminate biv 101.
First use: insn 26, last use: insn 31.
biv 96 can be eliminated.
giv at 207 combined with giv at 230
giv at 109 combined with giv at 132
giv of insn 38 not worth while, 0 vs 42.
giv at 230 reduced to (reg:SI 246)
giv at 207 reduced to (reg:SI 246)
giv at 204 reduced to (reg:SI 251)
giv at 200 reduced to (reg:SI 256)
giv at 150 reduced to (reg:SI 261)
giv at 132 reduced to (reg:SI 266)
giv at 129 reduced to (reg:SI 271)
giv at 127 reduced to (reg:SI 276)
giv at 109 reduced to (reg:SI 266)
giv at 106 reduced to (reg:SI 281)
giv at 104 reduced to (reg:SI 286)
giv at 102 reduced to (reg:SI 291)
giv at 101 reduced to (reg:SI 296)
giv at 86 reduced to (reg:SI 300)
giv at 52 reduced to (reg:SI 305)
giv at 51 reduced to (reg:SI 310)
giv at 42 reduced to (reg:SI 314)


So, we see that all the givs share the same biv & mult_val.

They may share partial expressions in add_val, but the add_val as a whole
generally isn't the same for each giv -- for example:


Insn 230: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 112)
        (const_int -8))
    (reg:SI 169))

Insn 204: dest address src reg 96 benefit 9 used 1 lifetime 1 replaceable mult 8 add (plus:SI (plus:SI (reg:SI 147)
        (const_int -8))
    (reg:SI 169))
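
The relationship between two such givs can be sketched in C. This is a minimal illustration, not the loop.c implementation: `struct giv`, `giv_value` and `giv_combinable` are hypothetical names, and the add_val is modelled as a plain number rather than an RTL expression. The point is only that two givs on the same biv with the same mult_val differ by a loop-invariant offset, so in principle one reduced register plus that offset could serve both.

```c
/* Hypothetical model of a general induction variable (giv):
   value = mult * biv + add, where add is invariant in the loop. */
struct giv {
    long mult;
    long add;
};

/* Value of the giv for a given value of its basic induction variable. */
long giv_value(struct giv g, long biv)
{
    return g.mult * biv + g.add;
}

/* Two givs on the same biv are related when their mult values match.
   In that case b == a + (b.add - a.add) on every iteration, and the
   difference is loop invariant.  Returns 1 and stores that offset,
   or returns 0 when the mult values differ. */
int giv_combinable(struct giv a, struct giv b, long *offset)
{
    if (a.mult != b.mult)
        return 0;
    *offset = b.add - a.add;
    return 1;
}
```

For insns 230 and 204 above, the offset would be (reg 147) - (reg 112), a register difference rather than a constant, which may be part of why the existing combination code does not treat them as related.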



jeff


* Re: my EGCS status (really Fortran patches)
  1997-10-22 21:33 ` my EGCS status (really Fortran patches) Jeffrey A Law
@ 1997-10-22 22:55   ` Toon Moene
  1997-10-22 23:26     ` Jeffrey A Law
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Toon Moene @ 1997-10-22 22:55 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeff,

L$0009
        fldds,ma 8(0,%r26),%fr24
        fldds,ma 8(0,%r24),%fr23
        fldds 0(0,%r25),%fr22
        fmpy,dbl %fr24,%fr23,%fr23
        fldds,ma 8(0,%r23),%fr25
        fsub,dbl %fr22,%fr23,%fr22
        fmpy,dbl %fr22,%fr25,%fr22
        fstds,ma %fr22,8(0,%r25)
        fldds,ma 8(0,%r22),%fr23
        fldds 0(0,%r20),%fr22
        fmpy,dbl %fr24,%fr23,%fr24
        fsub,dbl %fr22,%fr24,%fr22
        fmpy,dbl %fr22,%fr25,%fr22
        addib,>= -1,%r28,L$0009
        fstds,ma %fr22,8(0,%r20)


>  Now, I don't actually know if one version executes any
>  faster than the other -- this is just something I noticed
>  when looking for why tomcatv ran 10% slower with the USE
>  patch.

[ I once saw HP-PA assembler before, about a year ago, because  
someone complained to the g77-bug list that he couldn't get Fortran  
code with assigned goto's to assemble - turned out the HP assembler  
can't cope with forward labels in instructions other than jumps.  So  
take this with a grain of salt ]

I gather, from looking at this code that instructions ending in .ma  
do (implicit) post-increment addressing ?  That would explain a  
lot: The instructions normally associated with updating address  
registers can be interspersed between the floating point ops, which  
is a win on a CPU that has separate integer and floating point  
units.  You lose this advantage when these integer instructions  
aren't explicit.  On the m68k post-increment addressing is a real  
win, because you're actually saving instructions.  Seems that HP  
wanted the best of both worlds ... and lost (wonder how they solve  
this themselves ...)

I don't see a simple way out of this.

HTH,
Toon.


* Re: my EGCS status (really Fortran patches)
  1997-10-22 22:55   ` Toon Moene
@ 1997-10-22 23:26     ` Jeffrey A Law
  1997-10-23  0:47     ` Jeffrey A Law
  1997-10-24 22:37     ` -frerun-loop Jeffrey A Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-22 23:26 UTC (permalink / raw)
  To: Toon Moene; +Cc: egcs

  In message < 9710230552.AA21715@moene.indiv.nluug.nl >you write:
  > [ I once saw HP-PA assembler before, about a year ago, because  
  > someone complained to the g77-bug list that he couldn't get Fortran  
  > code with assigned goto's to assemble - turned out the HP assembler  
  > can't cope with forward labels in instructions other than jumps.  So  
  > take this with a grain of salt ]
Almost -- it can't deal with temporary labels in non-jump instructions :-)

  > I gather, from looking at this code that instructions ending in .ma  
  > do (implicit) post-increment addressing ?
",ma" is a postmodify (can be an increment or decrement)
",mb" is a premodify (can be an increment or decrement)

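In C terms the two completers correspond roughly to `*p++` and `*++p`. The sketch below is only an analogy (the function names are made up for illustration, and a compiler is free to pick any addressing mode for either form):

```c
#include <stddef.h>

/* Post-modify style: use the address, then bump it.
   Analogous to "fldds,ma 8(0,%rX),%frY" in the listings above. */
double sum_postmodify(const double *p, size_t n)
{
    double s = 0.0;
    while (n-- > 0)
        s += *p++;          /* load from p, then advance p by 8 bytes */
    return s;
}

/* Pre-modify style: bump the address, then use it.
   The first element is loaded separately so the pointer never has to
   start before the beginning of the array. */
double sum_premodify(const double *p, size_t n)
{
    if (n == 0)
        return 0.0;
    double s = *p;
    while (--n > 0)
        s += *++p;          /* advance p by 8 bytes, then load from p */
    return s;
}
```
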
  > lot: The instructions normally associated with updating address  
  > registers can be interspersed between the floating point ops, which  
  > is a win on a CPU that has separate integer and floating point  
  > units.
Exactly.  However, on this model PA the inner loop should run no
slower if we use autoincrement insns to update the pointers instead
of explicit address computation instructions.  This is true for
PAs except PA8000 based systems.

And, having thought of this already, I've already tested it on
tomcatv (which is where that sample code came from).

autoinc !use    autoinc use    !autoinc !use    !autoinc use

    11.5            12.5            11.5             12.5

ie, use of autoinc makes no difference for this code, which isn't
a surprise.

  > aren't explicit.  On the m68k post-increment addressing is a real  
  > win, because you're actually saving instructions.
It's really a win most of the time on HPs too -- but this code is
so FP intensive that the explicit insns to increment the pointers
are completely hidden in pipeline bubbles waiting on memory and
the FP unit.

  > I don't see a simple way out of this.
We haven't necessarily hit the root of the problem yet, so this
conclusion is premature.

The more I think about it the more I bet the poor giv combination
code is the culprit.

Take the example I gave -- we've added 14 insns in the outer loop.
At best they will execute in 7 cycles on this machine.  Furthermore,
let's assume the inner loop's gains are somewhere between minimal
and none because they're dominated by FP/memory latency -- which
means we've burned 7 cycles in the outer loop for almost no gain
in the inner loop.
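
The arithmetic behind that estimate can be written out as a small C sketch. It assumes, as the "14 insns in 7 cycles" figure implies, that at best two of the added insns issue per cycle; the function names and the trip count used below are hypothetical.

```c
/* Best-case cycles needed to issue `insns` instructions when at most
   `width` of them can issue per cycle (rounds up). */
long best_case_cycles(long insns, long width)
{
    return (insns + width - 1) / width;
}

/* Net cycles saved per outer-loop iteration: what the inner loop gains
   over all its trips, minus the best-case cost of the insns added to
   the outer loop.  A negative result is a net loss. */
long net_cycles_saved(long inner_trips, long saved_per_trip,
                      long added_outer_insns, long width)
{
    return inner_trips * saved_per_trip
           - best_case_cycles(added_outer_insns, width);
}
```

With 14 added insns at two per cycle and an inner loop that saves nothing per trip, `net_cycles_saved(255, 0, 14, 2)` comes out at -7 regardless of the trip count.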

jeff


* Re: (really Fortran patches)
  1997-10-22 22:55   ` Toon Moene
  1997-10-22 23:26     ` Jeffrey A Law
@ 1997-10-23  0:47     ` Jeffrey A Law
  1997-10-23 10:45       ` Toon Moene
  1997-10-24 22:37     ` -frerun-loop Jeffrey A Law
  2 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-23  0:47 UTC (permalink / raw)
  To: Toon Moene; +Cc: egcs

This gets more interesting by the minute.

Like many modern cpus, the PA has a counter that increments once per
machine cycle -- and the counter is visible to the user :-)

So, I got off my lazy butt and started timing stuff -- basically noting
the time any particular loop takes for one of the 100 iterations
in tomcatv.
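
A portable sketch of that kind of measurement follows. It is an assumption-laden stand-in: it uses the C library's `clock()` instead of the PA cycle counter, so its resolution is far coarser, and `ticks_t`, `read_ticks`, `time_once` and `sample_loop` are hypothetical names introduced only for illustration.

```c
#include <time.h>

typedef unsigned long long ticks_t;

/* Stand-in for reading a per-cycle hardware counter; here it is just
   the processor time reported by the C library. */
ticks_t read_ticks(void)
{
    return (ticks_t)clock();
}

/* Time a single call of body(arg) and return the elapsed ticks. */
ticks_t time_once(void (*body)(void *), void *arg)
{
    ticks_t start = read_ticks();
    body(arg);
    return read_ticks() - start;
}

/* Example loop to time: sums 0..999 and stores the result through arg. */
void sample_loop(void *arg)
{
    double *out = arg;
    double s = 0.0;
    for (int i = 0; i < 1000; i++)
        s += (double)i;
    *out = s;
}
```

Calling `time_once(sample_loop, &result)` once per build of the compiler gives numbers that can be compared between the two variants; with a real cycle counter the same pattern works on a single iteration of a single loop.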

I ran across a pretty simple loop, which ran slower with the USE patch,
so I started fooling around.

Fixing the giv combination problem by hand made things a little
better, but relative to the total slowdown, it was in the noise --
even after pulling another invariant out of the outer loop
(it's a double nested loop).

The slowdown turned out to be a bad schedule.  After hand fixing
the schedule the version with the USE patch ran faster than the
one without the USE patch.

The loop is dominated by memory and FP latency (no surprise with
tomcatv).  With the USE patch the scheduler did not move some of
the memory instructions as aggressively as it used to.

I'm wondering if the givs created by the USE patch are confusing
the alias code into thinking some particular memory addresses
conflict, when in fact they don't.  Before you ask, no nothing
needed to move past an autoincrement memory reference :-)

I'll look at this further as soon as I can get another block of
free time.

I don't know if bad schedules will account for all the slowdown,
but one can always hope!

jeff


* Re: (really Fortran patches)
  1997-10-23  0:47     ` Jeffrey A Law
@ 1997-10-23 10:45       ` Toon Moene
  1997-10-23 11:04         ` Jeffrey A Law
  0 siblings, 1 reply; 15+ messages in thread
From: Toon Moene @ 1997-10-23 10:45 UTC (permalink / raw)
  To: law; +Cc: egcs

> This gets more interesting by the minute.

[ ... ]

>  The slowdown turned out to be a bad schedule.  After hand
>  fixing the schedule the version with the USE patch ran
>  faster than the one without the USE patch.

Well, there is another difference between the loops you showed.   
The one produced without your `use' patch has this instruction:

        fmpysub,dbl %fr22,%fr24,%fr22,%fr25,%fr23

whereas with your `use' patch, the following sequence is produced:

        fmpy,dbl %fr24,%fr23,%fr24
        fsub,dbl %fr22,%fr24,%fr22

Now, assuming that the fmpysub instruction really buys you anything  
over the above sequence, that could be another cause of the  
slowdown.

I have no clue why the fmpysub instruction wasn't generated in the  
second case.

BTW, does HP really palm off this PA as a *R*ISC architecture ?   
With a five operand instruction ?  Are you sure there isn't a  
`movc5' hiding somewhere, or `index', so that we don't have to do  
strength reduction at all (not to mention `editpc' to help the COBOL  
programs in SPEC95 :-)

Cheers,
Toon.

A RISC architecture is basically a 6600 with 64- instead of 60-bit  
FP registers, a unified integer/address register file and some more  
addressing bits; in short, an Alpha.


* Re: (really Fortran patches)
  1997-10-23 10:45       ` Toon Moene
@ 1997-10-23 11:04         ` Jeffrey A Law
  1997-10-23 12:18           ` Joe Buck
  0 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-23 11:04 UTC (permalink / raw)
  To: Toon Moene; +Cc: egcs

  In message < 9710231737.AA22320@moene.indiv.nluug.nl >you write:

  > Well, there is another difference between the loops you showed.   
  > The one produced without your `use' patch has this instruction:
  > 
  >         fmpysub,dbl %fr22,%fr24,%fr22,%fr25,%fr23
Yup.  I'd have to look closely at the code, but this is probably
a one cycle difference on a relatively modern PA.

The USE variant didn't use fmpysub because it couldn't find an independent 
fmpy and fsub to issue together -- fmpyadd/fmpysub was a poor man's
way to increase FP performance back in 1991.  It's still useful on
most PAs, except the PA8000 based machines.

Why didn't it find independent ones in USE patch version?  Because
the scheduler wasn't able to reorder move instructions in such a
way as to force more registers to be used (and thus make it more
likely that later passes can find independent operations to combine
into an fmpyadd or fmpysub pattern).

I took a peek at another loop this morning, and it's got the same
fundamental problem -- the scheduler isn't able to move the loads
around enough.  After hand scheduling that one loop pair, half of the
overall tomcatv slowdown disappears.

The basic problem appears to be that the alias code gets confused
when a particular register is several sets removed from the original
base reg.

We exposed a similar problem a couple years ago with the static
combination code -- the trick is to recursively continue to
look for the base register instead of a one or two level search.

x = symbol_ref

y = x + index

z = y + index

etc.

To get the basereg for z, you need to recurse back to the symbol_ref
instead of stopping at y.
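
The x/y/z chain above can be modelled with a short C sketch. This is not the alias code itself: `struct def`, `enum kind` and `find_base` are hypothetical, and each register is assumed to have exactly one recorded definition.

```c
#include <stddef.h>

enum kind { SYMBOL_REF, REG_PLUS_INDEX };

/* Hypothetical record of how a pseudo register was set. */
struct def {
    enum kind kind;
    const char *symbol;       /* meaningful when kind == SYMBOL_REF */
    const struct def *base;   /* meaningful when kind == REG_PLUS_INDEX */
};

/* Follow the chain of "reg = other_reg + index" definitions all the way
   back to the symbol, rather than giving up after one or two levels.
   Returns NULL if the chain ends without reaching a symbol. */
const char *find_base(const struct def *d)
{
    if (d == NULL)
        return NULL;
    if (d->kind == SYMBOL_REF)
        return d->symbol;
    return find_base(d->base);
}
```

With x defined as a SYMBOL_REF, y based on x and z based on y, `find_base(&z)` yields x's symbol; a search limited to one level would stop at y and report no base.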

We may also be losing REGNO_POINTER_FLAG for some of the pseudos
created by loop -- which would have similar effects -- I'll have
to look at this further too.


  > BTW, does HP really palm off this PA as a *R*ISC architecture ?   
Yup.

  > With a five operand instruction ?
Yup.  Plus we have more general auto_inc_dec addressing than the
m68k, base + scaled index addressing, base + [scaled] index with
base register modification , etc etc.

  > `movc5' hiding somewhere, or `index',
It doesn't have movc5, but it can be easily synthesized from a
2 instruction sequence.

The PA has some ciscy characteristics, but it's still got many
riscy characteristics (load/store architecture, fixed instruction
length, instructions execute in a single cycle, etc).

Often I look at it as a risc box with ciscy address modes for
load/store operations.

jeff


* Re: (really Fortran patches)
  1997-10-23 11:04         ` Jeffrey A Law
@ 1997-10-23 12:18           ` Joe Buck
  0 siblings, 0 replies; 15+ messages in thread
From: Joe Buck @ 1997-10-23 12:18 UTC (permalink / raw)
  To: law; +Cc: toon, egcs

> The USE variant didn't use fmpysub because it couldn't find an independent 
> fmpy and fsub to issue together -- fmpyadd/fmpysub was a poor man's
> way to increase FP performance back in 1991.  It's still useful on
> most PAs, except the PA8000 based machines.

Well, in digital logic you can convert a multiplier into a multiply-add or
multiply-subtract unit almost for free (almost no timing or area penalty),
and they are common operations, so it does make sense even in a RISC view
of the world.  It's common for very small DSP cores to have such
instructions for this reason.





* Re: -frerun-loop
  1997-10-22 22:55   ` Toon Moene
  1997-10-22 23:26     ` Jeffrey A Law
  1997-10-23  0:47     ` Jeffrey A Law
@ 1997-10-24 22:37     ` Jeffrey A Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-24 22:37 UTC (permalink / raw)
  To: Toon Moene; +Cc: egcs

It should be noted that rerunning the loop optimizer seems to
interact nicely with the simplify_giv_expr patch.

The simplify_giv_expr patch can potentially create many insns
one loop level out of a loop with complex GIVs.

Re-running loop allows many of those new insns to be hoisted
out of another loop nest.

I don't know how much gain this is in practice, but it may
be worth playing with.
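
As a rough source-level picture of the effect (the real transformation happens on RTL, and the function names and parameters here are made up), the first function below has the inner loop's pointer setup sitting in the outer loop body, and the second shows what a further pass could do with it. The caller's array is assumed to be large enough that the final `col += ld` stays within or one past it.

```c
/* Shape after one loop pass: the inner loop walks a pointer, but the
   pointer's start is recomputed in full on each outer iteration,
   including the part (a + first) that never changes. */
double sum_one_pass(const double *a, long ld, long ncols,
                    long first, long count)
{
    double s = 0.0;
    for (long k = 0; k < ncols; k++) {
        const double *p = a + first + k * ld;   /* setup in outer loop */
        for (long i = 0; i < count; i++)
            s += *p++;
    }
    return s;
}

/* Shape after a second pass: the invariant part is computed once
   before both loops, and the per-column start is stepped by ld. */
double sum_two_pass(const double *a, long ld, long ncols,
                    long first, long count)
{
    double s = 0.0;
    const double *col = a + first;              /* hoisted */
    for (long k = 0; k < ncols; k++) {
        const double *p = col;
        for (long i = 0; i < count; i++)
            s += *p++;
        col += ld;                              /* replaces k * ld */
    }
    return s;
}
```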

jeff



* Re: my EGCS status (really Fortran patches)
       [not found] <9710211752.AA19479@moene.indiv.nluug.nl>
  1997-10-22 21:33 ` my EGCS status (really Fortran patches) Jeffrey A Law
  1997-10-22 22:33 ` my EGCS status (really Fortran patches) Jeffrey A Law
@ 1997-10-26  9:18 ` Jeffrey A Law
  1997-10-27 16:30   ` Jim Wilson
  2 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-26  9:18 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jim Wilson, egcs

  In message <9710211752.AA19479@moene.indiv.nluug.nl>you write:
  > >  The only obvious problem I can see in them is that the
  > >  fold patch does not preserve SAVE_EXPRs.  There should
  > >  be a little bit of code that does
  > >  	      if (have_save_expr)
  > >  		t = save_expr (t);
  > >  We should not install this patch as is.  Otherwise, it
  > >  does seem that it should always give better code.
  > 
  > Oops, does this show that I don't really understand this SAVE_EXPR  
  > business :-(  Interesting that I did a full build of my Fortran  
  > software and a complete 24 hour run without any trouble ;-)
Hmmm, Jim -- are you referring to the multiple_of_p patch that
went in long ago?  Or something else that I can't seem to find :-)

If it's the multiple_of_p patch, then I think we're OK -- we never
actually do anything with the SAVE_EXPR other than peek inside at
the contents -- we don't pass it off to fold or return any of the inner
trees to the caller.

jeff


* Re: my EGCS status (really Fortran patches)
  1997-10-27 16:30   ` Jim Wilson
@ 1997-10-27 15:54     ` Jeffrey A Law
  1997-10-27 17:21       ` Jim Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-27 15:54 UTC (permalink / raw)
  To: Jim Wilson; +Cc: Toon Moene, egcs

  In message < 199710272237.OAA21142@cygnus.com >you write:
  > I was looking at this patch.
  > 
  > I see that you have installed the first hunk,
Yea.  I had that one lying around.  I guess Toon sent it separately
once.


  > but the second hunk is still
  > missing.  The second hunk is accidentally losing a SAVE_EXPR which needs
  > to be fixed before we could install it.
Ah.  Now your comments make a little more sense :-)

So basically we want to do something like this in both of the
if clauses?

  t = fold (build ....);
  if (have_save_expr)
    t = save_expr (t);
  return t;



Just want to be sure since I'm not familiar with the SAVE_EXPR issues.
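
The "remember the wrapper, rewrite the inside, put the wrapper back" pattern can be shown with a toy tree in C. None of this is GCC's tree representation: `struct tree`, `enum code` and `rewrite` are hypothetical, the rewrite (swapping the operands of a MULT) is arbitrary, and the result nodes are caller-provided storage to keep the sketch free of allocation.

```c
#include <stddef.h>

enum code { LEAF, MULT, SAVE };

struct tree {
    enum code code;
    struct tree *op0;   /* SAVE: wrapped expression; MULT: left operand */
    struct tree *op1;   /* MULT: right operand */
};

/* Rewrite a MULT, looking through an outer SAVE wrapper if present.
   If the input was wrapped, the result is wrapped again, so the
   "evaluate only once" marking is not lost.  Returns NULL when the
   (unwrapped) input is not a MULT. */
struct tree *rewrite(struct tree *t, struct tree *new_mult,
                     struct tree *new_save)
{
    int have_save = 0;

    if (t->code == SAVE) {          /* peek inside, but remember it */
        have_save = 1;
        t = t->op0;
    }
    if (t->code != MULT)
        return NULL;

    new_mult->code = MULT;          /* the rewritten expression */
    new_mult->op0 = t->op1;
    new_mult->op1 = t->op0;
    t = new_mult;

    if (have_save) {                /* restore the wrapper */
        new_save->code = SAVE;
        new_save->op0 = t;
        new_save->op1 = NULL;
        t = new_save;
    }
    return t;
}
```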

jeff


* Re: my EGCS status (really Fortran patches)
  1997-10-26  9:18 ` Jeffrey A Law
@ 1997-10-27 16:30   ` Jim Wilson
  1997-10-27 15:54     ` Jeffrey A Law
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Wilson @ 1997-10-27 16:30 UTC (permalink / raw)
  To: law; +Cc: Toon Moene, egcs

I was looking at this patch.

I see that you have installed the first hunk, but the second hunk is still
missing.  The second hunk is accidentally losing a SAVE_EXPR which needs
to be fixed before we could install it.

*** egcs-970929/gcc/fold-const.c.orig   Fri Oct  3 10:07:37 1997
--- egcs-970929/gcc/fold-const.c        Sat Oct  4 13:18:29 1997
*************** fold (expr)
*** 4611,4619 ****
         operation, EXACT_DIV_EXPR.

!        Note that only CEIL_DIV_EXPR is rewritten now, only because the
!        others seem to be faster in some cases.  This is probably just
!        due to more work being done to optimize others in expmed.c than on
!        EXACT_DIV_EXPR.  */
!       if (code == CEIL_DIV_EXPR
          && multiple_of_p (type, arg0, arg1))
        return fold (build (EXACT_DIV_EXPR, type, arg0, arg1));
--- 4611,4619 ----
         operation, EXACT_DIV_EXPR.

!        Note that only CEIL_DIV_EXPR and FLOOR_DIV_EXPR are rewritten now,
!          only because the others seem to be faster in some cases.
!          This is probably just due to more work being done to optimize
!          others in expmed.c than on EXACT_DIV_EXPR.  */
!       if ((code == CEIL_DIV_EXPR || code == FLOOR_DIV_EXPR)
          && multiple_of_p (type, arg0, arg1))
        return fold (build (EXACT_DIV_EXPR, type, arg0, arg1));
*************** fold (expr)
*** 4657,4660 ****
--- 4657,4680 ----
          STRIP_NOPS (xarg0);

+           if (TREE_CODE (xarg0) == MULT_EXPR
+               && multiple_of_p (type, TREE_OPERAND (xarg0, 0), arg1))
+             {
+               return fold (build (MULT_EXPR, type,
+                                   fold (build (EXACT_DIV_EXPR, type,
+                                                TREE_OPERAND (xarg0, 0),
+                                                arg1)),
+                                   TREE_OPERAND (xarg0, 1)));
+             }
+
+           if (TREE_CODE (xarg0) == MULT_EXPR
+               && multiple_of_p (type, TREE_OPERAND (xarg0, 1), arg1))
+             {
+               return fold (build (MULT_EXPR, type,
+                                   fold (build (EXACT_DIV_EXPR, type,
+                                                TREE_OPERAND (xarg0, 1),
+                                                arg1)),
+                                   TREE_OPERAND (xarg0, 0)));
+             }
+
          if (TREE_CODE (xarg0) == PLUS_EXPR
              && TREE_CODE (TREE_OPERAND (xarg0, 1)) == INTEGER_CST)



* Re: my EGCS status (really Fortran patches)
  1997-10-27 15:54     ` Jeffrey A Law
@ 1997-10-27 17:21       ` Jim Wilson
  1997-10-28 12:35         ` Jeffrey A Law
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Wilson @ 1997-10-27 17:21 UTC (permalink / raw)
  To: law; +Cc: Toon Moene, egcs

	So basically we want to do something like this in both of the
	if clauses?

	  t = fold (build ....);
	  if (have_save_expr)
	    t = save_expr (t);
	  return t;

yes.

Jim


* Re: my EGCS status (really Fortran patches)
  1997-10-27 17:21       ` Jim Wilson
@ 1997-10-28 12:35         ` Jeffrey A Law
  1997-10-28 12:35           ` Toon Moene
  0 siblings, 1 reply; 15+ messages in thread
From: Jeffrey A Law @ 1997-10-28 12:35 UTC (permalink / raw)
  To: Jim Wilson; +Cc: Toon Moene, egcs

  In message < 199710280121.RAA27973@cygnus.com >you write:
  > 	So basically we want to do something like this in both of the
  > 	if clauses?
  > 
  > 	  t = fold (build ....);
  > 	  if (have_save_expr)
  > 	    t = save_expr (t);
  > 	  return t;
  > 
  > yes.
OK.  I've made the appropriate updates and checked in the
patch.

So that covers this round of Fortran improvements right? :-)

Jeff


* Re: my EGCS status (really Fortran patches)
  1997-10-28 12:35         ` Jeffrey A Law
@ 1997-10-28 12:35           ` Toon Moene
  0 siblings, 0 replies; 15+ messages in thread
From: Toon Moene @ 1997-10-28 12:35 UTC (permalink / raw)
  To: law; +Cc: egcs

> So that covers this round of Fortran improvements right? :-)

Yep.  As soon as the release is out, I'll present the next batch of  
possible optimisations.  Note that I'm talking about  
*opportunities* here; if I'd done the code already, I'd hand it over  
immediately, of course.

:-) :-)

Cheers,
Toon.
