Re: rs6000 fused multiply-add patch [+ patchlet]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-30  4:58 Richard Kenner
  2002-12-30 20:17 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Kenner @ 2002-12-30  4:58 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Isn't this part of the patch preventing such loops?

Yes, that's *part* of the code, but the other part is what you've taken
out: not combining two insns into two!

The *purpose* of combine is to reduce the number of insns.  Why would it
be helpful to replace two insns by two?  You still have the same number
of insns.  What's better about it?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30  4:58 rs6000 fused multiply-add patch [+ patchlet] Richard Kenner
@ 2002-12-30 20:17 ` Segher Boessenkool
  2002-12-30 20:51   ` David Edelsohn
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2002-12-30 20:17 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Isn't this part of the patch preventing such loops?
> 
> Yes, that's *part* of the code, but the other part is what you've taken
> out: not combining two insns into two!
> 
> The *purpose* of combine is to reduce the number of insns.  Why would it
> be helpful to replace two insns by two?  You still have the same number
> of insns.  What's better about it?

Without it, lots of simplifications don't ever get applied.  This results
in worse code.  For example, with the patch applied, bootstrap time goes
down by a few percent (powerpc-unknown-linux-gnu), as well as code size.

One common example is, without the patch, computations involving bitfields
use mfcr insns; with it, they use logic instructions.

Another example: with the patch, loads + mask are done as a smaller width
load if appropriate, which sometimes enables further optimizations.

There are many more cases then just thes two examples, though.  I'll try
and find some on i686 tomorrow.

Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30 20:17 ` Segher Boessenkool
@ 2002-12-30 20:51   ` David Edelsohn
  2003-01-02  0:52     ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: David Edelsohn @ 2002-12-30 20:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

	The combine pass is called "combine" because it is predicated on
combining instructions and uses decreased instructions as the goal.  If
more optimal instruction sequences should be used, that needs to be
optimized by a different pass.  Your bitfield+mfcr -> logic instruction
example might be appropriate as define_peephole2 patterns.  Your load+mask
-> narrower load already should be handled correctly by other
optimizations or the combiner because it seem like it should decrease the
number of instructions.  If the code size is decreasing then the number of
instructions is decreasing, so maybe we need to add patterns transforming
three instructions into two instructions.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30 20:51   ` David Edelsohn
@ 2003-01-02  0:52     ` Segher Boessenkool
  2003-01-02  1:44       ` Geoff Keating
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-02  0:52 UTC (permalink / raw)
  To: David Edelsohn, Richard Kenner; +Cc: gcc-patches

David Edelsohn wrote:
> 
>         The combine pass is called "combine" because it is predicated on
> combining instructions and uses decreased instructions as the goal.

Yes, I understand that.

> If more optimal instruction sequences should be used, that needs to be
> optimized by a different pass.  Your bitfield+mfcr -> logic instruction
> example might be appropriate as define_peephole2 patterns.  Your load+mask
> -> narrower load already should be handled correctly by other
> optimizations or the combiner because it seem like it should decrease the
> number of instructions.

Well, all these optimizations are already there, in simplify-rtx.c etc., it's
just that they never get done by the current combine, because it doesn't
decrease the number of rtl insns _if looking through a very small (3 insn)
window_.

> If the code size is decreasing then the number of
> instructions is decreasing, so maybe we need to add patterns transforming
> three instructions into two instructions.

That's a lot of different patterns; also, it's not only 3->2, but also 4->3
and I saw an 8->7, even.

Richard Kenner wrote:
> 
>     Without it, lots of simplifications don't ever get applied.  This results
>     in worse code.  For example, with the patch applied, bootstrap time goes
>     down by a few percent (powerpc-unknown-linux-gnu), as well as code size.
> 
>     One common example is, without the patch, computations involving bitfields
>     use mfcr insns; with it, they use logic instructions.
> 
> But that's not what combine is supposed to do!  The purpose of combine
> is what it's name says, to *combine* insns.

Yes, but sometimes doing a 2->2 simplification will allow it to do a 2->1
simplification, or two 2->2 simplifications will allow a 3->2, or maybe
some even longer chain.

If combine is supposed to apply (recursively) all possible simplifications,
it needs to do all canonicalizations that are in simplify-rtx and friends,
or it will fail to do some simplifications because the simplification
patterns assume their "sub patterns" (child nodes? nomenclature fails me)
are already simplified.

> If there is a simpler way to do an insn, it should be in the MD file.

Most of these are not machine dependent.  The mcrf thing just stuck out
because I looked at GCC itself as an example of "normal" big code, which
of course it is not (much too many bitfields).

Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  0:52     ` Segher Boessenkool
@ 2003-01-02  1:44       ` Geoff Keating
  2003-01-04  1:59         ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Geoff Keating @ 2003-01-02  1:44 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Segher Boessenkool <segher@koffie.nl> writes:

> David Edelsohn wrote:
> > 
> >         The combine pass is called "combine" because it is predicated on
> > combining instructions and uses decreased instructions as the goal.
> 
> Yes, I understand that.
> 
> > If more optimal instruction sequences should be used, that needs to be
> > optimized by a different pass.  Your bitfield+mfcr -> logic instruction
> > example might be appropriate as define_peephole2 patterns.  Your load+mask
> > -> narrower load already should be handled correctly by other
> > optimizations or the combiner because it seem like it should decrease the
> > number of instructions.
> 
> Well, all these optimizations are already there, in simplify-rtx.c etc., it's
> just that they never get done by the current combine, because it doesn't
> decrease the number of rtl insns _if looking through a very small (3 insn)
> window_.

What you usually do in this case is create a combination insn+splitter
to give combine an appropriate intermediate result.

If you could provide testcases/code samples for the cases you mention,
it would be easier to see precisely what's going on.  Often several
iterations and many eyes are necessary to get the proper fix.

It is also possible that some rearchitecting of combine might be
necessary, but that's something that would require careful
consideration and an examination of all the possible alternatives,
not an ad-hoc patch to solve a specific problem.

> > If the code size is decreasing then the number of
> > instructions is decreasing, so maybe we need to add patterns transforming
> > three instructions into two instructions.
> 
> That's a lot of different patterns; also, it's not only 3->2, but also 4->3
> and I saw an 8->7, even.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  1:44       ` Geoff Keating
@ 2003-01-04  1:59         ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-04  1:59 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc-patches

Geoff Keating wrote:
> 
> What you usually do in this case is create a combination insn+splitter
> to give combine an appropriate intermediate result.
> 
> If you could provide testcases/code samples for the cases you mention,
> it would be easier to see precisely what's going on.  Often several
> iterations and many eyes are necessary to get the proper fix.

Here are some of the cases that happen most on PowerPC:

-- comparing an SI to a "big" constant.
	(set (reg2) (big_constant))
	(set:CC (compare:CC (reg1) (reg2)))
		becomes
	(set (reg1) (xor (reg1) (hi_constant)))
	(set:CC (compare:CC (reg1) (lo_constant)))

	There is a define_split to handle this; it splits two rtl insns
	into two different rtl insns.  In actual machine insns this is
	an improvement, though (either less insns, or less register
	pressure if lo == 0).  So you're saying there should be a define_insn
	for this too, so combine would think this is just one insn?

-- sign extend of xor (twice) => xor (twice) of sign extend.

	Swapping an xor with a sign extend doesn't help per se, but it
	does in cases like this.  combine can't help it: it would need
	to look at 4 insns.

> It is also possible that some rearchitecting of combine might be
> necessary, but that's something that would require careful
> consideration and an examination of all the possible alternatives,
> not an ad-hoc patch to solve a specific problem.

Small progress is progress too.  Of course there's more to be gained
by writing a new'n'improved combine pass, or a new separate pass, but
that's not a reason to not accept other improvements.  All IMHO of course,
and assuming it is actually an improvement :)


Segher


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-10  2:15 Richard Kenner
@ 2003-01-12  1:38 ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-12  1:38 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Reducing the amount of rtl isn't the same as improving performance.
> 
> No, but unless the MD file has a serious problem, it is *one way* of
> improving performance.

I don't dispute that.

> There's no question that we don't current have a pass which is good at
> finding the simplest form of the RTL.  The idea is to move as much of
> combine as possible into simplify-rtx.c.

It's on my todo list (it's in the beginner projects list).

> Then combine is very small and other passes will do what you are
> trying to do.

Part of it, maybe.  But certainly not all, unless we run simplify
and combine over and over again until nothing changes anymore.


Segher


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-10  2:15 Richard Kenner
  2003-01-12  1:38 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Kenner @ 2003-01-10  2:15 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Reducing the amount of rtl isn't the same as improving performance.

No, but unless the MD file has a serious problem, it is *one way* of
improving performance.

    The only difference in this regard is that the original combine
    is easier to prove to be terminating.

And that it sticks to the semantics, which is *combining* insns.

There's no question that we don't current have a pass which is good at
finding the simplest form of the RTL.  The idea is to move as much of
combine as possible into simplify-rtx.c.  Then combine is very small and
other passes will do what you are trying to do.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06 23:21     ` Geoff Keating
@ 2003-01-09 22:41       ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-09 22:41 UTC (permalink / raw)
  To: Geoff Keating; +Cc: David Edelsohn, Richard Kenner, gcc-patches

Geoff Keating wrote:
> 
> It could also cause combine to perform a de-optimisation.

Example?

> Now, its
> default behaivour is to leave the original code alone, which at least
> allows the user to try to obtain the best code sequence.

Try, yes.  But the user code really has almost no influence at this
level.

> The difficulty I have with this patch is the justification for it.

I agree with you here.  This is somewhat scary patch, and it needs good
testing.  But I can't do much more than I already did.

> It's not claimed that the change in the RTL itself improves
> performance,

No, it is claimed.

> but that because of other limitations in combine, the
> change allows other changes that improve performance;

That, too.  With a hypothetical infinite-window combine, the patch wouldn't
be useful, of course.

> and this example
> is used as justification for allowing a whole class of changes only
> one of which is the one that has been justified.

No, my justification was "if a tranformation makes the code worse, the
transformation shouldn't be there in the first place"; also, "the
documentation implies this is the intended behaviour"; the examples I
gave I just did because I was asked to give such examples.

> No analysis has been made of the impact of this patch on compile
> speed,

I did.  See my mail from dec 31.  It helped bootstrap time, which
is good enough imho.

> no analysis has been made of the impact of the patch on
> compiled code performance on other architectures,

I can compile for other architectures, but not run on it, sorry.
Maybe someone else?

> only limited
> analysis has been made of compiled code performance on powerpc,

I don't have access to SPEC etc., sorry.  Maybe someone else?

> and there is no sound theoretical foundation for the patch.

If you say so, but then, neither is there for combine *at all*.
Reducing the amount of rtl isn't the same as improving performance.
The only difference in this regard is that the original combine
is easier to prove to be terminating.

Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  2:26   ` David Edelsohn
  2003-01-06  4:02     ` Segher Boessenkool
@ 2003-01-06 23:21     ` Geoff Keating
  2003-01-09 22:41       ` Segher Boessenkool
  1 sibling, 1 reply; 26+ messages in thread
From: Geoff Keating @ 2003-01-06 23:21 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Kenner, segher, gcc-patches

David Edelsohn <dje@watson.ibm.com> writes:

> 	I don't think that anyone is objecting to the concept and the
> benefit.  If I understand correctly, the patch violates the semantics of
> the combiner algorithm which requires a declining cost calculated as the
> number of instructions.  Allowing combinations that do not decrease the
> cost would make the algorithm non-deterministic and possibly not converge,
> right? 

It could also cause combine to perform a de-optimisation.  Now, its
default behaivour is to leave the original code alone, which at least
allows the user to try to obtain the best code sequence.

The difficulty I have with this patch is the justification for it.
It's not claimed that the change in the RTL itself improves
performance, but that because of other limitations in combine, the
change allows other changes that improve performance; and this example
is used as justification for allowing a whole class of changes only
one of which is the one that has been justified.

No analysis has been made of the impact of this patch on compile
speed, no analysis has been made of the impact of the patch on
compiled code performance on other architectures, only limited
analysis has been made of compiled code performance on powerpc, and
there is no sound theoretical foundation for the patch.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-06 22:09 Richard Kenner
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Kenner @ 2003-01-06 22:09 UTC (permalink / raw)
  To: dje; +Cc: gcc-patches

    I don't think that anyone is objecting to the concept and the benefit.
    If I understand correctly, the patch violates the semantics of the
    combiner algorithm which requires a declining cost calculated as the
    number of instructions.  Allowing combinations that do not decrease
    the cost would make the algorithm non-deterministic and possibly not
    converge, right?

That's what I'm saying, yes.  I'm out of the country right now with poor
email access and will have more to say on this when I et bck in a few days.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  4:02     ` Segher Boessenkool
@ 2003-01-06  4:07       ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-06  4:07 UTC (permalink / raw)
  To: David Edelsohn, Zack Weinberg, Richard Kenner, gcc-patches

Segher Boessenkool wrote:
> 
> David Edelsohn wrote:
> >
> >         I don't think that anyone is objecting to the concept and the
> > benefit.  If I understand correctly, the patch violates the semantics of
> > the combiner algorithm which requires a declining cost calculated as the
> > number of instructions.  Allowing combinations that do not decrease the
> > cost would make the algorithm non-deterministic and possibly not converge,
> > right?
> 
> Only if the "canonicalizations" don't actually canonicalize... but yes, in
> that (hypothetical) case that could happen.  Some documentation might be
> needed to that extend if this patch ever gets accepted.

I should add that this would require two simplifications that do exactly the
opposite transform, and would maybe already recurse -- in a subroutine of combine.


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  2:26   ` David Edelsohn
@ 2003-01-06  4:02     ` Segher Boessenkool
  2003-01-06  4:07       ` Segher Boessenkool
  2003-01-06 23:21     ` Geoff Keating
  1 sibling, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-06  4:02 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Zack Weinberg, Richard Kenner, gcc-patches

David Edelsohn wrote:
> 
>         I don't think that anyone is objecting to the concept and the
> benefit.  If I understand correctly, the patch violates the semantics of
> the combiner algorithm which requires a declining cost calculated as the
> number of instructions.  Allowing combinations that do not decrease the
> cost would make the algorithm non-deterministic and possibly not converge,
> right?

Only if the "canonicalizations" don't actually canonicalize... but yes, in
that (hypothetical) case that could happen.  Some documentation might be
needed to that extend if this patch ever gets accepted.


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-05 23:24 ` Zack Weinberg
@ 2003-01-06  2:26   ` David Edelsohn
  2003-01-06  4:02     ` Segher Boessenkool
  2003-01-06 23:21     ` Geoff Keating
  0 siblings, 2 replies; 26+ messages in thread
From: David Edelsohn @ 2003-01-06  2:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Richard Kenner, segher, gcc-patches

	I don't think that anyone is objecting to the concept and the
benefit.  If I understand correctly, the patch violates the semantics of
the combiner algorithm which requires a declining cost calculated as the
number of instructions.  Allowing combinations that do not decrease the
cost would make the algorithm non-deterministic and possibly not converge,
right? 

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-31  3:59 Richard Kenner
@ 2003-01-05 23:24 ` Zack Weinberg
  2003-01-06  2:26   ` David Edelsohn
  0 siblings, 1 reply; 26+ messages in thread
From: Zack Weinberg @ 2003-01-05 23:24 UTC (permalink / raw)
  To: Richard Kenner; +Cc: segher, gcc-patches

kenner@vlsi1.ultra.nyu.edu (Richard Kenner) writes:

>     Without it, lots of simplifications don't ever get applied.
>     This results in worse code.  For example, with the patch
>     applied, bootstrap time goes down by a few percent
>     (powerpc-unknown-linux-gnu), as well as code size.
>
>     One common example is, without the patch, computations involving
>     bitfields use mfcr insns; with it, they use logic instructions.
>
> But that's not what combine is supposed to do!  The purpose of
> combine is what it's name says, to *combine* insns.

That's what it has historically done, but that is not adequate reason
to refuse to make it do something else, if the something else is a
more effective optimization.  It sounds like you are reacting
negatively to this patch without even considering what merit it might
have.

The alternative you and others have suggested -- adding splitters to
the machine description -- neglects to consider that this change
potentially benefits *all* architectures, and without requiring
backend authors to jump through even more hoops than they already do.

I am not saying that this patch should definitely be applied; I just
think it is worthy of serious consideration on its technical merits,
rather than being blown off because it makes the name "combine.c"
marginally inaccurate.

zw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  1:39 Richard Kenner
@ 2003-01-04  1:59 ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2003-01-04  1:59 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Yes, but sometimes doing a 2->2 simplification will allow it to do a
>     2->1 simplification, or two 2->2 simplifications will allow a 3->2, or
>     maybe some even longer chain.
> 
> But that should get done when combining 3 insns, for example.

But that won't work if 3 insns don't reduce to 2, but with one more, it
_does_ reduce to 3 insns, for example:

	(set (t1) (op1 (reg0) (const X1)))
	(set (reg1) (op2 (t1)))
	(set (t2) (op1 (reg0) (const X2)))
	(set (reg2) (op2 (t2)))

where op1 and op2 can be swapped (like, op1 is xor and op2 is extend), so we get

	(set (t1) (op2 (reg0)))
	(set (reg1) (op1 (t1) (xonst X1)))
	(set (t2) (op2 (reg0)))
	(set (reg2) (op1 (t2) (xonst X2)))

where the sets of t1 and t2 can be merged.

Without always canonicalizing all rtl, you miss some simplifications.  But
canonicalizing won't always help, either; sometimes the non-canonical pattern
will simplify better.  It may very well be too slow to try all possible
transformations, however.  Always doing the canonicalizations was a win on
almost all the code I inspected (gcc itself, some codecs, a forth vm); I saw
worse code only two or three times (and those were all non-eliminated dead code).

Some more to try (these use mfcr while better and/or shorter patterns are possible):

int xx1(int a, int b)
{
	return a == 0 && b == 0;
}

int xx2(int a, int b)
{
	a &= 1; b &= 1;
	return a == 0 && b == 0;
}

Another nice one:

int xx3(int a, int b)
{
	return ~a ^ b ^ a ^ ~b;
}

Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-02  1:39 Richard Kenner
  2003-01-04  1:59 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Kenner @ 2003-01-02  1:39 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Yes, but sometimes doing a 2->2 simplification will allow it to do a
    2->1 simplification, or two 2->2 simplifications will allow a 3->2, or
    maybe some even longer chain.

But that should get done when combining 3 insns, for example.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-31  3:59 Richard Kenner
  2003-01-05 23:24 ` Zack Weinberg
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Kenner @ 2002-12-31  3:59 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Without it, lots of simplifications don't ever get applied.  This results
    in worse code.  For example, with the patch applied, bootstrap time goes
    down by a few percent (powerpc-unknown-linux-gnu), as well as code size.

    One common example is, without the patch, computations involving bitfields
    use mfcr insns; with it, they use logic instructions.

But that's not what combine is supposed to do!  The purpose of combine
is what it's name says, to *combine* insns.

If there is a simpler way to do an insn, it should be in the MD file.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-29  5:13 Richard Kenner
@ 2002-12-29 19:25 ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2002-12-29 19:25 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     The simplifications are implemented, but combine won't simplify two insns
>     into two different insns.
> 
> Correct, and it *must* not.
> 
> The reason is that combine must always *decrease* the number of insns
> to avoid infinite loops.
> 
>     The attached patch fixes this.
> 
> This patch is *not* OK.


Isn't this part of the patch preventing such loops?


Segher


*************** try_combine (i3, i2, i1, new_direct_jump
*** 2472,2477 ****
--- 2475,2498 ----
        return 0;
      }
  
+   /* If combining two insns doesn't improve things, fail.  */
+   /* If NEWI2PAT is a PARALLEL, it might still improve things;
+      that will need additional checks though, as I2PAT can be
+      part of an identical PARALLEL, and we end up with an
+      infinite loop.  For now, just don't allow it.  */
+   /* Similarly, it is also possible that there is some valid
+      simplification for which the GET_CODE ... clause below
+      is true.  */
+ 
+   if (i1 == 0 && newi2pat
+       && (GET_CODE (i2pat) != SET || GET_CODE (newi2pat) == PARALLEL
+ 	  || (GET_CODE (newi2pat) == SET
+ 	      && GET_CODE (SET_SRC (i2pat)) == GET_CODE (SET_SRC (newi2pat)))))
+     {
+       undo_all ();
+       return 0;
+     }
+ 
    /* If we had to change another insn, make sure it is valid also.  */
    if (undobuf.other_insn)
      {

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-29  5:13 Richard Kenner
  2002-12-29 19:25 ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Kenner @ 2002-12-29  5:13 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    The simplifications are implemented, but combine won't simplify two insns
    into two different insns.  

Correct, and it *must* not.

The reason is that combine must always *decrease* the number of insns
to avoid infinite loops.

    The attached patch fixes this.  

This patch is *not* OK.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21         ` Segher Boessenkool
  2002-12-20 22:28           ` David Edelsohn
  2002-12-21 21:55           ` Geoff Keating
@ 2002-12-28 22:08           ` Segher Boessenkool
  2 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2002-12-28 22:08 UTC (permalink / raw)
  To: Geoff Keating, dje, gcc-patches, pinskia, dalej

Segher Boessenkool wrote:
> 
> Geoff Keating wrote:
> > The canonical form for (mult A (neg B)) is (mult (neg A) B);
> > this is documented but may not be implemented.  If it was implemented,
> > (mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
> > and that simplifies down to (mult A B).
> 
> I'll make a patch to fix that, then.

The simplifications are implemented, but combine won't simplify two insns
into two different insns.  The attached patch fixes this.  Bootstrapped on
powerpc-unknown-linux-gnu (c,c++,objc); regression checked too.


Segher


2002-12-28  Segher Boessenkool  <segher@koffie.nl>

	* combine.c (try_combine): Handle simplifying two insns into
	two different insns.



*** ../../gcc-clean/gcc/combine.c	Fri Dec 27 03:21:29 2002
--- ./combine.c	Sat Dec 28 07:18:19 2002
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2113,2120 ****
       insns.  There are two ways to do this.  It can be split using a
       machine-specific method (like when you have an addition of a large
       constant) or by combine in the function find_split_point.  */
  
!   if (i1 && insn_code_number < 0 && GET_CODE (newpat) == SET
        && asm_noperands (newpat) < 0)
      {
        rtx m_split, *split;
--- 2113,2123 ----
       insns.  There are two ways to do this.  It can be split using a
       machine-specific method (like when you have an addition of a large
       constant) or by combine in the function find_split_point.  */
+   /* We need to do this when combining only two insns, too; otherwise,
+      canonicalizations and simplifications that do not result in a single
+      machine insn will never be performed.  */
  
!   if (insn_code_number < 0 && GET_CODE (newpat) == SET
        && asm_noperands (newpat) < 0)
      {
        rtx m_split, *split;
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2472,2477 ****
--- 2475,2498 ----
        return 0;
      }
  
+   /* If combining two insns doesn't improve things, fail.  */
+   /* If NEWI2PAT is a PARALLEL, it might still improve things;
+      that will need additional checks though, as I2PAT can be
+      part of an identical PARALLEL, and we end up with an
+      infinite loop.  For now, just don't allow it.  */
+   /* Similarly, it is also possible that there is some valid
+      simplification for which the GET_CODE ... clause below
+      is true.  */
+ 
+   if (i1 == 0 && newi2pat
+       && (GET_CODE (i2pat) != SET || GET_CODE (newi2pat) == PARALLEL
+ 	  || (GET_CODE (newi2pat) == SET
+ 	      && GET_CODE (SET_SRC (i2pat)) == GET_CODE (SET_SRC (newi2pat)))))
+     {
+       undo_all ();
+       return 0;
+     }
+ 
    /* If we had to change another insn, make sure it is valid also.  */
    if (undobuf.other_insn)
      {
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2666,2672 ****
         patterns, move from I1 to I2 then I2 to I3 so that we get the
         proper movement on registers that I2 modifies.  */
  
!     if (newi2pat)
        {
  	move_deaths (newi2pat, NULL_RTX, INSN_CUID (i1), i2, &midnotes);
  	move_deaths (newpat, newi2pat, INSN_CUID (i1), i3, &midnotes);
--- 2687,2693 ----
         patterns, move from I1 to I2 then I2 to I3 so that we get the
         proper movement on registers that I2 modifies.  */
  
!     if (newi2pat && i1)
        {
  	move_deaths (newi2pat, NULL_RTX, INSN_CUID (i1), i2, &midnotes);
  	move_deaths (newpat, newi2pat, INSN_CUID (i1), i3, &midnotes);
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2869,2875 ****
        && INSN_CUID (added_links_insn) < INSN_CUID (i3))
      return added_links_insn;
    else
!     return newi2pat ? i2 : i3;
  }
  \f
  /* Undo all the modifications recorded in undobuf.  */
--- 2890,2896 ----
        && INSN_CUID (added_links_insn) < INSN_CUID (i3))
      return added_links_insn;
    else
!     return (i1 && newi2pat) ? i2 : i3;
  }
  \f
  /* Undo all the modifications recorded in undobuf.  */


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21         ` Segher Boessenkool
  2002-12-20 22:28           ` David Edelsohn
@ 2002-12-21 21:55           ` Geoff Keating
  2002-12-28 22:08           ` Segher Boessenkool
  2 siblings, 0 replies; 26+ messages in thread
From: Geoff Keating @ 2002-12-21 21:55 UTC (permalink / raw)
  To: segher; +Cc: dje, gcc-patches, pinskia, dalej

> Date: Sat, 21 Dec 2002 06:58:50 +0100
> From: Segher Boessenkool <segher@koffie.nl>

> > > Oh btw, for a function like
> > >
> > > float bla(float a, float b, float c)
> > > {
> > >       return a + b * c;
> > > }
> > >
> > > gcc generates something like
> > >
> > >       fmadds  2,2,3,1
> > >       fmr     1,2
> > >       blr
> > >
> > > any idea what needs to be fixed to get rid of the fmr insn?
> > 
> > The register allocator.
> 
> I was hoping for something a little bit more specific than this, like,
> if it's a known shortcoming for example.

I don't know about this particular case, but it's known that the
allocator doesn't prefer to allocate pseudos to hard registers that
don't have their own separate register class just because the psuedo
is copied to that register.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21         ` Segher Boessenkool
@ 2002-12-20 22:28           ` David Edelsohn
  2002-12-21 21:55           ` Geoff Keating
  2002-12-28 22:08           ` Segher Boessenkool
  2 siblings, 0 replies; 26+ messages in thread
From: David Edelsohn @ 2002-12-20 22:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Geoff Keating, gcc-patches, pinskia, dalej

>>>>> Segher Boessenkool writes:

>> >       fmadds  2,2,3,1
>> >       fmr     1,2
>> >
>> > any idea what needs to be fixed to get rid of the fmr insn?
>> 
>> The register allocator.

Segher> I was hoping for something a little bit more specific than this, like,
Segher> if it's a known shortcoming for example.

	It's a known problem.

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 21:38       ` Geoff Keating
@ 2002-12-20 22:21         ` Segher Boessenkool
  2002-12-20 22:28           ` David Edelsohn
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Segher Boessenkool @ 2002-12-20 22:21 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dje, gcc-patches, pinskia, dalej

Geoff Keating wrote:
> The canonical form for (mult A (neg B)) is (mult (neg A) B);
> this is documented but may not be implemented.  If it was implemented,
> (mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
> and that simplifies down to (mult A B).

I'll make a patch to fix that, then.

> > Oh btw, for a function like
> >
> > float bla(float a, float b, float c)
> > {
> >       return a + b * c;
> > }
> >
> > gcc generates something like
> >
> >       fmadds  2,2,3,1
> >       fmr     1,2
> >       blr
> >
> > any idea what needs to be fixed to get rid of the fmr insn?
> 
> The register allocator.

I was hoping for something a little bit more specific than this, like,
if it's a known shortcoming for example.


Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 21:08     ` rs6000 fused multiply-add patch [+ patchlet] Segher Boessenkool
@ 2002-12-20 21:38       ` Geoff Keating
  2002-12-20 22:21         ` Segher Boessenkool
  0 siblings, 1 reply; 26+ messages in thread
From: Geoff Keating @ 2002-12-20 21:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: dje, gcc-patches, pinskia, dalej

Segher Boessenkool <segher@koffie.nl> writes:

> Sorry for the delay...
> 
> First, a trivial extra patch that's needed to get fma's in all
> fastmath cases:
> 
> 
> 2002-12-21  Segher Boessenkool  <segher@koffie.nl>
> 
> 	* combine.c (combine_simplify_rtx): Add a simplification
> 	for the multiplication of two negations.

The canonical form for (mult A (neg B)) is (mult (neg A) B);
this is documented but may not be implemented.  If it was implemented,
(mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
and that simplifies down to (mult A B).

> > > > +@code{(plus (mult (neg A) B) C)} is canonicalized as
> > > > +@code{(minus A (mult B C))}.
> 
> [Typo: this last example is in error (it mixes up the ABC).]

Sigh.  Yes, it should be '(minus C (mult A B))'..

> This exact example is the only case where ieee-math doesn't use an
> fma instruction where it could [as you yourself already mentioned, btw]
> (it uses an fmul/fsub sequence instead of an fneg/fmadd sequence); it might
> be nice to get the extra precision the fma provides.  Slightly worse though,
> for -(a - b * c), it generates fmul/fsub/fneg instead of fneg/fnmadd.

Yes.

> Oh btw, for a function like
> 
> float bla(float a, float b, float c)
> {
> 	return a + b * c;
> }
> 
> gcc generates something like
> 
> 	fmadds	2,2,3,1
> 	fmr	1,2
> 	blr
> 
> any idea what needs to be fixed to get rid of the fmr insn?

The register allocator.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-05 14:04   ` Geoff Keating
@ 2002-12-20 21:08     ` Segher Boessenkool
  2002-12-20 21:38       ` Geoff Keating
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2002-12-20 21:08 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dje, gcc-patches, pinskia, dalej

Sorry for the delay...

First, a trivial extra patch that's needed to get fma's in all
fastmath cases:

2002-12-21  Segher Boessenkool  <segher@koffie.nl>

	* combine.c (combine_simplify_rtx): Add a simplification
	for the multiplication of two negations.

*** gcc/combine.c.orig	Sat Dec 21 01:22:18 2002
--- gcc/combine.c	Sat Dec 21 01:49:03 2002
*************** combine_simplify_rtx (x, op0_mode, last,
*** 4417,4422 ****
--- 4417,4427 ----
  	  if (tem)
  	    return gen_binary (DIV, mode, tem, XEXP (XEXP (x, 0), 1));
  	}
+ 
+       /* Simplify (mult (neg A) (neg B)) to (mult A B).  */
+       if (GET_CODE (XEXP (x, 0)) == NEG && GET_CODE (XEXP (x, 1)) == NEG)
+ 	return gen_binary (MULT, mode, XEXP (XEXP (x, 0), 0),
+ 				       XEXP (XEXP (x, 1), 0));
        break;

      case UDIV:

---end of patch

This is needed because of the pushing-inwards of NEG's, when there's
already a NEG inside, like in  -(b * -c) - a .  With this, it passes
all my 96 cases for a single fma insn expression.  Great!

Geoff Keating wrote:
> > Geoffrey Keating wrote:
> > > +@item
> > > +In combinations of @code{neg}, @code{mult}, @code{plus}, and
> > > +@code{minus}, the @code{neg} operations (if any) will be moved inside
> > > +the operations as far as possible.  For instance,
> > > +@code{(neg (mult A B))} is canonicalized as @code{(mult (neg A) B)}, but
> > > +@code{(plus (mult (neg A) B) C)} is canonicalized as
> > > +@code{(minus A (mult B C))}.

[Typo: this last example is in error (it mixes up the ABC).]

This exact example is the only case where ieee-math doesn't use an
fma instruction where it could [as you yourself already mentioned, btw]
(it uses an fmul/fsub sequence instead of an fneg/fmadd sequence); it might
be nice to get the extra precision the fma provides.  Slightly worse though,
for -(a - b * c), it generates fmul/fsub/fneg instead of fneg/fnmadd.

I'll try to write a splitter for this, like you suggest.

> I'd have used your testcases, but they didn't seem to be in your patch.

I don't know how to write a testsuite thingy yet; maybe tomorrow, if I can
find myself a working version of dejagnu.  I'll send my testcases, promise :)

Oh btw, for a function like

float bla(float a, float b, float c)
{
	return a + b * c;
}

gcc generates something like

	fmadds	2,2,3,1
	fmr	1,2
	blr

any idea what needs to be fixed to get rid of the fmr insn?

Segher

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2003-01-12  1:38 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-30  4:58 rs6000 fused multiply-add patch [+ patchlet] Richard Kenner
2002-12-30 20:17 ` Segher Boessenkool
2002-12-30 20:51   ` David Edelsohn
2003-01-02  0:52     ` Segher Boessenkool
2003-01-02  1:44       ` Geoff Keating
2003-01-04  1:59         ` Segher Boessenkool
  -- strict thread matches above, loose matches on Subject: below --
2003-01-10  2:15 Richard Kenner
2003-01-12  1:38 ` Segher Boessenkool
2003-01-06 22:09 Richard Kenner
2003-01-02  1:39 Richard Kenner
2003-01-04  1:59 ` Segher Boessenkool
2002-12-31  3:59 Richard Kenner
2003-01-05 23:24 ` Zack Weinberg
2003-01-06  2:26   ` David Edelsohn
2003-01-06  4:02     ` Segher Boessenkool
2003-01-06  4:07       ` Segher Boessenkool
2003-01-06 23:21     ` Geoff Keating
2003-01-09 22:41       ` Segher Boessenkool
2002-12-29  5:13 Richard Kenner
2002-12-29 19:25 ` Segher Boessenkool
2002-12-03 17:29 rs6000 fused multiply-add patch David Edelsohn
2002-12-04 19:41 ` Segher Boessenkool
2002-12-05 14:04   ` Geoff Keating
2002-12-20 21:08     ` rs6000 fused multiply-add patch [+ patchlet] Segher Boessenkool
2002-12-20 21:38       ` Geoff Keating
2002-12-20 22:21         ` Segher Boessenkool
2002-12-20 22:28           ` David Edelsohn
2002-12-21 21:55           ` Geoff Keating
2002-12-28 22:08           ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).