public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC patch: invariant addresses too cheap
@ 2009-10-19 16:17 Michael Matz
  2009-10-19 16:45 ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Matz @ 2009-10-19 16:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: bonzini

Hi,

the patch from http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00341.html 
(one part of fixing PR33928) introduced a 10% regression in 410.bwaves 
(http://gcc.opensuse.org/SPEC/CFP/sb-barbella.suse.de-head-64-2006/index.html 
the drop in May)

The reason is that some invariants aren't moved out of a non-inner loop 
in mat_times_vec (the hot function).  A C testcase is something like this:

---------------
#define I(c,d) (c*n+d)
int f (long x[], long n, long n2, long n3)
{
  int sum = 0;
  long k, l, m;
  long km1, kp1;
  for (k = 0; k < n; k++)
    {
      km1 = (k + n - 2) % n + 1;
      kp1 = (k + 1) % n;
      for (l = 0; l < n; l++)
        for (m = 0; m < n; m++)
          {
            sum += x[I(k,m)];
            sum += x[I(kp1,m)];
            sum += x[I(km1,m)];
          }
    }
  return sum;
}
----------------

(compile with -O3 -ffast-math -funroll-loops -fpeel-loops -fno-tree-vectorize)

The invariants are of the form
   (set (reg:DI 193 [ ivtmp.96 ])
        (plus:DI (reg/v/f:DI 229 [ x ]) (reg:DI 245)))

They aren't moved because they are considered too cheap (they cost on x86 
is 3, which is < COSTS_N_INSN(1)).  As they use two registers they better 
be moved independend of cost.  Looking at the definition and uses of 
address_cost in the compiler I doubt that the comparison with COSTS_N_INSN 
is correct.  Certainly currently the comparison with COSTS_N_INSN is just 
a magic number.  There are more ports defining TARGET_ADDRESS_COSTS in 
terms of simple ordinals numbers than there are defining them relative to 
COSTS_N_INSN.  The default definition does use COSTS_N_INSN, though.

This inconsistency didn't matter until the patch because we always only 
compared these costs with each other, not to determine cheapness in an 
absolute sense.  The port definitions weren't changed with the patch, so 
COSTS_N_INSN(1) is just a fancy way of writing the magic number "4".  With 
that we can as well write the magic number "3", and get both: the testcase 
from the PR still doesn't exhibit moving trivial invariants like 
"-16(%r12), -24(%r12), ...", and doesn't regress 410.bwaves either.

Now, that is of course a bit unsatisfying, but there aren't that terribly 
many ways how to fix this because we have conflicting requirements:
(a) for relative differences the numbers don't need a unit, and ports are 
    defined with that in mind.  E.g. on x86 reg+ofs has cost 2, while 
    reg+reg has cost 3, and reg alone has cost 1.  All three of them are 
    usable as address operands without needing additional instructions, so 
    they are very cheap in one sense, but still they are different.
(b) but with that comparison for absolute cheapness doesn't make sense.  
    Even if the ports would multiply everything with COSTS_N_INSN(1) we 
    had the same problem, loop-invariant.c (comparing with 
    COSTS_N_INSN(1)) still wouldn't move anything.

Really the address_cost target hook is better thought of as determining 
relative complexity.  For loop-invariant.c we could also try to do 
something similar and look at the form of the address to determine if it's 
really trivial enough to ignore moving (e.g. doesn't refer to two regs).

We could also change the i386 target hook to return something > 
COSTS_N_INSN(1) when two regs are mentioned and something smaller 
(0,1,2,3) when not.  I.e. "half" insn costs.  That doesn't sound too 
appealing either, OTOH that may be the reason why COSTS_N_INSN is scaled 
at all.

Does anyone have any opinions?


Ciao,
Michael.
-- 
	* loop-invariant.c (create_new_invariant): Use different magic number.

Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 147274)
+++ loop-invariant.c	(working copy)
@@ -686,7 +686,7 @@ create_new_invariant (struct def *def, r
     {
       inv->cost = rtx_cost (set, SET, speed);
       inv->cheap_address = address_cost (SET_SRC (set), word_mode,
-					 speed) < COSTS_N_INSNS (1);
+					 speed) < 3;
     }
   else
     {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-19 16:17 RFC patch: invariant addresses too cheap Michael Matz
@ 2009-10-19 16:45 ` Paolo Bonzini
  2009-10-20  9:04   ` Richard Guenther
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2009-10-19 16:45 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

> We could also change the i386 target hook to return something >
> COSTS_N_INSN(1) when two regs are mentioned and something smaller
> (0,1,2,3) when not.  I.e. "half" insn costs.  That doesn't sound too
> appealing either, OTOH that may be the reason why COSTS_N_INSN is scaled
> at all.

I have no qualms with your patch at all, but I think that fractional costs
do make sense if you want address_cost to have the same unit as
rtx_costs (which the default implementation suggests).

I would just write < 3 in your patch as "<= COSTS_N_INSN (1) / 2".
Which shows even more how it is taken from thin air :-) but at least
shows that it is supposed to be in the same unit as rtx_costs.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-19 16:45 ` Paolo Bonzini
@ 2009-10-20  9:04   ` Richard Guenther
  2009-10-20 12:07     ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2009-10-20  9:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Matz, gcc-patches

On Mon, Oct 19, 2009 at 6:42 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> We could also change the i386 target hook to return something >
>> COSTS_N_INSN(1) when two regs are mentioned and something smaller
>> (0,1,2,3) when not.  I.e. "half" insn costs.  That doesn't sound too
>> appealing either, OTOH that may be the reason why COSTS_N_INSN is scaled
>> at all.
>
> I have no qualms with your patch at all, but I think that fractional costs
> do make sense if you want address_cost to have the same unit as
> rtx_costs (which the default implementation suggests).
>
> I would just write < 3 in your patch as "<= COSTS_N_INSN (1) / 2".
> Which shows even more how it is taken from thin air :-) but at least
> shows that it is supposed to be in the same unit as rtx_costs.

Did you at the time when producing the original patch do measurements
with different magic numbers?  As of using a magic number there,
another option would be to have a new target macro specifying
CHEAP_ADDRESS_COST or sth like that...

Richard.

> Paolo
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-20  9:04   ` Richard Guenther
@ 2009-10-20 12:07     ` Paolo Bonzini
  2009-10-20 16:00       ` Michael Matz
  2009-10-31 10:05       ` Richard Sandiford
  0 siblings, 2 replies; 24+ messages in thread
From: Paolo Bonzini @ 2009-10-20 12:07 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Michael Matz, gcc-patches

> Did you at the time when producing the original patch do measurements
> with different magic numbers?

No.

> As of using a magic number there,
> another option would be to have a new target macro specifying
> CHEAP_ADDRESS_COST or sth like that...

That would be premature.  The optimization was introduced specifically
for x86, any other target probably does not care (surely not the RISC
ones that always have COSTS_N_INSNS(1) for the address_cost).

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-20 12:07     ` Paolo Bonzini
@ 2009-10-20 16:00       ` Michael Matz
  2009-10-20 16:06         ` Richard Guenther
  2009-10-31 10:05       ` Richard Sandiford
  1 sibling, 1 reply; 24+ messages in thread
From: Michael Matz @ 2009-10-20 16:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Hi,

On Tue, 20 Oct 2009, Paolo Bonzini wrote:

> > As of using a magic number there, another option would be to have a 
> > new target macro specifying CHEAP_ADDRESS_COST or sth like that...
> 
> That would be premature.  The optimization was introduced specifically 
> for x86, any other target probably does not care (surely not the RISC 
> ones that always have COSTS_N_INSNS(1) for the address_cost).

Actually some (alpha,ia64,m32r,mcore,mmix,rs6000,sparc,spu,v850,xtensa) 
override it with hook_int_rtx_bool_0 (hence all addresses have cost 0), 
some (fr30,frv,h8300,m68k,moxie,pdp11,picochip,vms) use the 
default_address_cost, which returns also 0 for simple REG addresses and 
otherwise is defined in terms of COSTS_N_INSN.

Some targets compute the cost in terms of COSTS_N_INSN (m32c, s390), and 
the rest some custom selection of magic numbers (cycle counts, or 
instruction count, or any other number).

Okay, so I'm asking for approval of the below patch, that simply uses the 
other magic number that happens to be the good one on x86, with an extra 
large comment in front refering to this thread.

Regstrapped on x86_64-linux (all langs+Ada), runtime of 410.bwaves 
decreases from 1440 to 1350 seconds which is similar to before May.
This needs deactivation of Vlads regpressure aware loop invariant motion.  
With that active _no_ invariants at all are moved for the hot routine in 
410.bwaves.


Ciao,
Michael.
-- 
	* loop-invariant.c (create_new_invariant): Use different magic number.

Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 152670)
+++ loop-invariant.c	(working copy)
@@ -685,8 +685,17 @@ create_new_invariant (struct def *def, r
   if (def)
     {
       inv->cost = rtx_cost (set, SET, speed);
+      /* ??? Try to determine cheapness of address computation.  Unfortunately
+         the address cost is only a relative measure, we can't really compare
+	 it with any absolute number, but only with other address costs.
+	 But here we don't have any other addresses, so compare with a magic
+	 number anyway.  It has to be large enough to not regress PR33928
+	 (by avoiding to move reg+8,reg+16,reg+24 invariants), but small
+	 enough to not regress 410.bwaves either (by still moving reg+reg
+	 invariants).
+	 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
       inv->cheap_address = address_cost (SET_SRC (set), word_mode,
-					 speed) < COSTS_N_INSNS (1);
+					 speed) < 3;
     }
   else
     {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-20 16:00       ` Michael Matz
@ 2009-10-20 16:06         ` Richard Guenther
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Guenther @ 2009-10-20 16:06 UTC (permalink / raw)
  To: Michael Matz; +Cc: Paolo Bonzini, gcc-patches

On Tue, Oct 20, 2009 at 6:00 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 20 Oct 2009, Paolo Bonzini wrote:
>
>> > As of using a magic number there, another option would be to have a
>> > new target macro specifying CHEAP_ADDRESS_COST or sth like that...
>>
>> That would be premature.  The optimization was introduced specifically
>> for x86, any other target probably does not care (surely not the RISC
>> ones that always have COSTS_N_INSNS(1) for the address_cost).
>
> Actually some (alpha,ia64,m32r,mcore,mmix,rs6000,sparc,spu,v850,xtensa)
> override it with hook_int_rtx_bool_0 (hence all addresses have cost 0),
> some (fr30,frv,h8300,m68k,moxie,pdp11,picochip,vms) use the
> default_address_cost, which returns also 0 for simple REG addresses and
> otherwise is defined in terms of COSTS_N_INSN.
>
> Some targets compute the cost in terms of COSTS_N_INSN (m32c, s390), and
> the rest some custom selection of magic numbers (cycle counts, or
> instruction count, or any other number).
>
> Okay, so I'm asking for approval of the below patch, that simply uses the
> other magic number that happens to be the good one on x86, with an extra
> large comment in front refering to this thread.
>
> Regstrapped on x86_64-linux (all langs+Ada), runtime of 410.bwaves
> decreases from 1440 to 1350 seconds which is similar to before May.
> This needs deactivation of Vlads regpressure aware loop invariant motion.
> With that active _no_ invariants at all are moved for the hot routine in
> 410.bwaves.

Ok.

Thanks,
Richard.

> Ciao,
> Michael.
> --
>        * loop-invariant.c (create_new_invariant): Use different magic number.
>
> Index: loop-invariant.c
> ===================================================================
> --- loop-invariant.c    (revision 152670)
> +++ loop-invariant.c    (working copy)
> @@ -685,8 +685,17 @@ create_new_invariant (struct def *def, r
>   if (def)
>     {
>       inv->cost = rtx_cost (set, SET, speed);
> +      /* ??? Try to determine cheapness of address computation.  Unfortunately
> +         the address cost is only a relative measure, we can't really compare
> +        it with any absolute number, but only with other address costs.
> +        But here we don't have any other addresses, so compare with a magic
> +        number anyway.  It has to be large enough to not regress PR33928
> +        (by avoiding to move reg+8,reg+16,reg+24 invariants), but small
> +        enough to not regress 410.bwaves either (by still moving reg+reg
> +        invariants).
> +        See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
>       inv->cheap_address = address_cost (SET_SRC (set), word_mode,
> -                                        speed) < COSTS_N_INSNS (1);
> +                                        speed) < 3;
>     }
>   else
>     {
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-20 12:07     ` Paolo Bonzini
  2009-10-20 16:00       ` Michael Matz
@ 2009-10-31 10:05       ` Richard Sandiford
  2009-10-31 11:14         ` Paolo Bonzini
  2009-10-31 19:23         ` Michael Matz
  1 sibling, 2 replies; 24+ messages in thread
From: Richard Sandiford @ 2009-10-31 10:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, Michael Matz, gcc-patches

[catching up on backlog]

Paolo Bonzini <bonzini@gnu.org> writes:
>> As of using a magic number there,
>> another option would be to have a new target macro specifying
>> CHEAP_ADDRESS_COST or sth like that...
>
> That would be premature.  The optimization was introduced specifically
> for x86, any other target probably does not care (surely not the RISC
> ones that always have COSTS_N_INSNS(1) for the address_cost).

I don't understand why it would be premature.  As Michael says,
some ports don't a COSTS_N_INSNS-based value, and MIPS is one
of those.  As it happens, in the MIPS costs, 1 is "cheap" and
2+ is "expensive".  So either I need to hack MIPS so that
2 is cheap and 3+ is "expensive", or we need some better way
of determining this.

I don't think sticking an x86-specific magic number in the
middle of generic code is acceptable, regardless of how big
the comment above it is. ;)

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 10:05       ` Richard Sandiford
@ 2009-10-31 11:14         ` Paolo Bonzini
  2009-10-31 11:16           ` Richard Guenther
  2009-10-31 19:23         ` Michael Matz
  1 sibling, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2009-10-31 11:14 UTC (permalink / raw)
  To: gcc-patches

On 10/31/2009 10:57 AM, Richard Sandiford wrote:
> So either I need to hack MIPS so that
> 2 is cheap and 3+ is "expensive", or we need some better way
> of determining this.

-  return mips_address_insns (addr, SImode, false);
+  int insns = mips_address_insns (addr, SImode, false);
+  return insns == 1 ? 1 : COSTS_N_INSNS (insns - 1);

?

> I don't think sticking an x86-specific magic number in the
> middle of generic code is acceptable, regardless of how big
> the comment above it is. ;)

That's why I suggested using "<= COSTS_N_INSNS (1) / 2" instead of "< 
3".  Also ix86-specific calibration, but not so brutally. :-)

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 11:14         ` Paolo Bonzini
@ 2009-10-31 11:16           ` Richard Guenther
  2009-10-31 12:56             ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2009-10-31 11:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches

On Sat, Oct 31, 2009 at 11:58 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 10/31/2009 10:57 AM, Richard Sandiford wrote:
>>
>> So either I need to hack MIPS so that
>> 2 is cheap and 3+ is "expensive", or we need some better way
>> of determining this.
>
> -  return mips_address_insns (addr, SImode, false);
> +  int insns = mips_address_insns (addr, SImode, false);
> +  return insns == 1 ? 1 : COSTS_N_INSNS (insns - 1);
>
> ?
>
>> I don't think sticking an x86-specific magic number in the
>> middle of generic code is acceptable, regardless of how big
>> the comment above it is. ;)
>
> That's why I suggested using "<= COSTS_N_INSNS (1) / 2" instead of "< 3".
>  Also ix86-specific calibration, but not so brutally. :-)

But then you should document that address-cost is supposed to be
insn-cost based.  Otherwise comparing apples with oranges doesn't
make sense, even if it might look less magic to you.

Richard.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 11:16           ` Richard Guenther
@ 2009-10-31 12:56             ` Paolo Bonzini
  2009-10-31 13:53               ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2009-10-31 12:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On 10/31/2009 12:14 PM, Richard Guenther wrote:
> On Sat, Oct 31, 2009 at 11:58 AM, Paolo Bonzini<bonzini@gnu.org>  wrote:
>> On 10/31/2009 10:57 AM, Richard Sandiford wrote:
>>>
>>> So either I need to hack MIPS so that
>>> 2 is cheap and 3+ is "expensive", or we need some better way
>>> of determining this.
>>
>> -  return mips_address_insns (addr, SImode, false);
>> +  int insns = mips_address_insns (addr, SImode, false);
>> +  return insns == 1 ? 1 : COSTS_N_INSNS (insns - 1);
>>
>> ?
>>
>>> I don't think sticking an x86-specific magic number in the
>>> middle of generic code is acceptable, regardless of how big
>>> the comment above it is. ;)
>>
>> That's why I suggested using "<= COSTS_N_INSNS (1) / 2" instead of "<  3".
>>   Also ix86-specific calibration, but not so brutally. :-)
>
> But then you should document that address-cost is supposed to be
> insn-cost based.  Otherwise comparing apples with oranges doesn't
> make sense, even if it might look less magic to you.

That's true.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 12:56             ` Paolo Bonzini
@ 2009-10-31 13:53               ` Richard Sandiford
  2009-10-31 14:31                 ` Paolo Bonzini
  2009-11-17 18:02                 ` Mark Mitchell
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Sandiford @ 2009-10-31 13:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Paolo Bonzini <bonzini@gnu.org> writes:
> On 10/31/2009 12:14 PM, Richard Guenther wrote:
>> On Sat, Oct 31, 2009 at 11:58 AM, Paolo Bonzini<bonzini@gnu.org>  wrote:
>>> On 10/31/2009 10:57 AM, Richard Sandiford wrote:
>>>>
>>>> So either I need to hack MIPS so that
>>>> 2 is cheap and 3+ is "expensive", or we need some better way
>>>> of determining this.
>>>
>>> -  return mips_address_insns (addr, SImode, false);
>>> +  int insns = mips_address_insns (addr, SImode, false);
>>> +  return insns == 1 ? 1 : COSTS_N_INSNS (insns - 1);

Ick. ;/

>>>
>>> ?
>>>
>>>> I don't think sticking an x86-specific magic number in the
>>>> middle of generic code is acceptable, regardless of how big
>>>> the comment above it is. ;)
>>>
>>> That's why I suggested using "<= COSTS_N_INSNS (1) / 2" instead of "<  3".
>>>   Also ix86-specific calibration, but not so brutally. :-)
>>
>> But then you should document that address-cost is supposed to be
>> insn-cost based.  Otherwise comparing apples with oranges doesn't
>> make sense, even if it might look less magic to you.
>
> That's true.

Yeah.  We'd also have to document that "half an insn" is considered cheap.
And what does that really mean?  Are we saying that "at most half of an
insn can be given over to address calculation"?  Seems a bit of a weird
concept to hard-wire into the compiler.

And if we did that, I think we'd also want to rewrite the x86 hook so
that it does use COSTS_N_INSNS.  Backends shouldn't really be assuming
that COSTS_N_INSNS(1) == 4.

I prefer the abstract address costs we have now, and in that situation,
Michael's original suggestion of having a target-defined threshold
macro seems more natural.  I'm not sure why it got shot down.

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 13:53               ` Richard Sandiford
@ 2009-10-31 14:31                 ` Paolo Bonzini
  2009-11-01 10:24                   ` Richard Sandiford
  2009-11-17 18:02                 ` Mark Mitchell
  1 sibling, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2009-10-31 14:31 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, rdsandiford

On 10/31/2009 02:45 PM, Richard Sandiford wrote:
> I prefer the abstract address costs we have now, and in that situation,
> Michael's original suggestion of having a target-defined threshold
> macro seems more natural.  I'm not sure why it got shot down.

I guess it was just "this patch is ready".

I said I considered it premature, but this does not mean I would oppose 
it in any way, especially after you brought up an argument in favor of 
the target hook.

However, now I'm thinking this: if the address_cost is just meant to be 
compared *except for this particular occasion*, why not make "<= 0" 
cheap?  This would of course need fixing the ix86 backend too.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 10:05       ` Richard Sandiford
  2009-10-31 11:14         ` Paolo Bonzini
@ 2009-10-31 19:23         ` Michael Matz
  2009-11-01 10:13           ` Richard Sandiford
  1 sibling, 1 reply; 24+ messages in thread
From: Michael Matz @ 2009-10-31 19:23 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Paolo Bonzini, Richard Guenther, gcc-patches

Hi,

On Sat, 31 Oct 2009, Richard Sandiford wrote:

> Paolo Bonzini <bonzini@gnu.org> writes:
> >> As of using a magic number there,
> >> another option would be to have a new target macro specifying
> >> CHEAP_ADDRESS_COST or sth like that...
> >
> > That would be premature.  The optimization was introduced specifically
> > for x86, any other target probably does not care (surely not the RISC
> > ones that always have COSTS_N_INSNS(1) for the address_cost).
> 
> I don't understand why it would be premature.  As Michael says, some 
> ports don't a COSTS_N_INSNS-based value, and MIPS is one of those.  As 
> it happens, in the MIPS costs, 1 is "cheap" and 2+ is "expensive".

So that wouldn't have made a difference with the old magic number (which 
was 4) ;-)

> So either I need to hack MIPS so that 2 is cheap and 3+ is "expensive", 
> or we need some better way of determining this.

Yeah, actually that we need to determine absolute address cheapness at all 
is a bit unsatisfying.  We probably could get away with a target 
independend notion of when to hoist invariant addresses.  For instance, if 
it has two register references, those registers are only used in this 
address, then it's always a win to hoist (because it reduces register 
pressure by one).  Then we obviously have the problem what to do with 
offsetted addresses, some of them are cheap (small constants), some of 
them expensive (large constant), but that's target dependend.

I don't really see a good way out here, except sitting down, redefining 
the target address cost hook (so that it's usable in its two contextes 
it's used in now, relative cost comparisons and absolute ones), and make 
all targets follow that.  I chickened out then :-)


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 19:23         ` Michael Matz
@ 2009-11-01 10:13           ` Richard Sandiford
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Sandiford @ 2009-11-01 10:13 UTC (permalink / raw)
  To: Michael Matz; +Cc: Paolo Bonzini, Richard Guenther, gcc-patches

Michael Matz <matz@suse.de> writes:
> Hi,
>
> On Sat, 31 Oct 2009, Richard Sandiford wrote:
>
>> Paolo Bonzini <bonzini@gnu.org> writes:
>> >> As of using a magic number there,
>> >> another option would be to have a new target macro specifying
>> >> CHEAP_ADDRESS_COST or sth like that...
>> >
>> > That would be premature.  The optimization was introduced specifically
>> > for x86, any other target probably does not care (surely not the RISC
>> > ones that always have COSTS_N_INSNS(1) for the address_cost).
>> 
>> I don't understand why it would be premature.  As Michael says, some 
>> ports don't a COSTS_N_INSNS-based value, and MIPS is one of those.  As 
>> it happens, in the MIPS costs, 1 is "cheap" and 2+ is "expensive".
>
> So that wouldn't have made a difference with the old magic number (which 
> was 4) ;-)

Right.  I'm not defending the old code either ;)

>> So either I need to hack MIPS so that 2 is cheap and 3+ is "expensive", 
>> or we need some better way of determining this.
>
> Yeah, actually that we need to determine absolute address cheapness at all 
> is a bit unsatisfying.  We probably could get away with a target 
> independend notion of when to hoist invariant addresses.  For instance, if 
> it has two register references, those registers are only used in this 
> address, then it's always a win to hoist (because it reduces register 
> pressure by one).  Then we obviously have the problem what to do with 
> offsetted addresses, some of them are cheap (small constants), some of 
> them expensive (large constant), but that's target dependend.
>
> I don't really see a good way out here, except sitting down, redefining 
> the target address cost hook (so that it's usable in its two contextes 
> it's used in now, relative cost comparisons and absolute ones), and make 
> all targets follow that.  I chickened out then :-)

How about adding a new hook to say whether it is worth considering an
address for hoisting?  A sensible default might be:

   targetm.address_cost (x, speed) > register_cost

where register_cost is the cached value of:

   targetm.address_cost (y, speed)

for an arbitrary Pmode pseudo register Y.  (I wondered about using
stack_pointer_rtx or hard_frame_pointer_rtx instead, but those registers
aren't necessarily base registers.)  We can make x86 instead use:

   targetm.address_cost (x, speed) > 2

That seems pretty clean to me, since it's the kind of thing a target
ought to control.  In future, we might even want to give the hook more
context than address_cost has (and more context than address_cost can
reasonably have in general).

It also wouldn't mean changing much.

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 14:31                 ` Paolo Bonzini
@ 2009-11-01 10:24                   ` Richard Sandiford
  2009-11-01 10:25                     ` Paolo Bonzini
  2009-11-14 11:19                     ` Richard Sandiford
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Sandiford @ 2009-11-01 10:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Paolo Bonzini <bonzini@gnu.org> writes:
> On 10/31/2009 02:45 PM, Richard Sandiford wrote:
>> I prefer the abstract address costs we have now, and in that situation,
>> Michael's original suggestion of having a target-defined threshold
>> macro seems more natural.  I'm not sure why it got shot down.
>
> I guess it was just "this patch is ready".
>
> I said I considered it premature, but this does not mean I would oppose 
> it in any way, especially after you brought up an argument in favor of 
> the target hook.

OK.

> However, now I'm thinking this: if the address_cost is just meant to be 
> compared *except for this particular occasion*, why not make "<= 0" 
> cheap?  This would of course need fixing the ix86 backend too.

Well, I suppose it's more "this particular occasion _for now_".
If we make 0 be the cheapness threshold for hoisting, we might later
find we have some other threshold we want to model as well.  We'd then
have an inconsistency whereby one magic value is hard-wired to 0 and
another isn't.

And really, given that we're talking about an abstract metric, rather than
specific units, there isn't anything special about 0 vs. 3.

I'd rather stick with nonnegative costs and make the target explicitly
say what it considers cheap for hoisting.  Adding a target hook seems
like the natural thing to do, and I don't think it's something we should
go out of our way to avoid.

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-01 10:24                   ` Richard Sandiford
@ 2009-11-01 10:25                     ` Paolo Bonzini
  2009-11-14 11:19                     ` Richard Sandiford
  1 sibling, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2009-11-01 10:25 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, rdsandiford


>> However, now I'm thinking this: if the address_cost is just meant to be
>> compared *except for this particular occasion*, why not make "<= 0"
>> cheap?  This would of course need fixing the ix86 backend too.
>
> Well, I suppose it's more "this particular occasion _for now_".
> If we make 0 be the cheapness threshold for hoisting, we might later
> find we have some other threshold we want to model as well.  We'd then
> have an inconsistency whereby one magic value is hard-wired to 0 and
> another isn't.
>
> And really, given that we're talking about an abstract metric, rather than
> specific units, there isn't anything special about 0 vs. 3.
 >
> I'd rather stick with nonnegative costs and make the target explicitly
> say what it considers cheap for hoisting.  Adding a target hook seems
> like the natural thing to do, and I don't think it's something we should
> go out of our way to avoid.

I wouldn't complain.  The default you proposed seems sane.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-01 10:24                   ` Richard Sandiford
  2009-11-01 10:25                     ` Paolo Bonzini
@ 2009-11-14 11:19                     ` Richard Sandiford
  2009-11-15 16:50                       ` Paolo Bonzini
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2009-11-14 11:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Richard Sandiford <rdsandiford@googlemail.com> writes:
> Paolo Bonzini <bonzini@gnu.org> writes:
>> However, now I'm thinking this: if the address_cost is just meant to be 
>> compared *except for this particular occasion*, why not make "<= 0" 
>> cheap?  This would of course need fixing the ix86 backend too.
>
> Well, I suppose it's more "this particular occasion _for now_".
> If we make 0 be the cheapness threshold for hoisting, we might later
> find we have some other threshold we want to model as well.  We'd then
> have an inconsistency whereby one magic value is hard-wired to 0 and
> another isn't.
>
> And really, given that we're talking about an abstract metric, rather than
> specific units, there isn't anything special about 0 vs. 3.
>
> I'd rather stick with nonnegative costs and make the target explicitly
> say what it considers cheap for hoisting.  Adding a target hook seems
> like the natural thing to do, and I don't think it's something we should
> go out of our way to avoid.

So, I finally started writing the patch this morning, and realised
I don't understand why we're handling this in loop-invariant.c rather
than fwprop2.  The motivation for the original cheap_address change:

    http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00341.html

was to make sure that we don't hoist lots of addresses of the form:

        leaq    -8(%r12), %rsi
        leaq    8(%r12), %r10
        leaq    -16(%r12), %r9
        leaq    -24(%r12), %rbx
        leaq    -32(%r12), %rbp
        leaq    -40(%r12), %rdi
        leaq    -48(%r12), %r11
        leaq    40(%r12), %rdx

which is sensible enough.  But if an invariant is only used in places
where it is "cheap", why aren't we allowing fwprop2 to propagate them?
I.e. why don't we let loop-invariant.c act as normal and relax the
fwprop2 condition:

  /* Do not propagate loop invariant definitions inside the loop.  */
  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
    return;

in cases where we think propagating an invariant address is "cheap"?
This feels like a more general fix, since it would also cope with
hoisting done at the tree level or by GCSE.  E.g. with something
as simple as:

  void foo (int n, int *x, int *y)
  {
    while (n-- > 0)
      {
        x[0] = y[0];
        x[1] = y[1];
        x[2] = y[2];
        x[3] = y[3];
        x += 4;
      }
  }

tree PRE will hoist the y+{4,8,12} expressions.  It took a few
iterations to get to a testcase that the loop-invariant.c patch
actually affects.  (FTR, I ended up using:

  float g[4];
  void foo (int n, float *x)
  {
    while (n-- > 0)
      {
        float *y = g;
        x[0] = y[0];
        x[1] = y[1];
        x[2] = y[2];
        x[3] = y[3];
        x += 4;
      }
  }

with -fno-gcse.)

I realise it wouldn't be trivial to implement in fwprop2, since it would
involve making a pan-use check.  That's effectively what loop-invariant.c
is doing now, though, and I think it'd be possible to do the same in fwprop2.

The reason I ask is that my patch started to diverge from what we discussed.
It didn't feel right to be using:

    address_cost (SET_SRC (set), word_mode, ADDR_SPACE_GENERIC, speed)

for two reasons:

  1) The mode and address space are completely arbitrary, even though
     we could tell perfectly well what the right values are for each use.

  2) "n_addr_uses" counts the number of times the register appears
     _within_ an address, not the number of times it is used _as_
     an address.  I.e. n_addr_uses counts nested uses as well as
     top-level uses.  And it doesn't feel right to be using
     address_cost on partial addresses.  (I realise that address_cost
     returns a high cost for invalid addresses, so there are no
     correctness concerns.  But I don't think we're calculating
     a meaningful value in the case where the register is nested
     within a more complicated expression.)

So I ended with the attached patch (which is untested except for a
C bootstrap and the testcase above).  It looks better to me than the
magic constant, but makes the "why not fwprop2?" question loom even
larger.

Richard


Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 154177)
+++ loop-invariant.c	(working copy)
@@ -79,7 +79,6 @@
 {
   rtx *pos;			/* Position of the use.  */
   rtx insn;			/* The insn in that the use occurs.  */
-  unsigned addr_use_p;		/* Whether the use occurs in an address.  */
   struct use *next;		/* Next use in the list.  */
 };
 
@@ -87,11 +86,18 @@
 
 struct def
 {
-  struct use *uses;		/* The list of uses that are uniquely reached
-				   by it.  */
-  unsigned n_uses;		/* Number of such uses.  */
-  unsigned n_addr_uses;		/* Number of uses in addresses.  */
-  unsigned invno;		/* The corresponding invariant.  */
+  /* The list of uses that are uniquely reached by the definition.  */
+  struct use *uses;
+
+  /* The number of such uses.  */
+  unsigned n_uses;
+
+  /* The number of uses that fwprop2 might be able to remove, if we think
+     that would be as cheap as hoisting the invariant.  */
+  unsigned n_cheap_addr_uses;
+
+  /* The corresponding invariant.  */  
+  unsigned invno;
 };
 
 /* The data stored for each invariant.  */
@@ -124,9 +130,6 @@
   /* Whether to move the invariant.  */
   bool move;
 
-  /* Whether the invariant is cheap when used as an address.  */
-  bool cheap_address;
-
   /* Cost of the invariant.  */
   unsigned cost;
 
@@ -703,25 +706,9 @@
   /* If the set is simple, usually by moving it we move the whole store out of
      the loop.  Otherwise we save only cost of the computation.  */
   if (def)
-    {
-      inv->cost = rtx_cost (set, SET, speed);
-      /* ??? Try to determine cheapness of address computation.  Unfortunately
-         the address cost is only a relative measure, we can't really compare
-	 it with any absolute number, but only with other address costs.
-	 But here we don't have any other addresses, so compare with a magic
-	 number anyway.  It has to be large enough to not regress PR33928
-	 (by avoiding to move reg+8,reg+16,reg+24 invariants), but small
-	 enough to not regress 410.bwaves either (by still moving reg+reg
-	 invariants).
-	 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
-      inv->cheap_address = address_cost (SET_SRC (set), word_mode,
-					 ADDR_SPACE_GENERIC, speed) < 3;
-    }
+    inv->cost = rtx_cost (set, SET, speed);
   else
-    {
-      inv->cost = rtx_cost (SET_SRC (set), SET, speed);
-      inv->cheap_address = false;
-    }
+    inv->cost = rtx_cost (SET_SRC (set), SET, speed);
 
   inv->move = false;
   inv->reg = NULL_RTX;
@@ -746,22 +733,69 @@
   return inv;
 }
 
-/* Record USE at DEF.  */
+/* A for_each_rtx callback for which DATA points to an invariant.
+   Look for a MEM that uses the invariant and decide between
+   two possibilities:
 
+     (1) keeping the address in its current form, with the invariant
+         hoisted out of the loop.
+
+     (2) propagating the invariant's definition into the MEM.
+
+   Return 1 if we find a MEM for which (1) is better than (2).
+
+   *LOC belongs to the first use in the invariant's linked list.  */
+
+static int
+worth_hoisting_address (rtx *loc, void *data)
+{
+  struct invariant *inv;
+  rtx x, insn, set, addr_on_hoist, addr_on_subst;
+  bool speed;
+  addr_space_t as;
+  enum machine_mode mode;
+
+  inv = (struct invariant *) data;
+  x = *loc;
+  if (MEM_P (x))
+    {
+      set = single_set (inv->insn);
+      addr_on_hoist = XEXP (x, 0);
+      addr_on_subst = simplify_replace_rtx (addr_on_hoist,
+					    SET_DEST (set),
+					    SET_SRC (set));
+      if (addr_on_hoist != addr_on_subst)
+	{
+	  insn = inv->def->uses->insn;
+	  speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
+	  as = MEM_ADDR_SPACE (x);
+	  mode = GET_MODE (x);
+	  if (address_cost (addr_on_hoist, mode, as, speed)
+	      < address_cost (addr_on_subst, mode, as, speed))
+	    return 1;
+	}
+      return -1;
+    }
+  return 0;
+}
+
+/* Record USE of INV.  */
+
 static void
-record_use (struct def *def, df_ref use)
+record_use (struct invariant *inv, df_ref use)
 {
   struct use *u = XNEW (struct use);
+  struct def *def = inv->def;
 
   u->pos = DF_REF_REAL_LOC (use);
   u->insn = DF_REF_INSN (use);
-  u->addr_use_p = (DF_REF_TYPE (use) == DF_REF_REG_MEM_LOAD
-		   || DF_REF_TYPE (use) == DF_REF_REG_MEM_STORE);
   u->next = def->uses;
   def->uses = u;
+  if (DF_REF_REG_MEM_P (use)
+      && def->n_uses == def->n_cheap_addr_uses
+      && !for_each_rtx (&PATTERN (u->insn), worth_hoisting_address, inv))
+    def->n_cheap_addr_uses++;
   def->n_uses++;
-  if (u->addr_use_p)
-    def->n_addr_uses++;
 }
 
 /* Finds the invariants USE depends on and store them to the DEPENDS_ON
@@ -908,14 +942,14 @@
       df_ref use = *use_rec;
       inv = invariant_for_use (use);
       if (inv)
-	record_use (inv->def, use);
+	record_use (inv, use);
     }
   for (use_rec = DF_INSN_INFO_EQ_USES (insn_info); *use_rec; use_rec++)
     {
       df_ref use = *use_rec;
       inv = invariant_for_use (use);
       if (inv)
-	record_use (inv->def, use);
+	record_use (inv, use);
     }
 }
 
@@ -1084,8 +1118,7 @@
       regs_needed[cover_class] += nregs;
     }
 
-  if (!inv->cheap_address
-      || inv->def->n_addr_uses < inv->def->n_uses)
+  if (inv->def && inv->def->n_uses != inv->def->n_cheap_addr_uses)
     (*comp_cost) += inv->cost;
 
 #ifdef STACK_REGS

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-14 11:19                     ` Richard Sandiford
@ 2009-11-15 16:50                       ` Paolo Bonzini
  2009-11-15 21:36                         ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2009-11-15 16:50 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Guenther, gcc-patches, rdsandiford

> which is sensible enough.  But if an invariant is only used in places
> where it is "cheap", why aren't we allowing fwprop2 to propagate them?
> I.e. why don't we let loop-invariant.c act as normal and relax the
> fwprop2 condition:
>
>  /* Do not propagate loop invariant definitions inside the loop.  */
>  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>    return;
>
> in cases where we think propagating an invariant address is "cheap"?
> This feels like a more general fix, since it would also cope with
> hoisting done at the tree level or by GCSE.

This makes sense.  However it is tricky because of PR30907 which
we don't want to pessimize.  Basically the only way to validate these
patches would be to run nullstone. :-(

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-15 16:50                       ` Paolo Bonzini
@ 2009-11-15 21:36                         ` Richard Sandiford
  2009-11-28 10:53                           ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2009-11-15 21:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Paolo Bonzini <bonzini@gnu.org> writes:
>> which is sensible enough.  But if an invariant is only used in places
>> where it is "cheap", why aren't we allowing fwprop2 to propagate them?
>> I.e. why don't we let loop-invariant.c act as normal and relax the
>> fwprop2 condition:
>>
>>  /* Do not propagate loop invariant definitions inside the loop.  */
>>  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>>    return;
>>
>> in cases where we think propagating an invariant address is "cheap"?
>> This feels like a more general fix, since it would also cope with
>> hoisting done at the tree level or by GCSE.
>
> This makes sense.  However it is tricky because of PR30907 which
> we don't want to pessimize.  Basically the only way to validate these
> patches would be to run nullstone. :-(

Hmm.  Looking at the example in that PR, am I right in thinking that the
x86 backend says that the (%rdi,%rax,4) and 16(%rsp,%rax,4) addresses
have the same cost?  It looks that way from the code, and fwprop
wouldn't have made the substitution otherwise, right?

If so, why do they have equal cost, given that replacing one address
with the other led to such a drastic performance drop?

(I know I'm missing something here, sorry.)

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-10-31 13:53               ` Richard Sandiford
  2009-10-31 14:31                 ` Paolo Bonzini
@ 2009-11-17 18:02                 ` Mark Mitchell
  1 sibling, 0 replies; 24+ messages in thread
From: Mark Mitchell @ 2009-11-17 18:02 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Guenther, gcc-patches, rdsandiford

Richard Sandiford wrote:

>>> But then you should document that address-cost is supposed to be
>>> insn-cost based.  Otherwise comparing apples with oranges doesn't
>>> make sense, even if it might look less magic to you.
>> That's true.
> 
> Yeah.  We'd also have to document that "half an insn" is considered cheap.
> And what does that really mean?  Are we saying that "at most half of an
> insn can be given over to address calculation"?  Seems a bit of a weird
> concept to hard-wire into the compiler.

I'm late to this party, but I agree.

I've been bothered by the COST_N_INSNS costs vs. random-integer costs
discrepancy in the past, and at one point I tracked down some bad
optimization decisions to a case where we comparing between the two.
(Maybe we weren't supposed to be doing that, but we were.)

I think COST_N_INSNS, while an approximation of reality (not all
instructions have the same cost of course), is a plausible framework,
but it seems more general just to use integers for costs, with all such
integers coming from the back end, and the target-independent code
limited to combining those integers using formulas.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-15 21:36                         ` Richard Sandiford
@ 2009-11-28 10:53                           ` Richard Sandiford
  2009-12-01 13:05                             ` Michael Matz
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2009-11-28 10:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches

Richard Sandiford <rdsandiford@googlemail.com> writes:
> Paolo Bonzini <bonzini@gnu.org> writes:
>>> which is sensible enough.  But if an invariant is only used in places
>>> where it is "cheap", why aren't we allowing fwprop2 to propagate them?
>>> I.e. why don't we let loop-invariant.c act as normal and relax the
>>> fwprop2 condition:
>>>
>>>  /* Do not propagate loop invariant definitions inside the loop.  */
>>>  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
>>>    return;
>>>
>>> in cases where we think propagating an invariant address is "cheap"?
>>> This feels like a more general fix, since it would also cope with
>>> hoisting done at the tree level or by GCSE.
>>
>> This makes sense.  However it is tricky because of PR30907 which
>> we don't want to pessimize.  Basically the only way to validate these
>> patches would be to run nullstone. :-(
>
> Hmm.  Looking at the example in that PR, am I right in thinking that the
> x86 backend says that the (%rdi,%rax,4) and 16(%rsp,%rax,4) addresses
> have the same cost?  It looks that way from the code, and fwprop
> wouldn't have made the substitution otherwise, right?
>
> If so, why do they have equal cost, given that replacing one address
> with the other led to such a drastic performance drop?
>
> (I know I'm missing something here, sorry.)

For the record, I decided to leave things be.  I still think the
prototype patch I posted is an improvement on the current:

      /* ??? Try to determine cheapness of address computation.  Unfortunately
         the address cost is only a relative measure, we can't really compare
	 it with any absolute number, but only with other address costs.
	 But here we don't have any other addresses, so compare with a magic
	 number anyway.  It has to be large enough to not regress PR33928
	 (by avoiding to move reg+8,reg+16,reg+24 invariants), but small
	 enough to not regress 410.bwaves either (by still moving reg+reg
	 invariants).
	 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
      inv->cheap_address = address_cost (SET_SRC (set), word_mode,
					 ADDR_SPACE_GENERIC, speed) < 3;

But without understanding the answer to the question in my earlier mail,
it still seems to me that this whole idea of checking for cheap
addresses in loop-invariant.c is tackling things in the wrong way.
I suppose I'll just to have to live it, and with the code above.

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-11-28 10:53                           ` Richard Sandiford
@ 2009-12-01 13:05                             ` Michael Matz
  2009-12-02 22:24                               ` Richard Sandiford
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Matz @ 2009-12-01 13:05 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Paolo Bonzini, Richard Guenther, gcc-patches

Hi,

On Sat, 28 Nov 2009, Richard Sandiford wrote:

> > Hmm.  Looking at the example in that PR, am I right in thinking that 
> > the x86 backend says that the (%rdi,%rax,4) and 16(%rsp,%rax,4) 
> > addresses have the same cost?  It looks that way from the code, and 
> > fwprop wouldn't have made the substitution otherwise, right?
> >
> > If so, why do they have equal cost, given that replacing one address 
> > with the other led to such a drastic performance drop?
> >
> > (I know I'm missing something here, sorry.)
> 
> But without understanding the answer to the question in my earlier mail,
> it still seems to me that this whole idea of checking for cheap
> addresses in loop-invariant.c is tackling things in the wrong way.

The answer to your questions are: yes, you're right.  It is so to generate 
better code, because as the comment in front of ix86_address_cost says, 
it's better on x86 to generate more complex addresses than to load them 
into registers, hence a constant offset isn't taken into account, neither 
is a scaling.  (To say the truth, on some micro architectures addresses 
that use all components are tougher to the decoder).  In the past we even 
made more complex addresses _cheaper_ than simple one (!), which is, well 
..., wrong ;-) but once generated better code.

Unfortunately these target knobs are more an art than a science, and 
defining them in a logical way derivable from hardware manuals often leads 
to suboptimal code :-/

Having said that, I'm not sure why the answer to this question matters 
to you: neither the PR33928 testcase, nor 410.bwaves was about (reg,reg,4) 
vs. ofs(reg,reg,4).  The former was about ofs(reg) vs ofs2(reg) and the 
latter was about (reg) vs (reg,reg).

FWIW I'd also say that fwprop2 would be a good place to sink invariants 
generally (so that loop-invariant could hoist them as it wants).  But I 
don't see how that will get rid of having to compute an absolute cheapness 
(as fwprop2 would only want to sink the invariantes into cheap places).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-12-01 13:05                             ` Michael Matz
@ 2009-12-02 22:24                               ` Richard Sandiford
  2009-12-03 12:57                                 ` Michael Matz
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2009-12-02 22:24 UTC (permalink / raw)
  To: Michael Matz; +Cc: Paolo Bonzini, Richard Guenther, gcc-patches

Hi Michael,

Michael Matz <matz@suse.de> writes:
>> > Hmm.  Looking at the example in that PR, am I right in thinking that 
>> > the x86 backend says that the (%rdi,%rax,4) and 16(%rsp,%rax,4) 
>> > addresses have the same cost?  It looks that way from the code, and 
>> > fwprop wouldn't have made the substitution otherwise, right?
>> >
>> > If so, why do they have equal cost, given that replacing one address 
>> > with the other led to such a drastic performance drop?
>> >
>> > (I know I'm missing something here, sorry.)
>> 
>> But without understanding the answer to the question in my earlier mail,
>> it still seems to me that this whole idea of checking for cheap
>> addresses in loop-invariant.c is tackling things in the wrong way.
>
> The answer to your questions are: yes, you're right.  It is so to generate 
> better code, because as the comment in front of ix86_address_cost says, 
> it's better on x86 to generate more complex addresses than to load them 
> into registers, hence a constant offset isn't taken into account, neither 
> is a scaling.  (To say the truth, on some micro architectures addresses 
> that use all components are tougher to the decoder).  In the past we even 
> made more complex addresses _cheaper_ than simple one (!), which is, well 
> ..., wrong ;-) but once generated better code.

Ah, thanks, I see now.

> Unfortunately these target knobs are more an art than a science, and 
> defining them in a logical way derivable from hardware manuals often leads 
> to suboptimal code :-/

Agreed.  I've hit that too.  And I realise that to some extent it's
inevitable.

> Having said that, I'm not sure why the answer to this question matters 
> to you: neither the PR33928 testcase, nor 410.bwaves was about (reg,reg,4) 
> vs. ofs(reg,reg,4).  The former was about ofs(reg) vs ofs2(reg) and the 
> latter was about (reg) vs (reg,reg).

Well, the point was that if fwprop2 is allowed to make the substitution,
it would solve both of the problems you list.  And that seems like
conceptually the right fix.  But as Paolo says, allowing fwprop2 to make
the substitution would regress PR30907, because of the costs issue above.
So the answer mattered because I wanted a patch that involved fwprop2 but
that didn't regress 30907.

> FWIW I'd also say that fwprop2 would be a good place to sink invariants 
> generally (so that loop-invariant could hoist them as it wants).  But I 
> don't see how that will get rid of having to compute an absolute cheapness 
> (as fwprop2 would only want to sink the invariantes into cheap places).

But fwprop.c compares "before" and "after" costs, and I was thinking it
should do the same in this case too (if allowed to).  No absolute cost
checks should be needed.

Before 30907, fwprop2 already sank invariants into cheap places.
The problem was that the x86 backend called a substitution cheap
when in fact it wasn't.  It would be nice if

  (a) fwprop could once again propagate invariants into loops and
  (b) when it did so, it could ask for a stricter cost, so that we don't
      inadvertently replace one loop instruction with a more expensive
      loop instruction (as we did in 30907).

In other words, it sounds from your message that the current x86 costs
are mostly geared for two things:

  (1) stopping forms of CSE, hoisting, etc., from introducing unnecessary
      temporaries
  (2) allowing forward propagation within basic blocks of roughly
      the same frequency, to remove unnecessary temporaries

Which is good.  But it might be nice to have a separate mode that's
suitable for forward-propagating values into blocks of higher frequency,
along the lines of (b).  Does that sound reasonable?

[ Having said that, although I could try to write a patch along
  those lines, I'm not sure I could test it very well. ;(  I don't
  have access to things like SPEC. ]

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: RFC patch: invariant addresses too cheap
  2009-12-02 22:24                               ` Richard Sandiford
@ 2009-12-03 12:57                                 ` Michael Matz
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Matz @ 2009-12-03 12:57 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Paolo Bonzini, Richard Guenther, gcc-patches

Hi,

On Wed, 2 Dec 2009, Richard Sandiford wrote:

> > Having said that, I'm not sure why the answer to this question matters 
> > to you: neither the PR33928 testcase, nor 410.bwaves was about (reg,reg,4) 
> > vs. ofs(reg,reg,4).  The former was about ofs(reg) vs ofs2(reg) and the 
> > latter was about (reg) vs (reg,reg).
> 
> Well, the point was that if fwprop2 is allowed to make the substitution,
> it would solve both of the problems you list.  And that seems like
> conceptually the right fix.  But as Paolo says, allowing fwprop2 to make
> the substitution would regress PR30907, because of the costs issue above.

Ah, that was what the ofs(reg,reg,4) was about.  Yeah, according to the 
optimization guide this shouldn't be slower on K8 or K10, but well, 
reality trumps paper :)

> But fwprop.c compares "before" and "after" costs, and I was thinking it 
> should do the same in this case too (if allowed to).  No absolute cost 
> checks should be needed.

Oh, right, of course.

>   (a) fwprop could once again propagate invariants into loops and
>   (b) when it did so, it could ask for a stricter cost, so that we don't
>       inadvertently replace one loop instruction with a more expensive
>       loop instruction (as we did in 30907).
...
> But it might be nice to have a separate mode that's
> suitable for forward-propagating values into blocks of higher frequency,
> along the lines of (b).  Does that sound reasonable?

IMO yes.

> [ Having said that, although I could try to write a patch along
>   those lines, I'm not sure I could test it very well. ;(  I don't
>   have access to things like SPEC. ]

We could test SPEC test patches, but we don't have Nullstone anymore.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-12-03 12:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-19 16:17 RFC patch: invariant addresses too cheap Michael Matz
2009-10-19 16:45 ` Paolo Bonzini
2009-10-20  9:04   ` Richard Guenther
2009-10-20 12:07     ` Paolo Bonzini
2009-10-20 16:00       ` Michael Matz
2009-10-20 16:06         ` Richard Guenther
2009-10-31 10:05       ` Richard Sandiford
2009-10-31 11:14         ` Paolo Bonzini
2009-10-31 11:16           ` Richard Guenther
2009-10-31 12:56             ` Paolo Bonzini
2009-10-31 13:53               ` Richard Sandiford
2009-10-31 14:31                 ` Paolo Bonzini
2009-11-01 10:24                   ` Richard Sandiford
2009-11-01 10:25                     ` Paolo Bonzini
2009-11-14 11:19                     ` Richard Sandiford
2009-11-15 16:50                       ` Paolo Bonzini
2009-11-15 21:36                         ` Richard Sandiford
2009-11-28 10:53                           ` Richard Sandiford
2009-12-01 13:05                             ` Michael Matz
2009-12-02 22:24                               ` Richard Sandiford
2009-12-03 12:57                                 ` Michael Matz
2009-11-17 18:02                 ` Mark Mitchell
2009-10-31 19:23         ` Michael Matz
2009-11-01 10:13           ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).