public inbox for gcc@gcc.gnu.org
* Help with ivopts
@ 2011-07-06 11:35 Richard Sandiford
  2011-07-06 13:10 ` Michael Matz
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 11:35 UTC (permalink / raw)
  To: gcc

Consider:

  void
  loop (unsigned char *restrict ydst,
        unsigned char *restrict udst,
        unsigned char *restrict vdst,
        unsigned char *restrict src,
        int n)
  {
    int i;

    for (i = 0; i < n; i++)
      {
        ydst[2*i+0] = src[4*i+0];
        udst[i] = src[4*i+1];
        ydst[2*i+1] = src[4*i+2];
        vdst[i] = src[4*i+3];
      }
  }

(based on libav's yuy2toyv12 code).  Compiled for ARM with:

  -std=c99 -O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp

the loop gets vectorised.  The code before ivopts looks like this:

<bb 4>:
  vect_p.31_116 = src_10(D);
  vect_p.40_123 = udst_14(D);
  vect_p.44_126 = ydst_6(D);
  vect_p.49_129 = vdst_31(D);
  [...]

<bb 5>:
  # vect_p.28_117 = PHI <vect_p.28_118(10), vect_p.31_116(4)>
  # vect_p.37_124 = PHI <vect_p.37_125(10), vect_p.40_123(4)>
  # vect_p.41_127 = PHI <vect_p.41_128(10), vect_p.44_126(4)>
  # vect_p.46_130 = PHI <vect_p.46_131(10), vect_p.49_129(4)>
  # ivtmp.50_132 = PHI <ivtmp.50_133(10), 0(4)>
  vect_array.32 = LOAD_LANES (MEM[(unsigned char *)vect_p.28_117]);
  vect_var_.33_119 = vect_array.32_I_lsm0.54_47;
  vect_var_.34_120 = vect_array.32_I_lsm0.53_109;
  vect_var_.35_121 = vect_array.32_I_lsm0.52_114;
  vect_var_.36_122 = vect_array.32_I_lsm0.51_113;
  MEM[(unsigned char *)vect_p.37_124] = vect_var_.34_120;
  vect_array.45[0] = vect_var_.33_119;
  vect_array.45[1] = vect_var_.35_121;
  MEM[(unsigned char *)vect_p.41_127] = STORE_LANES (vect_array.45);
  MEM[(unsigned char *)vect_p.46_130] = vect_var_.36_122;
  vect_p.28_118 = vect_p.28_117 + 32;
  vect_p.37_125 = vect_p.37_124 + 8;
  vect_p.41_128 = vect_p.41_127 + 16;
  vect_p.46_131 = vect_p.46_130 + 8;
  ivtmp.50_133 = ivtmp.50_132 + 1;
  if (ivtmp.50_133 < bnd.25_70)
    goto <bb 10>;
  else
    goto <bb 6>;
[...]
<bb 10>:
  goto <bb 5>;

We record these uses:

use 1
  generic
  in statement vect_p.37_124 = PHI <vect_p.37_125(10), vect_p.40_123(4)>

  at position 
  type vector(8) unsigned char * restrict
  base (vector(8) unsigned char * restrict) udst_14(D)
  step 8
  base object (void *) udst_14(D)
  is a biv
  related candidates 
use 3
  generic
  in statement vect_p.46_130 = PHI <vect_p.46_131(10), vect_p.49_129(4)>

  at position 
  type vector(8) unsigned char * restrict
  base (vector(8) unsigned char * restrict) vdst_31(D)
  step 8
  base object (void *) vdst_31(D)
  is a biv
  related candidates 

Note how they are "generic" rather than "address"es.

We also have the candidates:

candidate 5 (important)
  var_before ivtmp.81
  var_after ivtmp.81
  incremented before exit test
  type unsigned int
  base (unsigned int) udst_14(D)
  step 8
  base object (void *) udst_14(D)
[...]
candidate 7 (important)
  original biv
  type unsigned int
  base (unsigned int) udst_14(D)
  step 8
  base object (void *) udst_14(D)
[...]
candidate 13 (important)
  var_before ivtmp.87
  var_after ivtmp.87
  incremented before exit test
  type unsigned int
  base (unsigned int) vdst_31(D)
  step 8
  base object (void *) vdst_31(D)
candidate 14 (important)
  original biv
  type unsigned int
  base (unsigned int) vdst_31(D)
  step 8
  base object (void *) vdst_31(D)

The problem (from an ARM POV) is that we end up using candidate 5 for use 3:

<bb 5>:
  [...]
  D.3678_95 = (unsigned int) vdst_31(D);
  D.3679_93 = (unsigned int) udst_14(D);
  D.3680_89 = D.3678_95 - D.3679_93;
  D.3681_87 = D.3680_89 + ivtmp.81_94;
  D.3682_83 = (vector(8) unsigned char * restrict) D.3681_87;
[vdst + i:]
  vect_p.46_130 = D.3682_83;
[udst + i:]
  vect_p.37_124 = (vector(8) unsigned char * restrict) ivtmp.81_94;
  [...]
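
For readers less used to GIMPLE, the rewritten block can be modelled in
plain C.  This is only a sketch: the struct and function names are
hypothetical, uintptr_t stands in for the dump's 32-bit "unsigned int",
and it assumes a flat address space in which pointer/integer round trips
behave as they do on ARM.

```c
#include <stdint.h>

/* Hypothetical model of the rewritten block.  `iv` plays the role of
   ivtmp.81 (candidate 5): it starts at (uintptr_t) udst and advances
   by 8 on each vector iteration.  */
typedef struct
{
  unsigned char *udst_ptr;  /* vect_p.37: a plain cast of the IV */
  unsigned char *vdst_ptr;  /* vect_p.46: rebuilt from the udst IV */
} derived_ptrs;

derived_ptrs
derive_from_udst_iv (uintptr_t iv, unsigned char *udst, unsigned char *vdst)
{
  derived_ptrs out;

  /* D.3680 = (unsigned int) vdst - (unsigned int) udst.  The value is
     loop-invariant, but in the dump it is still computed inside the
     loop body.  */
  uintptr_t diff = (uintptr_t) vdst - (uintptr_t) udst;

  /* D.3681 = D.3680 + ivtmp.81: the extra add per iteration.  */
  out.vdst_ptr = (unsigned char *) (diff + iv);
  out.udst_ptr = (unsigned char *) iv;
  return out;
}
```

The vdst pointer therefore costs a subtraction and an addition where the
original code needed a single pointer increment.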

This is based on the following costs:

Use 3:
  cand	cost	compl.	depends on
  0	8	0	 inv_expr:18
  5	4	0	 inv_expr:19
  6	4	0	 inv_expr:18
  7	4	0	 inv_expr:19
  8	4	1	 inv_expr:20
  13	0	0	
  14	0	0	
  15	4	1	
  16	8	0	 inv_expr:18
  17	8	1	 inv_expr:21

The cost of using candidate 5 for use 3 is calculated as:

  /* use = ubase + ratio * (var - cbase).  If either cbase is a constant
     or ratio == 1, it is better to handle this like

     ubase - ratio * cbase + ratio * var

     (also holds in the case ratio == -1, TODO.  */
  [...]
  else if (ratio == 1)
    {
      tree real_cbase = cbase;

      /* Check to see if any adjustment is needed.  */
      if (cstepi == 0 && stmt_is_after_inc)
        {
          aff_tree real_cbase_aff;
          aff_tree cstep_aff;

          tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
                                   &real_cbase_aff);
          tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);

          aff_combination_add (&real_cbase_aff, &cstep_aff);
          real_cbase = aff_combination_to_tree (&real_cbase_aff);
        }

      cost = difference_cost (data,
			      ubase, real_cbase,
			      &symbol_present, &var_present, &offset,
			      depends_on);
      cost.cost /= avg_loop_niter (data->current_loop);
    }
  [...]
  cost.cost += add_cost (TYPE_MODE (ctype), speed);

The individual difference_cost and add_cost seem reasonable (4 in each case).
I don't understand the reasoning behind the division though.  Is the idea
that this should be hoisted?  If so, then:

(a) That doesn't happen at the tree level.  The subtraction is still inside
    the loop at RTL generation time.

(b) What's the advantage of introducing a new hoisted subtraction that
    is going to be live throughout the loop, and then adding another IV
    to it inside the loop, over using the original IV and incrementing it
    in the normal way?
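
The arithmetic in the quoted ratio == 1 path can be sketched as below.
reuse_cost is a hypothetical helper, not a GCC function, and the average
iteration count passed to it is an assumption made for illustration.
With the two costs of 4 mentioned above, any estimate larger than 4 makes
the integer division round the subtraction away entirely, which is
consistent with the cost of 4 listed for candidate 5.

```c
#include <assert.h>

/* Hypothetical helper mirroring the quoted arithmetic:
   cost = difference_cost / avg_niter + add_cost.  */
unsigned
reuse_cost (unsigned difference_cost, unsigned add_cost, unsigned avg_niter)
{
  /* The caller must supply a positive iteration estimate.  */
  assert (avg_niter > 0);

  /* The subtraction is amortised as if it were hoisted; integer
     division truncates, so a small cost can vanish completely.  */
  return difference_cost / avg_niter + add_cost;
}
```

For example, reuse_cost (4, 4, 5) is 4, whereas treating the subtraction
as a per-iteration operation (an estimate of 1) would give 8.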

There is code to stop this happening for uses that are marked as addresses:

  if (address_p)
    {
      /* Do not try to express address of an object with computation based
	 on address of a different object.  This may cause problems in rtl
	 level alias analysis (that does not expect this to be happening,
	 as this is illegal in C), and would be unlikely to be useful
	 anyway.  */
      if (use->iv->base_object
	  && cand->iv->base_object
	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
	return infinite_cost;
    }

but it doesn't trigger for these uses because we treat them as "generic".
That could be fixed by, e.g., treating them as addresses, or by using
POINTER_TYPE_P instead of or as well as address_p in the condition above.
But it seems like a slightly odd transformation in any case.
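
As a rough illustration of the "as well as" variant, here is a
self-contained model of the check.  Everything in it is hypothetical:
struct iv_model stands in for the fields the real code reads, is_pointer
stands in for POINTER_TYPE_P on the use's type, and plain pointer
comparison stands in for operand_equal_p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the quoted check reads.  */
struct iv_model
{
  const void *base_object;  /* NULL when no base object is known */
  bool is_pointer;          /* models POINTER_TYPE_P on the use's type */
};

/* Return true when the candidate should be given infinite cost for
   this use, i.e. when the use is an address or a pointer value and the
   two IVs are based on different objects.  */
bool
reject_cross_object (bool address_p, const struct iv_model *use,
                     const struct iv_model *cand)
{
  if (!(address_p || use->is_pointer))
    return false;

  return use->base_object != NULL
         && cand->base_object != NULL
         && use->base_object != cand->base_object;
}
```

Under this model, use 3 (a pointer based on vdst) would reject candidate 5
(based on udst) even though it is recorded as "generic".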

The reason I'm interested is that NEON has auto-increment instructions.
This transformation stops us using them for vdst and (because of
unfortunate instruction ordering) on udst as well.
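
For reference, the shape that suits auto-increment addressing keeps one
pointer per stream and bumps each by its own step.  The following is a
scalar sketch of the loop above written that way; loop_postinc is a
hypothetical name, and whether a given compiler actually selects
post-increment instructions for it is not guaranteed.

```c
/* Scalar sketch: every stream is its own induction variable, so each
   access is a candidate for a load or store with post-increment.  */
void
loop_postinc (unsigned char *restrict ydst,
              unsigned char *restrict udst,
              unsigned char *restrict vdst,
              const unsigned char *restrict src,
              int n)
{
  for (int i = 0; i < n; i++)
    {
      *ydst++ = *src++;  /* ydst[2*i+0] = src[4*i+0] */
      *udst++ = *src++;  /* udst[i]     = src[4*i+1] */
      *ydst++ = *src++;  /* ydst[2*i+1] = src[4*i+2] */
      *vdst++ = *src++;  /* vdst[i]     = src[4*i+3] */
    }
}
```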

Richard


* Re: Help with ivopts
  2011-07-06 11:35 Help with ivopts Richard Sandiford
@ 2011-07-06 13:10 ` Michael Matz
  2011-07-06 13:32   ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Matz @ 2011-07-06 13:10 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Hi,

On Wed, 6 Jul 2011, Richard Sandiford wrote:

> The individual difference_cost and add_cost seem reasonable (4 in each 
> case). I don't understand the reasoning behind the division though.  Is 
> the idea that this should be hoisted?

Yes, it should be hoisted outside the loop.  The difference is between two 
loop-invariant values (the bases), and hence is also loop-invariant.  Some 
tree optimizer should do this already; possibly the casts confuse us.

> If so, then:
> 
> (a) That doesn't happen at the tree level.  The subtraction is still inside
>     the loop at RTL generation time.
> 
> (b) What's the advantage of introducing a new hoisted subtraction that
>     is going to be live throughout the loop, and then adding another IV
>     to it inside the loop, over using the original IV and incrementing it
>     in the normal way?

It can reduce address complexity for one of the addresses.  E.g. given:

 i=0; i < end; i+=4 
   p[i];
   q[i];

-->

 n=p; n < p+end; n+=4
   [n];
   (q-p)[n];

Here (q-p) is loop-invariant, and the complexity of the first address is 
lower (no offset).  In fact the register pressure is lower by one too 
(three instead of four, including the end/p+end bound).
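
Spelled out in C, the two forms look roughly like this.  It is a sketch
only: the function names are hypothetical, the loads are summed just to
give the loops an observable result, and the rewritten form does its
arithmetic on uintptr_t because subtracting pointers to different objects
is not valid C (it assumes a flat address space).

```c
#include <stddef.h>
#include <stdint.h>

/* Index form: one counter and two base+index addresses.  */
unsigned
sum_indexed (const unsigned char *p, const unsigned char *q, size_t end)
{
  unsigned total = 0;
  for (size_t i = 0; i < end; i += 4)
    total += p[i] + q[i];
  return total;
}

/* Rewritten form: n walks over p, and q's address is rebuilt as
   (q - p) + n from a hoisted, loop-invariant difference.  */
unsigned
sum_rebased (const unsigned char *p, const unsigned char *q, size_t end)
{
  uintptr_t diff = (uintptr_t) q - (uintptr_t) p;  /* hoisted invariant */
  uintptr_t stop = (uintptr_t) p + end;            /* the p+end bound */
  unsigned total = 0;

  for (uintptr_t n = (uintptr_t) p; n < stop; n += 4)
    total += *(const unsigned char *) n
             + *(const unsigned char *) (diff + n);
  return total;
}
```

The rewritten loop keeps n, diff and stop live, against i, p, q and end
in the index form.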


Ciao,
Michael.


* Re: Help with ivopts
  2011-07-06 13:10 ` Michael Matz
@ 2011-07-06 13:32   ` Richard Sandiford
  2011-07-06 13:43     ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 13:32 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

Michael Matz <matz@suse.de> writes:
> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>> The individual difference_cost and add_cost seem reasonable (4 in each 
>> case). I don't understand the reasoning behind the division though.  Is 
>> the idea that this should be hoisted?
>
> Yes, it should be hoisted outside the loop.  The difference is between two 
> loop-invariant values (the bases), and hence is also loop-invariant.  Some 
> tree optimizer should do this already, possibly the casts confuse us.

OK, thanks, suspected as much.

>> If so, then:
>> 
>> (a) That doesn't happen at the tree level.  The subtraction is still inside
>>     the loop at RTL generation time.
>> 
>> (b) What's the advantage of introducing a new hoisted subtraction that
>>     is going to be live throughout the loop, and then adding another IV
>>     to it inside the loop, over using the original IV and incrementing it
>>     in the normal way?
>
> It can reduce address complexity for one of the addresses.  E.g. given:
>
>  i=0; i < end; i+=4 
>    p[i];
>    q[i];
>
> -->
>
>  n=p; n < p+end; n+=4
>    [n];
>    (q-p)[n];
>
> Here (q-p) is loop-invariant, and the complexity of the first address is 
> lower (no offset).  In fact the register pressure is lower by one too 
> (three instead of four, including the end/p+end bound).

But your second loop isn't what I was comparing it with.  I was comparing
it with:

n=p; n < p+end; n+=4, m+=4
  [n]
  [m]

That has the same number of registers (3) and the same number of
additions (2).  And the [m] is what we started with, so it was
actually:

  i=0; i<count; i+=1, n+=4, m+=4
    [n]
    [m]

-->

  i=0; i<count; i+=1, n+=4
    [n]
    (q-p)[n]

(we don't get rid of "i" or "count" in this case).

If the target allows (q-p)[n] to be used directly as an address, and if
the target has no post-increment instruction, then it might be better.
But I think it's a loss on other targets.  It might even be a loss on
targets (like PowerPC IIRC), that need base+index addresses to have
the "real" base first.  This sort of transformation seems to make us
lose track of which register is the base.

Richard


* Re: Help with ivopts
  2011-07-06 13:32   ` Richard Sandiford
@ 2011-07-06 13:43     ` Richard Sandiford
  2011-07-06 14:40       ` Michael Matz
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 13:43 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Michael Matz <matz@suse.de> writes:
>> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>>> If so, then:
>>> 
>>> (a) That doesn't happen at the tree level.  The subtraction is still inside
>>>     the loop at RTL generation time.
>>> 
>>> (b) What's the advantage of introducing a new hoisted subtraction that
>>>     is going to be live throughout the loop, and then adding another IV
>>>     to it inside the loop, over using the original IV and incrementing it
>>>     in the normal way?
>>
>> It can reduce address complexity for one of the addresses.  E.g. given:
>>
>>  i=0; i < end; i+=4 
>>    p[i];
>>    q[i];
>>
>> -->
>>
>>  n=p; n < p+end; n+=4
>>    [n];
>>    (q-p)[n];
>>
>> Here (q-p) is loop-invariant, and the complexity of the first address is 
>> lower (no offset).  In fact the register pressure is lower by one too 
>> (three instead of four, including the end/p+end bound).
>
> But your second loop isn't what I was comparing it with.  I was comparing
> it with:
>
> n=p; n < p+end; n+=4, m+=4
>   [n]
>   [m]
>
> That has the same number of registers (3) and the same number of
> additions (2).  And the [m] is what we started with, so it was
> actually:
>
>   i=0; i<count; i+=1, n+=4, m+=4
>     [n]
>     [m]
>
> -->
>
>   i=0; i<count; i+=1, n+=4
>     [n]
>     (q-p)[n]
>
> (we don't get rid of "i" or "count" in this case).
>
> If the target allows (q-p)[n] to be used directly as an address, and if
> the target has no post-increment instruction, then it might be better.
> But I think it's a loss on other targets.  It might even be a loss on
> targets (like PowerPC IIRC), that need base+index addresses to have
> the "real" base first.  This sort of transformation seems to make us
> lose track of which register is the base.

Actually, I take that back.  Use of (q-p)[n] (hoisted_diff[n]) as an
address is precisely the case in which we've decided we _don't_ want to
apply the optimisation (see the address_p code I quoted).  So I'm not
sure when it's a win even on targets like x86.

Richard


* Re: Help with ivopts
  2011-07-06 13:43     ` Richard Sandiford
@ 2011-07-06 14:40       ` Michael Matz
  2011-07-06 15:04         ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Matz @ 2011-07-06 14:40 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Hi,

On Wed, 6 Jul 2011, Richard Sandiford wrote:

> > If the target allows (q-p)[n] to be used directly as an address, and 
> > if the target has no post-increment instruction, then it might be 
> > better. But I think it's a loss on other targets.  It might even be a 
> > loss on targets (like PowerPC IIRC), that need base+index addresses to 
> > have the "real" base first.  This sort of transformation seems to make 
> > us lose track of which register is the base.
> 
> Actually, I take that back.  Use of (p-q)[n] (hoisted_diff[n]) as an 
> address is precisely the case in which we've decided we _don't_ want to 
> apply the optimisation (see the address_p code I quoted).

I know; I was referring to the general case of uses of IVs, not 
necessarily in address context.  It was just easier to write that way.  The 
reason for disabling this in address context is not that it has no 
advantage, but rather that accessing objects without a useful base (and 
"&obj1 - &obj2" is not a useful one) in the past led to wrong aliasing 
answers.  By now it shouldn't lead to wrong answers anymore, but still 
to less precise ones.

I wrote the example only to show the situation in which such seemingly 
strange transformation (building the difference of unrelated entities) is 
actually helpful.

> So I'm not sure when it's a win even on targets like x86.

In my example (and imagining that p[i] actually stands for some arbitrary 
arithmetic involving the sum of p and i, or if we removed the 
address_p special case) the transformation is universally a win.  In your 
example it isn't a win because the address operations are as simple as 
possible to start with, and become more complex due to the transformation; 
that would point towards a glitch in the cost analysis.  It's still getting 
rid of one induction variable, but I'd agree that that's no advantage with 
post-increment targets, and even without postinc the increased complexity 
of the address makes the transformation dubious.  A better transformation 
would be to get rid of the i induction variable.


Ciao,
Michael.


* Re: Help with ivopts
  2011-07-06 14:40       ` Michael Matz
@ 2011-07-06 15:04         ` Richard Sandiford
  2011-07-06 15:17           ` Michael Matz
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 15:04 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

Michael Matz <matz@suse.de> writes:
> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>> > If the target allows (q-p)[n] to be used directly as an address, and 
>> > if the target has no post-increment instruction, then it might be 
>> > better. But I think it's a loss on other targets.  It might even be a 
>> > loss on targets (like PowerPC IIRC), that need base+index addresses to 
>> > have the "real" base first.  This sort of transformation seems to make 
>> > us lose track of which register is the base.
>> 
>> Actually, I take that back.  Use of (p-q)[n] (hoisted_diff[n]) as an 
>> address is precisely the case in which we've decided we _don't_ want to 
>> apply the optimisation (see the address_p code I quoted).
>
> I know, I was referring to the general case of uses of IVs, not 
> necessarily in address context, it was just easier to write.  The reason 
> for disabling this in address context is not because of having no 
> advantage, but rather because accessing objects without a useful base (and 
> "&obj1 - &obj2" is no useful one) in the past lead to wrong aliasing 
> answers.

Right.

> I wrote the example only to show the situation in which such seemingly 
> strange transformation (building the difference of unrelated entities) is 
> actually helpful.

But what I mean is: even with your starting loop, I'm comparing the
transformation that this code does with the alternative, but rejected,
transformation of simply treating both addresses as separate ivs.  I.e.:

  i=0; i < end; i+=1
    p + i * step;
    q + i * step;
-->
  n=p; n < p+end; n+=step
    n;
    (q-p) + n;

vs.

  i=0; i < end; i+=1 
    p + i * step;
    q + i * step;
-->
  n=p; n < p+end; n+=step, m+=step
    n;
    m;

It seems like, with this extra code, we're going out of our way to do
the first, "clever", transformation, instead of doing the second,
even though both seem to have the same cost in terms of loop operations
and live registers.  So what I'm not sure of is when the first transformation
is a win over the second.

Richard


* Re: Help with ivopts
  2011-07-06 15:04         ` Richard Sandiford
@ 2011-07-06 15:17           ` Michael Matz
  2011-07-06 15:34             ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Matz @ 2011-07-06 15:17 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Hi,

On Wed, 6 Jul 2011, Richard Sandiford wrote:

> But what I mean is: even with your starting loop, I'm comparing the
> transformation that this code does with the alternative, but rejected,
> transformation of simply treating both addresses as separate ivs.  I.e.:
> 
>   i=0; i < end; i+=1
>     p + i * step;
>     q + i * step;
> -->
>   n=p; n < p+end; n+=step
>     n;
>     (q-p) + n;
> 
> vs.
> 
>   i=0; i < end; i+=1 
>     p + i * step;
>     q + i * step;
> -->
>   n=p; n < p+end; n+=step, m+=step
>     n;
>     m;
> 
> It seems like, with this extra code, we're going out of our way to do 
> the first, "clever", transformation, instead of doing the second, even 
> though both seem to have the same cost in terms of loop operations and 
> live registers.  So what I'm not sure of is when the first 
> transformation is a win over the second.

It's only a strict win on targets where the addition in "(q-p) + n" can be 
hidden in either address generation, or combined with other arithmetic, or 
on all targets if (q-p) is a constant.
Otherwise it merely has the same number of adds and live variables.  But 
if it weren't for deficiencies in downstream optimizers (not hoisting the 
subtraction) the first variant is at least not worse than the second on 
targets without autoinc.  On targets with autoinc obviously the second 
variant is better (if the autoinc really comes for free).

So, sometimes the first is better, sometimes just the same, sometimes 
worse :-)  Probably the cost function in ivopts could use some 
improvements taking at least autoinc into account.  The valid address 
forms (i.e. if reg+reg is as cheap as reg) should be taken into account 
already.
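
The constant case can be sketched as follows.  The offset value and the
function name are hypothetical, chosen only for illustration: when q is
known to sit a fixed distance after p, the rebuilt address (q - p) + n
is just n plus an immediate, which most targets can fold into the access.

```c
#include <stddef.h>

enum { Q_OFFSET = 16 };  /* hypothetical fixed distance q - p */

/* Sum p[i] and q[i] for i = 0, 4, 8, ... where q == p + Q_OFFSET.
   A single pointer IV serves both streams; q's element is reached
   through a constant displacement from n.  */
unsigned
sum_constant_offset (const unsigned char *p, size_t end)
{
  unsigned total = 0;
  for (const unsigned char *n = p; n < p + end; n += 4)
    total += n[0] + n[Q_OFFSET];
  return total;
}
```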


Ciao,
Michael.


* Re: Help with ivopts
  2011-07-06 15:17           ` Michael Matz
@ 2011-07-06 15:34             ` Richard Sandiford
  2011-07-06 15:56               ` Michael Matz
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 15:34 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

Michael Matz <matz@suse.de> writes:
> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>> But what I mean is: even with your starting loop, I'm comparing the
>> transformation that this code does with the alternative, but rejected,
>> transformation of simply treating both addresses as separate ivs.  I.e.:
>> 
>>   i=0; i < end; i+=1
>>     p + i * step;
>>     q + i * step;
>> -->
>>   n=p; n < p+end; n+=step
>>     n;
>>     (q-p) + n;
>> 
>> vs.
>> 
>>   i=0; i < end; i+=1 
>>     p + i * step;
>>     q + i * step;
>> -->
>>   n=p; n < p+end; n+=step, m+=step
>>     n;
>>     m;
>> 
>> It seems like, with this extra code, we're going out of our way to do 
>> the first, "clever", transformation, instead of doing the second, even 
>> though both seem to have the same cost in terms of loop operations and 
>> live registers.  So what I'm not sure of is when the first 
>> transformation is a win over the second.
>
> It's only a strict win on targets where the addition in "(q-p) + n" can be 
> hidden in either address generation, or combined with other arithmetic, or 
> on all targets if (q-p) is a constant.

Agreed on the constant thing.  But is it really valid to account for
address validity in the !address_p case?  I.e.

> Otherwise it merely has the same number of adds and live variables.  But 
> if it weren't for deficiencies in downstream optimizers (not hoisting the 
> subtraction) the first variant is at least not worse than the second on 
> targets without autoinc.  On targets with autoinc obviously the second 
> variant is better (if the autoinc really comes for free).
>
> So, sometimes the first is better, sometimes just the same, sometimes 
> worse :-)  Probably the cost function in ivopts could use some 
> improvements taking at least autoinc into account.  The valid address 
> forms (i.e. if reg+reg is as cheap as reg) should be taken into account 
> already.

...we currently only check for things like reg+reg in the address_p case,
which as discussed, is also the case in which we currently don't apply
this transformation for other reasons.  The address_p case is also the
one in which we take auto-increment into account.

That goes back to the question in the original message about whether
these uses should be treated as addresses.  I suppose the answer's
probably "yes", since only a cast is getting in the way.

But there's still the separate point that, when not considering
addresses, this transformation doesn't seem to be a win, except
in the constant case.  I suppose what I'm saying is that the:

      if (use->iv->base_object
	  && cand->iv->base_object
	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
	return infinite_cost;

condition seems to make sense in the !address_p case too.  It shouldn't
make things worse, and may make things better.

Richard


* Re: Help with ivopts
  2011-07-06 15:34             ` Richard Sandiford
@ 2011-07-06 15:56               ` Michael Matz
  2011-07-06 17:55                 ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Matz @ 2011-07-06 15:56 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc

Hi,

On Wed, 6 Jul 2011, Richard Sandiford wrote:

> > It's only a strict win on targets where the addition in "(q-p) + n" 
> > can be hidden in either address generation, or combined with other 
> > arithmetic, or on all targets if (q-p) is a constant.
> 
> Agreed on the constant thing.  But is it really valid to account for 
> address validity in the !address_p case?

Well, some targets can hide this addition also in other instructions not 
necessarily only for accessing memory (e.g. the lea instruction on x86).
That has to be described in the cost function.

> > So, sometimes the first is better, sometimes just the same, sometimes 
> > worse :-)  Probably the cost function in ivopts could use some 
> > improvements taking at least autoinc into account.  The valid address 
> > forms (i.e. if reg+reg is as cheap as reg) should be taken into account 
> > already.
> 
> ...we currently only check for things like reg+reg in the address_p case,
> which as discussed, is also the case in which we currently don't apply
> this transformation for other reasons.  The address_p case is also the
> one in which we take auto-increment into account.
> 
> That goes back to the question in the original message about whether
> these uses should be treated as addresses.  I suppose the answer's
> probably "yes", since only a cast is getting in the way.

Yes, I think for your example this should be regarded as addresses.

> But there's still the separate point that, when not considering
> addresses, this transformation doesn't seem to be a win, except
> in the constant case.

My first example shows that on some targets it can be a win, also in the 
non-constant case, saving one IV update.  That is the case if the use of 
IVb can be replaced by IVb+somereg for "free".  Be that addresses or not.

> I suppose what I'm saying is that the:
> 
>       if (use->iv->base_object
> 	  && cand->iv->base_object
> 	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
> 	return infinite_cost;
> 
> condition seems to make sense in the !address_p case too.  It shouldn't
> make things worse, and may make things better.

It would make things worse for the above mentioned targets.  I actually 
think this whole special casing of address_p just hides problems 
elsewhere, namely in the aliasing machinery (as I hinted, this might 
actually be solved meanwhile), and certainly in the cost functions of 
IVopts.  If it's really better to not express a certain use of IVb via 
((base_b-base_a)+IVa), then the cost function should say so, not some 
after-the-fact hackery rejecting this transformation a posteriori.

If this rejection should still be needed for correctness for the aliasing 
machinery then it should be limited to that one purpose: avoiding wrong 
code.  It should not be used to avoid generating worse code on some 
targets.


Ciao,
Michael.


* Re: Help with ivopts
  2011-07-06 15:56               ` Michael Matz
@ 2011-07-06 17:55                 ` Richard Sandiford
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2011-07-06 17:55 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Sandiford, gcc

Michael Matz <matz@suse.de> writes:
> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>> But there's still the separate point that, when not considering
>> addresses, this transformation doesn't seem to be a win, except
>> in the constant case.
>
> My first example shows that on some targets it can be a win, also in the 
> non-constant case, saving one IV update.  That is the case if the use of 
> IVb can be replaced by IVb+somereg for "free".  Be that addresses or not.
>
>> I suppose what I'm saying is that the:
>> 
>>       if (use->iv->base_object
>> 	  && cand->iv->base_object
>> 	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
>> 	return infinite_cost;
>> 
>> condition seems to make sense in the !address_p case too.  It shouldn't
>> make things worse, and may make things better.
>
> It would make things worse for the above mentioned targets.  I actually 
> think this whole special casing of address_p just hides problems 
> elsewhere, namely in the aliasing machinery (as I hinted, this might 
> actually be solved meanwhile), and certainly in the cost functions of 
> IVopts.  If it's really better to not express a certain use of IVb via 
> ((base_b-base_a)+IVa), then the cost function should say so, not some 
> after-the-fact hackery rejecting this transformation a posteriori.
>
> If this rejection should still be needed for correctness for the aliasing 
> machinery then it should be limited to that one purpose: avoiding wrong 
> code.  It should not be used to avoid generating worse code on some 
> targets.

OK, I suppose I'd better just live with things as they are then.
My main motivation for getting the uses to be considered as addresses
is precisely to trigger the address_p code that you'd like to remove,
so it wouldn't be a permanent fix.

With that code removed, I think we'll still hit the problem that,
all other things being equal (as they appear to be in this case),
we prefer to reuse ivs rather than create new ones.

Richard


