* [patch] Partial fix for PR 18048
@ 2004-10-24 13:16 Zdenek Dvorak
2004-10-24 17:30 ` Andrew Pinski
2004-10-25 18:15 ` Zack Weinberg
0 siblings, 2 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-24 13:16 UTC (permalink / raw)
To: gcc-patches
Hello,
this patch contains several improvements to ivopts and related
optimizations that improve the code produced for mgrid benchmark quite a
bit (although even with the patch, ivopts still spoil the code).
The improvements are:
1) Ivopts produce expressions like
*(&a[start] + 8 * index)
and believe that the memory access created for it will include the
multiplication by 8. This fails if there are several similar
accesses, since dom will CSE the "8 * index" part (thus adding
unnecessary multiplication instruction, and increasing register
pressure).
The part of the patch in the fold-const.c ensures that such an
expression is folded to a[start + index], thus achieving what ivopts
intended. This solution unfortunately is not generic enough to catch
all cases when something like this may happen, and it does not work
on LP64 platforms where this optimization might be incorrect, but it
seems to be the only possibility within the current framework. I
will start a discussion about a more appropriate solution for 4.1 in
a separate thread.
2) The algorithm used for finding the best set of bivs for the loop is
quite primitive (and cannot be much better due to compile time
constraints), and as such it is likely to get stuck in local
minimums. Which happens in mgrid, causing ivopts to create about
30 bivs...
The patch to try_add_cand_for makes the initial set of candidates
be much smaller (based primarily on the important candidates).
The idea is that getting stuck in local minimum with too many bivs
is a disaster, but if it happens with a few bivs, it just means that
the iv uses cannot be expressed much cheaper than they are, so we
are not that far from optimum anyway.
3) A small improvement in number_of_iterations_cond.
build_int_cst (type, -1) is not suitable for producing unsigned
constant with all bits one, since the bits outside of the precision
of the type are set to 1 as well, which makes fold to fail to
perform some of the optimizations with such constants.
Bootstrapped & regtested on ppc, i686 and x86_64. Benchmark results
from i686 (-O2, base without the patch, peak with) (quick summary --
minor variations except for the huge gain on mgrid, on average the
results improve even if mgrid is not taken into account):
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
------------ -------- -------- -------- -------- -------- --------
164.gzip 1400 209 670 * 1400 211 663 *
175.vpr 1400 390 359 * 1400 395 354 *
176.gcc 1100 -- X 1100 -- X
181.mcf 1800 733 245 * 1800 728 247 *
186.crafty 1000 124 804 * 1000 123 814 *
197.parser 1800 401 449 * 1800 400 450 *
252.eon 1300 -- X 1300 -- X
253.perlbmk 1800 208 865 * 1800 205 879 *
254.gap 1100 179 616 * 1100 176 627 *
255.vortex 1900 231 822 * 1900 236 805 *
256.bzip2 1500 351 427 * 1500 351 427 *
300.twolf 3000 799 376 * 3000 800 375 *
168.wupwise 1600 295 543 * 1600 286 559 *
171.swim 3100 720 430 * 3100 725 428 *
172.mgrid 1800 574 314 * 1800 504 357 *
173.applu 2100 528 398 * 2100 520 404 *
177.mesa 1400 207 675 * 1400 207 675 *
178.galgel 2900 -- X 2900 -- X
179.art 2600 1458 178 * 2600 1463 178 *
183.equake 1300 294 442 * 1300 294 442 *
187.facerec 1900 473 401 * 1900 467 406 *
188.ammp 2200 615 357 * 2200 609 361 *
189.lucas 2000 436 458 * 2000 437 458 *
191.fma3d 2100 379 554 * 2100 382 549 *
200.sixtrack 1100 270 408 * 1100 265 415 *
301.apsi 2600 671 387 * 2600 665 391 *
Zdenek
PR tree-optimization/18048
* fold-const.c (try_move_mult_to_index): New function.
(fold): Use try_move_mult_to_index.
* tree-ssa-loop-ivopts.c (try_add_cand_for): Prefer common candidates.
* tree-ssa-loop-niter.c (number_of_iterations_cond): Produce
an all-ones unsigned constant without extra bits.
Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.468
diff -c -3 -p -r1.468 fold-const.c
*** fold-const.c 11 Oct 2004 03:42:00 -0000 1.468
--- fold-const.c 23 Oct 2004 15:13:53 -0000
*************** tree_swap_operands_p (tree arg0, tree ar
*** 5967,5972 ****
--- 5967,6050 ----
return 0;
}
+ /* Tries to replace &a[idx] CODE s * delta with &a[idx CODE delta], if s is
+ step of the array. TYPE is the type of the expression. ADDR is the address.
+ MULT is the multiplicative expression. If the function succeeds, the new
+ address expression is returned. Otherwise NULL_TREE is returned. */
+
+ static tree
+ try_move_mult_to_index (tree type, enum tree_code code, tree addr, tree mult)
+ {
+ tree s, delta, step;
+ tree arg0 = TREE_OPERAND (mult, 0), arg1 = TREE_OPERAND (mult, 1);
+ tree ref = TREE_OPERAND (addr, 0), pref;
+ tree ret, pos;
+ tree itype;
+
+ STRIP_NOPS (arg0);
+ STRIP_NOPS (arg1);
+
+ if (TREE_CODE (arg0) == INTEGER_CST)
+ {
+ s = arg0;
+ delta = arg1;
+ }
+ else if (TREE_CODE (arg1) == INTEGER_CST)
+ {
+ s = arg1;
+ delta = arg0;
+ }
+ else
+ return NULL_TREE;
+
+ for (;; ref = TREE_OPERAND (ref, 0))
+ {
+ if (TREE_CODE (ref) == ARRAY_REF)
+ {
+ step = array_ref_element_size (ref);
+
+ if (TREE_CODE (step) != INTEGER_CST)
+ continue;
+
+ itype = TREE_TYPE (step);
+
+ /* If the type sizes do not match, we might run into problems
+ when one of them would overflow. */
+ if (TYPE_PRECISION (itype) != TYPE_PRECISION (type))
+ continue;
+
+ if (!operand_equal_p (step, fold_convert (itype, s), 0))
+ continue;
+
+ delta = fold_convert (itype, delta);
+ break;
+ }
+
+ if (!handled_component_p (ref))
+ return NULL_TREE;
+ }
+
+ /* We found the suitable array reference. So copy everything up to it,
+ and replace the index. */
+
+ pref = TREE_OPERAND (addr, 0);
+ ret = copy_node (pref);
+ pos = ret;
+
+ while (pref != ref)
+ {
+ pref = TREE_OPERAND (pref, 0);
+ TREE_OPERAND (pos, 0) = copy_node (pref);
+ pos = TREE_OPERAND (pos, 0);
+ }
+
+ TREE_OPERAND (pos, 1) = fold (build2 (code, itype,
+ TREE_OPERAND (pos, 1),
+ delta));
+
+ return build1 (ADDR_EXPR, type, ret);
+ }
+
/* Perform constant folding and related simplification of EXPR.
The related simplifications include x*1 => x, x*0 => 0, etc.,
and application of the associative law.
*************** fold (tree expr)
*** 6602,6607 ****
--- 6680,6703 ----
alt0, alt1)),
same));
}
+
+ /* Try replacing &a[i1] + c * i2 with &a[i1 + i2], if c is step
+ of the array. Loop optimizer sometimes produce this type of
+ expressions. */
+ if (TREE_CODE (arg0) == ADDR_EXPR
+ && TREE_CODE (arg1) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, PLUS_EXPR, arg0, arg1);
+ if (tem)
+ return fold (tem);
+ }
+ else if (TREE_CODE (arg1) == ADDR_EXPR
+ && TREE_CODE (arg0) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, PLUS_EXPR, arg1, arg0);
+ if (tem)
+ return fold (tem);
+ }
}
else
{
*************** fold (tree expr)
*** 6974,6979 ****
--- 7070,7086 ----
&diff))
return build_int_cst_type (type, diff);
}
+
+ /* Try replacing &a[i1] - c * i2 with &a[i1 - i2], if c is step
+ of the array. Loop optimizer sometimes produce this type of
+ expressions. */
+ if (TREE_CODE (arg0) == ADDR_EXPR
+ && TREE_CODE (arg1) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, MINUS_EXPR, arg0, arg1);
+ if (tem)
+ return fold (tem);
+ }
if (TREE_CODE (arg0) == MULT_EXPR
&& TREE_CODE (arg1) == MULT_EXPR
Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.20
diff -c -3 -p -r2.20 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c 18 Oct 2004 18:01:09 -0000 2.20
--- tree-ssa-loop-ivopts.c 23 Oct 2004 15:13:53 -0000
*************** try_add_cand_for (struct ivopts_data *da
*** 3603,3622 ****
bitmap act_inv = BITMAP_XMALLOC ();
unsigned i;
struct cost_pair *cp;
bitmap_copy (best_ivs, ivs);
bitmap_copy (best_inv, inv);
! for (i = 0; i < use->n_map_members; i++)
{
! cp = use->cost_map + i;
! if (cp->cost == INFTY)
continue;
bitmap_copy (act_ivs, ivs);
! bitmap_set_bit (act_ivs, cp->cand->id);
! if (cp->depends_on)
! bitmap_a_or_b (act_inv, inv, cp->depends_on);
else
bitmap_copy (act_inv, inv);
act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
--- 3603,3634 ----
bitmap act_inv = BITMAP_XMALLOC ();
unsigned i;
struct cost_pair *cp;
+ bitmap_iterator bi;
+ struct iv_cand *cand;
+ bitmap depends_on;
bitmap_copy (best_ivs, ivs);
bitmap_copy (best_inv, inv);
! /* First try important candidates. Only if it fails, try the specific ones.
! Rationale -- in loops with many variables the best choice often is to use
! just one generic biv. If we added here many ivs specific to the uses,
! the optimization algorithm later would be likely to get stuck in a local
! minimum, thus causing us to create too many ivs. The approach from
! few ivs to more seems more likely to be succesful -- starting from few
! ivs, replacing an expensive use by a specific iv should always be a
! win. */
! EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
{
! cand = iv_cand (data, i);
!
! if (get_use_iv_cost (data, use, cand, &depends_on) == INFTY)
continue;
bitmap_copy (act_ivs, ivs);
! bitmap_set_bit (act_ivs, cand->id);
! if (depends_on)
! bitmap_a_or_b (act_inv, inv, depends_on);
else
bitmap_copy (act_inv, inv);
act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
*************** try_add_cand_for (struct ivopts_data *da
*** 3629,3634 ****
--- 3641,3675 ----
}
}
+ if (best_cost == INFTY)
+ {
+ for (i = 0; i < use->n_map_members; i++)
+ {
+ cp = use->cost_map + i;
+ if (cp->cost == INFTY)
+ continue;
+
+ /* Already tried this. */
+ if (cp->cand->important)
+ continue;
+
+ bitmap_copy (act_ivs, ivs);
+ bitmap_set_bit (act_ivs, cp->cand->id);
+ if (cp->depends_on)
+ bitmap_a_or_b (act_inv, inv, cp->depends_on);
+ else
+ bitmap_copy (act_inv, inv);
+ act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
+
+ if (act_cost < best_cost)
+ {
+ best_cost = act_cost;
+ bitmap_copy (best_ivs, act_ivs);
+ bitmap_copy (best_inv, act_inv);
+ }
+ }
+ }
+
bitmap_copy (ivs, best_ivs);
bitmap_copy (inv, best_inv);
Index: tree-ssa-loop-niter.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-niter.c,v
retrieving revision 2.13
diff -c -3 -p -r2.13 tree-ssa-loop-niter.c
*** tree-ssa-loop-niter.c 22 Oct 2004 17:05:08 -0000 2.13
--- tree-ssa-loop-niter.c 23 Oct 2004 15:13:53 -0000
*************** number_of_iterations_cond (tree type, tr
*** 419,427 ****
d = EXEC_BINARY (LSHIFT_EXPR, niter_type,
build_int_cst_type (niter_type, 1), bits);
s = EXEC_BINARY (RSHIFT_EXPR, niter_type, step0, bits);
! bound = EXEC_BINARY (RSHIFT_EXPR, niter_type,
! build_int_cst (niter_type, -1),
! bits);
assumption = fold (build2 (FLOOR_MOD_EXPR, niter_type, base1, d));
assumption = fold (build2 (EQ_EXPR, boolean_type_node,
--- 419,427 ----
d = EXEC_BINARY (LSHIFT_EXPR, niter_type,
build_int_cst_type (niter_type, 1), bits);
s = EXEC_BINARY (RSHIFT_EXPR, niter_type, step0, bits);
! bound = EXEC_UNARY (BIT_NOT_EXPR, niter_type,
! build_int_cst_type (niter_type, 0));
! bound = EXEC_BINARY (RSHIFT_EXPR, niter_type, bound, bits);
assumption = fold (build2 (FLOOR_MOD_EXPR, niter_type, base1, d));
assumption = fold (build2 (EQ_EXPR, boolean_type_node,
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-24 13:16 [patch] Partial fix for PR 18048 Zdenek Dvorak
@ 2004-10-24 17:30 ` Andrew Pinski
2004-10-24 17:57 ` Zdenek Dvorak
2004-10-25 18:15 ` Zack Weinberg
1 sibling, 1 reply; 14+ messages in thread
From: Andrew Pinski @ 2004-10-24 17:30 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: gcc-patches
On Oct 24, 2004, at 5:41 AM, Zdenek Dvorak wrote:
> Hello,
>
> this patch contains several improvements to ivopts and related
> optimizations that improve the code produced for mgrid benchmark quite
> a
> bit (although even with the patch, ivopts still spoil the code).
>
> The improvements are:
>
> 1) Ivopts produce expressions like
>
> *(&a[start] + 8 * index)
>
> and believe that the memory access created for it will include the
> multiplication by 8. This fails if there are several similar
> accesses, since dom will CSE the "8 * index" part (thus adding
> unnecessary multiplication instruction, and increasing register
> pressure).
Isn't this already done by fold_stmt?
-- Pinski
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-24 17:30 ` Andrew Pinski
@ 2004-10-24 17:57 ` Zdenek Dvorak
0 siblings, 0 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-24 17:57 UTC (permalink / raw)
To: Andrew Pinski; +Cc: gcc-patches
Hello,
> >this patch contains several improvements to ivopts and related
> >optimizations that improve the code produced for mgrid benchmark quite
> >a
> >bit (although even with the patch, ivopts still spoil the code).
> >
> >The improvements are:
> >
> >1) Ivopts produce expressions like
> >
> > *(&a[start] + 8 * index)
> >
> > and believe that the memory access created for it will include the
> > multiplication by 8. This fails if there are several similar
> > accesses, since dom will CSE the "8 * index" part (thus adding
> > unnecessary multiplication instruction, and increasing register
> > pressure).
>
> Isn't this already done by fold_stmt?
no, as far as I can tell. The only thing that seems relevant
is maybe_fold_offset_to_array_ref, which however only handles
constant offsets (and from some reason I do not quite understand
these optimizations are not included in fold, so they cannot be used
easily anyway).
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-24 13:16 [patch] Partial fix for PR 18048 Zdenek Dvorak
2004-10-24 17:30 ` Andrew Pinski
@ 2004-10-25 18:15 ` Zack Weinberg
2004-10-25 22:55 ` Zdenek Dvorak
2004-10-27 15:38 ` Zdenek Dvorak
1 sibling, 2 replies; 14+ messages in thread
From: Zack Weinberg @ 2004-10-25 18:15 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: gcc-patches
Zdenek Dvorak <rakdver@atrey.karlin.mff.cuni.cz> writes:
> 3) A small improvement in number_of_iterations_cond.
> build_int_cst (type, -1) is not suitable for producing unsigned
> constant with all bits one, since the bits outside of the precision
> of the type are set to 1 as well, which makes fold to fail to
> perform some of the optimizations with such constants.
...
> ! bound = EXEC_UNARY (BIT_NOT_EXPR, niter_type,
> ! build_int_cst_type (niter_type, 0));
> ! bound = EXEC_BINARY (RSHIFT_EXPR, niter_type, bound, bits);
While this is a good conceptual change, this is not the appropriate
way to do it. You are creating trees just to get fold() to do
arithmetic for you, which wastes memory, and you risk having overflow
bits set on the constants, which will also inhibit optimization.
It would be nice if force_fit_type could be used for this purpose, but
it also creates junk trees and sets overflow bits. The most efficient
way to code this logic is like so:
/* Construct an INTEGER_CST with type niter_type, whose low
(TYPE_PRECISION (niter_type) - bits) bits are all 1 and whose
remaining bits are all 0. */
{
unsigned int precision = TYPE_PRECISION (niter_type) - bits;
unsigned HOST_WIDE_INT blow;
HOST_WIDE_INT bhigh;
if (precision < HOST_BITS_PER_WIDE_INT)
{
bhigh = 0;
blow = (((unsigned HOST_WIDE_INT) 1) << precision) - 1;
}
else
{
precision -= HOST_BITS_PER_WIDE_INT;
blow = ~ (unsigned HOST_WIDE_INT) 0;
bhigh = (((unsigned HOST_WIDE_INT) 1) << precision) - 1;
}
bound = build_int_cst_wide (niter_type, blow, bhigh);
}
This replaces both the statements quoted from your patch. It would be
a good idea to do similar things to all calculations where you know
that a call to fold() will always spit out an INTEGER_CST. The
*_double functions defined in fold-const.c are frequently useful for
this.
(There may be an off-by-one error in the above, I'm not sure.)
(Needing to construct a CONST_INT for a mask value is a common thing,
and the code is tricky. Maybe we should have
extern tree tree_int_cst_{low,high}_mask (tree type, unsigned int bits);
primitives which produce masks filled from the low or the high end of
the constant.)
zw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-25 18:15 ` Zack Weinberg
@ 2004-10-25 22:55 ` Zdenek Dvorak
2004-10-25 23:01 ` Zack Weinberg
2004-10-27 15:38 ` Zdenek Dvorak
1 sibling, 1 reply; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-25 22:55 UTC (permalink / raw)
To: Zack Weinberg; +Cc: gcc-patches
Hello,
> > 3) A small improvement in number_of_iterations_cond.
> > build_int_cst (type, -1) is not suitable for producing unsigned
> > constant with all bits one, since the bits outside of the precision
> > of the type are set to 1 as well, which makes fold to fail to
> > perform some of the optimizations with such constants.
> ...
>
> > ! bound = EXEC_UNARY (BIT_NOT_EXPR, niter_type,
> > ! build_int_cst_type (niter_type, 0));
> > ! bound = EXEC_BINARY (RSHIFT_EXPR, niter_type, bound, bits);
>
> While this is a good conceptual change, this is not the appropriate
> way to do it. You are creating trees just to get fold() to do
> arithmetic for you, which wastes memory,
no, I am not. 0 is a shared constant, so no allocation for
build_int_cst_type (niter_type, 0)), and EXEC_UNARY is
nondestructive_fold_unary_to_constant that also does not build
any trees (except for the result, of course).
> and you risk having overflow bits set on the constants, which will
> also inhibit optimization.
Evaluation of BIT_NOT_EXPR should not set any overflow bits (if it does,
it should be fixed).
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-25 22:55 ` Zdenek Dvorak
@ 2004-10-25 23:01 ` Zack Weinberg
2004-10-25 23:02 ` Zdenek Dvorak
2004-10-25 23:10 ` Zdenek Dvorak
0 siblings, 2 replies; 14+ messages in thread
From: Zack Weinberg @ 2004-10-25 23:01 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: gcc-patches
Zdenek Dvorak <rakdver@atrey.karlin.mff.cuni.cz> writes:
>> While this is a good conceptual change, this is not the appropriate
>> way to do it. You are creating trees just to get fold() to do
>> arithmetic for you, which wastes memory,
>
> no, I am not. 0 is a shared constant, so no allocation for
> build_int_cst_type (niter_type, 0)), and EXEC_UNARY is
> nondestructive_fold_unary_to_constant that also does not build any
> trees (except for the result, of course).
That's nice. You're still doing unnecessary work.
>> and you risk having overflow bits set on the constants, which will
>> also inhibit optimization.
>
> Evaluation of BIT_NOT_EXPR should not set any overflow bits (if it does,
> it should be fixed).
Wouldn't you rather not have to deal with the problem at all?
zw
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-25 23:01 ` Zack Weinberg
@ 2004-10-25 23:02 ` Zdenek Dvorak
2004-10-25 23:10 ` Zdenek Dvorak
1 sibling, 0 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-25 23:02 UTC (permalink / raw)
To: Zack Weinberg; +Cc: gcc-patches
Hello,
> >> While this is a good conceptual change, this is not the appropriate
> >> way to do it. You are creating trees just to get fold() to do
> >> arithmetic for you, which wastes memory,
> >
> > no, I am not. 0 is a shared constant, so no allocation for
> > build_int_cst_type (niter_type, 0)), and EXEC_UNARY is
> > nondestructive_fold_unary_to_constant that also does not build any
> > trees (except for the result, of course).
>
> That's nice. You're still doing unnecessary work.
yes, however creating duplicated code just for purpose of saving a very
small amount of time on place that clearly is not critical does not
seem very useful to me. I.e. I would rather stick with the simple 2
lines solution that is a bit suboptimal, rather than to implement your
20-lines one.
Of course if you want to do the change, you won't spoil anything by it.
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-25 23:01 ` Zack Weinberg
2004-10-25 23:02 ` Zdenek Dvorak
@ 2004-10-25 23:10 ` Zdenek Dvorak
1 sibling, 0 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-25 23:10 UTC (permalink / raw)
To: Zack Weinberg; +Cc: gcc-patches
Hello,
> >> and you risk having overflow bits set on the constants, which will
> >> also inhibit optimization.
> >
> > Evaluation of BIT_NOT_EXPR should not set any overflow bits (if it does,
> > it should be fixed).
>
> Wouldn't you rather not have to deal with the problem at all?
no. If the code as it is sets overflow flags, it is clearly buggy and
has to be fixed anyway.
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-25 18:15 ` Zack Weinberg
2004-10-25 22:55 ` Zdenek Dvorak
@ 2004-10-27 15:38 ` Zdenek Dvorak
2004-10-27 16:22 ` Andrew Pinski
2004-10-27 19:00 ` Richard Henderson
1 sibling, 2 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-27 15:38 UTC (permalink / raw)
To: Zack Weinberg; +Cc: gcc-patches
Hello,
here is the patch with the Zack's comment reflected. Bootstrapped &
regtested on x86_64.
Zdenek
PR tree-optimization/18048
* fold-const.c (try_move_mult_to_index): New function.
(fold): Use try_move_mult_to_index.
* tree-ssa-loop-ivopts.c (try_add_cand_for): Prefer common candidates.
* tree-ssa-loop-niter.c (number_of_iterations_cond): Produce
an all-ones unsigned constant without extra bits.
* tree.c (build_low_bits_mask): New function.
* tree.h (build_low_bits_mask): Declare.
Index: fold-const.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/fold-const.c,v
retrieving revision 1.468
diff -c -3 -p -r1.468 fold-const.c
*** fold-const.c 11 Oct 2004 03:42:00 -0000 1.468
--- fold-const.c 27 Oct 2004 11:36:47 -0000
*************** tree_swap_operands_p (tree arg0, tree ar
*** 5967,5972 ****
--- 5967,6050 ----
return 0;
}
+ /* Tries to replace &a[idx] CODE s * delta with &a[idx CODE delta], if s is
+ step of the array. TYPE is the type of the expression. ADDR is the address.
+ MULT is the multiplicative expression. If the function succeeds, the new
+ address expression is returned. Otherwise NULL_TREE is returned. */
+
+ static tree
+ try_move_mult_to_index (tree type, enum tree_code code, tree addr, tree mult)
+ {
+ tree s, delta, step;
+ tree arg0 = TREE_OPERAND (mult, 0), arg1 = TREE_OPERAND (mult, 1);
+ tree ref = TREE_OPERAND (addr, 0), pref;
+ tree ret, pos;
+ tree itype;
+
+ STRIP_NOPS (arg0);
+ STRIP_NOPS (arg1);
+
+ if (TREE_CODE (arg0) == INTEGER_CST)
+ {
+ s = arg0;
+ delta = arg1;
+ }
+ else if (TREE_CODE (arg1) == INTEGER_CST)
+ {
+ s = arg1;
+ delta = arg0;
+ }
+ else
+ return NULL_TREE;
+
+ for (;; ref = TREE_OPERAND (ref, 0))
+ {
+ if (TREE_CODE (ref) == ARRAY_REF)
+ {
+ step = array_ref_element_size (ref);
+
+ if (TREE_CODE (step) != INTEGER_CST)
+ continue;
+
+ itype = TREE_TYPE (step);
+
+ /* If the type sizes do not match, we might run into problems
+ when one of them would overflow. */
+ if (TYPE_PRECISION (itype) != TYPE_PRECISION (type))
+ continue;
+
+ if (!operand_equal_p (step, fold_convert (itype, s), 0))
+ continue;
+
+ delta = fold_convert (itype, delta);
+ break;
+ }
+
+ if (!handled_component_p (ref))
+ return NULL_TREE;
+ }
+
+ /* We found the suitable array reference. So copy everything up to it,
+ and replace the index. */
+
+ pref = TREE_OPERAND (addr, 0);
+ ret = copy_node (pref);
+ pos = ret;
+
+ while (pref != ref)
+ {
+ pref = TREE_OPERAND (pref, 0);
+ TREE_OPERAND (pos, 0) = copy_node (pref);
+ pos = TREE_OPERAND (pos, 0);
+ }
+
+ TREE_OPERAND (pos, 1) = fold (build2 (code, itype,
+ TREE_OPERAND (pos, 1),
+ delta));
+
+ return build1 (ADDR_EXPR, type, ret);
+ }
+
/* Perform constant folding and related simplification of EXPR.
The related simplifications include x*1 => x, x*0 => 0, etc.,
and application of the associative law.
*************** fold (tree expr)
*** 6602,6607 ****
--- 6680,6703 ----
alt0, alt1)),
same));
}
+
+ /* Try replacing &a[i1] + c * i2 with &a[i1 + i2], if c is step
+ of the array. Loop optimizer sometimes produce this type of
+ expressions. */
+ if (TREE_CODE (arg0) == ADDR_EXPR
+ && TREE_CODE (arg1) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, PLUS_EXPR, arg0, arg1);
+ if (tem)
+ return fold (tem);
+ }
+ else if (TREE_CODE (arg1) == ADDR_EXPR
+ && TREE_CODE (arg0) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, PLUS_EXPR, arg1, arg0);
+ if (tem)
+ return fold (tem);
+ }
}
else
{
*************** fold (tree expr)
*** 6974,6979 ****
--- 7070,7086 ----
&diff))
return build_int_cst_type (type, diff);
}
+
+ /* Try replacing &a[i1] - c * i2 with &a[i1 - i2], if c is step
+ of the array. Loop optimizer sometimes produce this type of
+ expressions. */
+ if (TREE_CODE (arg0) == ADDR_EXPR
+ && TREE_CODE (arg1) == MULT_EXPR)
+ {
+ tem = try_move_mult_to_index (type, MINUS_EXPR, arg0, arg1);
+ if (tem)
+ return fold (tem);
+ }
if (TREE_CODE (arg0) == MULT_EXPR
&& TREE_CODE (arg1) == MULT_EXPR
Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.20
diff -c -3 -p -r2.20 tree-ssa-loop-ivopts.c
*** tree-ssa-loop-ivopts.c 18 Oct 2004 18:01:09 -0000 2.20
--- tree-ssa-loop-ivopts.c 27 Oct 2004 11:36:47 -0000
*************** try_add_cand_for (struct ivopts_data *da
*** 3603,3622 ****
bitmap act_inv = BITMAP_XMALLOC ();
unsigned i;
struct cost_pair *cp;
bitmap_copy (best_ivs, ivs);
bitmap_copy (best_inv, inv);
! for (i = 0; i < use->n_map_members; i++)
{
! cp = use->cost_map + i;
! if (cp->cost == INFTY)
continue;
bitmap_copy (act_ivs, ivs);
! bitmap_set_bit (act_ivs, cp->cand->id);
! if (cp->depends_on)
! bitmap_a_or_b (act_inv, inv, cp->depends_on);
else
bitmap_copy (act_inv, inv);
act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
--- 3603,3634 ----
bitmap act_inv = BITMAP_XMALLOC ();
unsigned i;
struct cost_pair *cp;
+ bitmap_iterator bi;
+ struct iv_cand *cand;
+ bitmap depends_on;
bitmap_copy (best_ivs, ivs);
bitmap_copy (best_inv, inv);
! /* First try important candidates. Only if it fails, try the specific ones.
! Rationale -- in loops with many variables the best choice often is to use
! just one generic biv. If we added here many ivs specific to the uses,
! the optimization algorithm later would be likely to get stuck in a local
! minimum, thus causing us to create too many ivs. The approach from
! few ivs to more seems more likely to be succesful -- starting from few
! ivs, replacing an expensive use by a specific iv should always be a
! win. */
! EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
{
! cand = iv_cand (data, i);
!
! if (get_use_iv_cost (data, use, cand, &depends_on) == INFTY)
continue;
bitmap_copy (act_ivs, ivs);
! bitmap_set_bit (act_ivs, cand->id);
! if (depends_on)
! bitmap_a_or_b (act_inv, inv, depends_on);
else
bitmap_copy (act_inv, inv);
act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
*************** try_add_cand_for (struct ivopts_data *da
*** 3629,3634 ****
--- 3641,3675 ----
}
}
+ if (best_cost == INFTY)
+ {
+ for (i = 0; i < use->n_map_members; i++)
+ {
+ cp = use->cost_map + i;
+ if (cp->cost == INFTY)
+ continue;
+
+ /* Already tried this. */
+ if (cp->cand->important)
+ continue;
+
+ bitmap_copy (act_ivs, ivs);
+ bitmap_set_bit (act_ivs, cp->cand->id);
+ if (cp->depends_on)
+ bitmap_a_or_b (act_inv, inv, cp->depends_on);
+ else
+ bitmap_copy (act_inv, inv);
+ act_cost = set_cost_up_to (data, act_ivs, act_inv, use->id + 1);
+
+ if (act_cost < best_cost)
+ {
+ best_cost = act_cost;
+ bitmap_copy (best_ivs, act_ivs);
+ bitmap_copy (best_inv, act_inv);
+ }
+ }
+ }
+
bitmap_copy (ivs, best_ivs);
bitmap_copy (inv, best_inv);
Index: tree-ssa-loop-niter.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-niter.c,v
retrieving revision 2.13
diff -c -3 -p -r2.13 tree-ssa-loop-niter.c
*** tree-ssa-loop-niter.c 22 Oct 2004 17:05:08 -0000 2.13
--- tree-ssa-loop-niter.c 27 Oct 2004 11:36:47 -0000
*************** number_of_iterations_cond (tree type, tr
*** 419,427 ****
d = EXEC_BINARY (LSHIFT_EXPR, niter_type,
build_int_cst_type (niter_type, 1), bits);
s = EXEC_BINARY (RSHIFT_EXPR, niter_type, step0, bits);
! bound = EXEC_BINARY (RSHIFT_EXPR, niter_type,
! build_int_cst (niter_type, -1),
! bits);
assumption = fold (build2 (FLOOR_MOD_EXPR, niter_type, base1, d));
assumption = fold (build2 (EQ_EXPR, boolean_type_node,
--- 419,428 ----
d = EXEC_BINARY (LSHIFT_EXPR, niter_type,
build_int_cst_type (niter_type, 1), bits);
s = EXEC_BINARY (RSHIFT_EXPR, niter_type, step0, bits);
!
! bound = build_low_bits_mask (niter_type,
! (TYPE_PRECISION (niter_type)
! - tree_low_cst (bits, 1)));
assumption = fold (build2 (FLOOR_MOD_EXPR, niter_type, base1, d));
assumption = fold (build2 (EQ_EXPR, boolean_type_node,
Index: tree.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.c,v
retrieving revision 1.440
diff -c -3 -p -r1.440 tree.c
*** tree.c 25 Oct 2004 13:27:25 -0000 1.440
--- tree.c 27 Oct 2004 11:36:47 -0000
*************** build_int_cst_wide (tree type, unsigned
*** 609,614 ****
--- 609,648 ----
return t;
}
+ /* Builds an integer constant in TYPE such that lowest BITS bits are ones
+ and the rest are zeros. */
+
+ tree
+ build_low_bits_mask (tree type, unsigned bits)
+ {
+ unsigned HOST_WIDE_INT low;
+ HOST_WIDE_INT high;
+ unsigned HOST_WIDE_INT all_ones = ~(unsigned HOST_WIDE_INT) 0;
+
+ gcc_assert (bits <= TYPE_PRECISION (type));
+
+ if (bits == TYPE_PRECISION (type)
+ && !TYPE_UNSIGNED (type))
+ {
+ /* Sign extended all-ones mask. */
+ low = all_ones;
+ high = -1;
+ }
+ else if (bits <= HOST_BITS_PER_WIDE_INT)
+ {
+ low = all_ones >> (HOST_BITS_PER_WIDE_INT - bits);
+ high = 0;
+ }
+ else
+ {
+ bits -= HOST_BITS_PER_WIDE_INT;
+ low = all_ones;
+ high = all_ones >> (HOST_BITS_PER_WIDE_INT - bits);
+ }
+
+ return build_int_cst_wide (type, low, high);
+ }
+
/* Checks that X is integer constant that can be expressed in (unsigned)
HOST_WIDE_INT without loss of precision. */
Index: tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree.h,v
retrieving revision 1.640
diff -c -3 -p -r1.640 tree.h
*** tree.h 22 Oct 2004 18:52:19 -0000 1.640
--- tree.h 27 Oct 2004 11:36:47 -0000
*************** tree fold_build_cleanup_point_expr (tree
*** 3543,3548 ****
--- 3543,3549 ----
extern tree build_fold_addr_expr_with_type (tree, tree);
extern tree build_fold_indirect_ref (tree);
extern tree constant_boolean_node (int, tree);
+ extern tree build_low_bits_mask (tree, unsigned);
extern bool tree_swap_operands_p (tree, tree, bool);
extern enum tree_code swap_tree_comparison (enum tree_code);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-27 15:38 ` Zdenek Dvorak
@ 2004-10-27 16:22 ` Andrew Pinski
2004-10-27 16:24 ` Zdenek Dvorak
2004-10-27 19:00 ` Richard Henderson
1 sibling, 1 reply; 14+ messages in thread
From: Andrew Pinski @ 2004-10-27 16:22 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: gcc-patches, Zack Weinberg
On Oct 27, 2004, at 11:22 AM, Zdenek Dvorak wrote:
> Hello,
>
> here is the patch with the Zack's comment reflected. Bootstrapped &
> regtested on x86_64.
This should also help Java code where we get:
<L1>:;
i.3 = (unsigned int) i;
if ((unsigned int) D.376 > i.3) goto <L5>; else goto <L2>;
<L2>:;
D.385 = _Jv_ThrowBadArrayIndex (i);
<L5>:;
*(&a->data[0] + (int *) i.3 * 4B) = 0;
;
i = (int) (i.3 + 1);
D.376 = a->length;
if (i >= D.376) goto <L8>; else goto <L1>;
<L8>:;
But the real question is why is &a->data[0] not pulled out of the loop
in the first place.
-- Pinski
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-27 16:22 ` Andrew Pinski
@ 2004-10-27 16:24 ` Zdenek Dvorak
2004-10-27 17:13 ` Dale Johannesen
0 siblings, 1 reply; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-27 16:24 UTC (permalink / raw)
To: Andrew Pinski; +Cc: gcc-patches, Zack Weinberg
Hello,
> >here is the patch with the Zack's comment reflected. Bootstrapped &
> >regtested on x86_64.
>
> This should also help Java code where we get:
> <L1>:;
> i.3 = (unsigned int) i;
> if ((unsigned int) D.376 > i.3) goto <L5>; else goto <L2>;
>
> <L2>:;
> D.385 = _Jv_ThrowBadArrayIndex (i);
>
> <L5>:;
> *(&a->data[0] + (int *) i.3 * 4B) = 0;
>
> ;
> i = (int) (i.3 + 1);
> D.376 = a->length;
> if (i >= D.376) goto <L8>; else goto <L1>;
>
> <L8>:;
>
> But the real question is why is &a->data[0] not pulled out of the loop
> in the first place.
&a->data[0] is a constant (symbol reference + offset), so moving it out of loop would
very likely just cause the register pressure to increase.
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-27 16:24 ` Zdenek Dvorak
@ 2004-10-27 17:13 ` Dale Johannesen
2004-10-27 20:31 ` Zdenek Dvorak
0 siblings, 1 reply; 14+ messages in thread
From: Dale Johannesen @ 2004-10-27 17:13 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: gcc-patches, Dale Johannesen, Andrew Pinski, Zack Weinberg
On Oct 27, 2004, at 9:03 AM, Zdenek Dvorak wrote:
>> But the real question is why is &a->data[0] not pulled out of the loop
>> in the first place.
>
> &a->data[0] is a constant (symbol reference + offset), so moving it
> out of loop would
> very likely just cause the register pressure to increase.
Of course it increases the register pressure, but it is still
beneficial to hoist in
some cases. It should be considered as a hoisting candidate. (mgrid
is not
the only specmark where not enough candidates are being hoisted on ppc;
just the easiest to analyze, as it spends 99% of its time in one loop.
I could
report others as PRs but my expectation is that fixing mgrid will fix
most of
the rest as well.)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-27 15:38 ` Zdenek Dvorak
2004-10-27 16:22 ` Andrew Pinski
@ 2004-10-27 19:00 ` Richard Henderson
1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2004-10-27 19:00 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Zack Weinberg, gcc-patches
On Wed, Oct 27, 2004 at 05:22:25PM +0200, Zdenek Dvorak wrote:
> PR tree-optimization/18048
> * fold-const.c (try_move_mult_to_index): New function.
> (fold): Use try_move_mult_to_index.
> * tree-ssa-loop-ivopts.c (try_add_cand_for): Prefer common candidates.
> * tree-ssa-loop-niter.c (number_of_iterations_cond): Produce
> an all-ones unsigned constant without extra bits.
> * tree.c (build_low_bits_mask): New function.
> * tree.h (build_low_bits_mask): Declare.
Ok.
r~
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch] Partial fix for PR 18048
2004-10-27 17:13 ` Dale Johannesen
@ 2004-10-27 20:31 ` Zdenek Dvorak
0 siblings, 0 replies; 14+ messages in thread
From: Zdenek Dvorak @ 2004-10-27 20:31 UTC (permalink / raw)
To: Dale Johannesen; +Cc: gcc-patches, Andrew Pinski, Zack Weinberg
Hello,
> >>But the real question is why is &a->data[0] not pulled out of the loop
> >>in the first place.
> >
> >&a->data[0] is a constant (symbol reference + offset), so moving it
> >out of loop would
> >very likely just cause the register pressure to increase.
>
> Of course it increases the register pressure, but it is still
> beneficial to hoist in
> some cases. It should be considered as a hoisting candidate. (mgrid
> is not
> the only specmark where not enough candidates are being hoisted on ppc;
> just the easiest to analyze, as it spends 99% of its time in one loop.
> I could
> report others as PRs but my expectation is that fixing mgrid will fix
> most of
> the rest as well.)
definitely there are cases where hoisting them is useful; I am not
really sure why rtl loop optimizer does not catch this.
Zdenek
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2004-10-27 20:25 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-24 13:16 [patch] Partial fix for PR 18048 Zdenek Dvorak
2004-10-24 17:30 ` Andrew Pinski
2004-10-24 17:57 ` Zdenek Dvorak
2004-10-25 18:15 ` Zack Weinberg
2004-10-25 22:55 ` Zdenek Dvorak
2004-10-25 23:01 ` Zack Weinberg
2004-10-25 23:02 ` Zdenek Dvorak
2004-10-25 23:10 ` Zdenek Dvorak
2004-10-27 15:38 ` Zdenek Dvorak
2004-10-27 16:22 ` Andrew Pinski
2004-10-27 16:24 ` Zdenek Dvorak
2004-10-27 17:13 ` Dale Johannesen
2004-10-27 20:31 ` Zdenek Dvorak
2004-10-27 19:00 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).