From: "Kewen.Lin" <linkw@linux.ibm.com>
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>,
Segher Boessenkool <segher@kernel.crashing.org>,
Bill Schmidt <wschmidt@linux.ibm.com>,
Richard Guenther <rguenther@suse.de>
Subject: Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts
Date: Thu, 22 Aug 2019 09:16:00 -0000 [thread overview]
Message-ID: <43a3b3c1-9c41-82fa-a5db-1a7a1a5ceae1@linux.ibm.com> (raw)
In-Reply-To: <CAHFci28q5sK2KYEyNy7MgtpJ8f42itH4kKOFvnjHr1Pb8a_8QQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 10743 bytes --]
Hi Bin,
on 2019/8/22 ä¸å1:46, Bin.Cheng wrote:
> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Bin,
>>
>> Thanks for your time!
>>
>> on 2019/8/21 ä¸å8:32, Bin.Cheng wrote:
>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Comparing to the previous versions of implementation mainly based on the
>>>> existing IV cands but zeroing the related group/use cost, this new one is based
>>>> on Richard and Segher's suggestion introducing one doloop dedicated IV cand.
>>>>
>>>> Some key points are listed below:
>>>> 1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV cand.
>>>> 2) Special name "doloop" assigned.
>>>> 3) Doloop IV cand with form (niter+1, +, -1)
>>>> 4) For doloop IV cand, no extra one cost like BIV, assign zero cost for step.
>>>> 5) Support may_be_zero (regressed PR is in this case), the base of doloop IV
>>>> can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
>>>> 6) Add more expr support in force_expr_to_var_cost for reasonable cost
>>>> calculation on the IV base with may_be_zero (like COND_EXPR).
>>>> 7) Set zero cost when using doloop IV cand for doloop use.
>>>> 8) Add three hooks (should we merge _generic and _address?).
>>>> *) have_count_reg_decr_p, is to indicate the target has special hardware
>>>> count register, we shouldn't consider the impact of doloop IV when
>>>> calculating register pressures.
>>>> *) doloop_cost_for_generic, is the extra cost when using doloop IV cand for
>>>> generic type IV use.
>>>> *) doloop_cost_for_address, is the extra cost when using doloop IV cand for
>>>> address type IV use.
>>> What will happen if doloop IV cand be used for generic/address type iv
>>> use? Can RTL doloop can still perform doloop optimization in this
>>> case?
>>>
>>
>> On Power, we put the iteration count into hardware count register, it takes very
>> high cost to move the count to GPR, so the cost is set as INF to make it impossible
>> to use it for generic/address type iv use. But as some discussion before, on some
>> targets using GPR instead of hardware count register, they probably want to use this
>> doloop iv used for other uses if profitable. These two hooks offer the possibility.
>> In that case, I think RTL doloop can still perform since it can still get the
>> pattern and transform. The generic/address uses can still use it.
>>>>
>>>> Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting
>>>> for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is tracked
>>>> by PR89983.
>>>>
>>>> Any comments and suggestions are highly appreciated. Thanks!
>>> Not sure if I understand the patch correctly, some comments embedded.
>>>
>>> + /* The number of doloop candidate in the set. */
>>> + unsigned n_doloop_cands;
>>> +
>>> This is unnecessary. See below comments.
>>>
>>> - add_candidate_1 (data, base, step, important,
>>> - IP_NORMAL, use, NULL, orig_iv);
>>> + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop,
>>> + orig_iv);
>>> if (ip_end_pos (data->current_loop)
>>> && allow_ip_end_pos_p (data->current_loop))
>>> - add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
>>> + add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop,
>>> + orig_iv);
>>> Do we need to skip ip_end_pos case for doloop candidate? Because the
>>> candidate increment will be inserted in latch, i.e, increment position
>>> is after exit condition.
>>>
>>
>> Yes, we should skip it. Currently function find_doloop_use has the check on an
>> empty latch and gimple_cond to latch, partially excluding it. But it's still good
>> to guard it directly here.
>>
>>> - tree_to_aff_combination (iv->base, type, val);
>>> + tree base = iv->base;
>>> + /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to extract
>>> + the value under !may_be_zero to get the compact bound which also well fits
>>> + for may_be_zero since we ensure the value for it is const one. */
>>> + if (cand->doloop_p && desc->may_be_zero && !integer_zerop
>>> (desc->may_be_zero))
>>> + base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
>>> + unshare_expr (rewrite_to_non_trapping_overflow (niter)),
>>> + build_int_cst (TREE_TYPE (niter), 1));
>>> + tree_to_aff_combination (base, type, val);
>>> I don't quite follow here. The iv->base is computed from niter, I
>>> suppose compact bound is for cheaper candidate initialization? Why
>>> it's possible to extract !may_be_zero niter for may_be_zero here? The
>>> niter under !may_be_zero has no indication about the real niter under
>>> may_be_zero.
>>>
>>
>> As you note below, the cand_value for doloop would be zero, but for the case
>> may_be_zero set, the current calculation would take care of the whole niter
>> expression including the cond_expr introduced by may_be_zero check, it's
>> unexpected. The purpose is to use the value under condition !may_be_zero
>> for the calculation, and yes, to get expected zero finally.
>>
>>> - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
>>> + cand_value_at (loop, cand, use->stmt, desc, &bnd);
>>> If I understand correctly, doloop use/cand will only be
>>> identified/added for single exit loop, and there will be only one
>>> cond(doloop) iv_use and only one doloop cand for doloop loop. So the
>>> cand_value at niter at use position would be 0. If that's the case,
>>> we can skip calling cand_value_at here for doloop cand. The change to
>>> cand_value_at would be unnecessary neither.
>>>
>>
>> Exactly, I'll add the early return with zero bound for doloop.
>>
>>> - expensive. */
>>> - if (!integer_zerop (desc->may_be_zero))
>>> + expensive.
>>> +
>>> + For doloop candidate, we have considered MAY_BE_ZERO for IV base, need to
>>> + support MAY_BE_ZERO ? 0 : NITER, so simply bypass this check. */
>>> + if (!integer_zerop (desc->may_be_zero) && !cand->doloop_p)
>>> return iv_elimination_compare_lt (data, cand, comp, desc);
>>> And we can early return before this?
>>>
>>
>> OK.
>>
>>> + if (may_be_zero)
>>> + {
>>> + if (COMPARISON_CLASS_P (may_be_zero))
>>> + {
>>> + niter = fold_build3 (COND_EXPR, ntype, may_be_zero,
>>> + build_int_cst (ntype, 0),
>>> + rewrite_to_non_trapping_overflow (niter));
>>> + }
>>> + /* Don't try to obtain the iteration count expression when may_be_zero is
>>> + integer_nonzerop (actually iteration count is one) or else. */
>>> + else
>>> + return;
>>> + }
>>> +
>>> + tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
>>> + build_int_cst (ntype, 1));
>>> niter is the number of latch executions, so niter + 1 could wrap here,
>>> but guess it's not a problem the similar issue is not handled in
>>> vectorizer neither.
>>>
>>
>> OK.
>>
>>> + unsigned n_old = data->regs_used, n_spr_for_doloop = 0;
>>> + /* If target supports count register for doloop, it doesn't take GPR. */
>>> + if (targetm.have_count_reg_decr_p)
>>> + n_spr_for_doloop = n_doloop_cands;
>>> + unsigned n_new = n_invs + n_cands - n_spr_for_doloop;
>>> Not necessary. See below.
>>
>>> - cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands);
>>> + cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands,
>>> + ivs->n_doloop_cands);
>>> Also.
>>>
>>> ivs->n_cands--;
>>> + if (cp->cand->doloop_p)
>>> + ivs->n_doloop_cands--;
>>>
>>> ivs->n_cands++;
>>> + if (cp->cand->doloop_p)
>>> + ivs->n_doloop_cands++;
>>> You can just book n_cands under condition !cp->cand->doloop_p.
>>
>> If my understanding is correct, you are suggesting the code like:
>>
>> if (!cp->cand->doloop_p)
>> ivs->n_cands++;
>>
>> But I'm afraid that it can NOT satisfy the need in function
>> ivopts_estimate_reg_pressure. As the comments, "if target supports
>> count register for doloop it doesn't take GPR.". If we make doloop
>> cand invisible in n_cands, it's fine for target with count register,
>> but we may miss to count them on targets without count register.
> Why not one more step do checks:
> if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p)
> ivs->n_cands++;
>
Yes, it works. Thanks!
The new patch addressing the comments is attached.
Could you please have a look again? Thanks in advance!
Kewen
---------
gcc/ChangeLog
2019-08-22 Kewen Lin <linkw@gcc.gnu.org>
PR middle-end/80791
* config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* target.def (have_count_reg_decr_p): New hook.
(doloop_cost_for_generic): Likewise.
(doloop_cost_for_address): Likewise.
* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
* doc/tm.texi: Regenerate.
* tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost
addend.
(record_group): Init doloop_p.
(add_candidate_1): Add optional argument doloop, change the handlings
accordingly.
(add_candidate): Likewise.
(add_iv_candidate_for_biv): Update the call to add_candidate.
(generic_predict_doloop_p): Update attribute.
(force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/
LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/
UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/
MIN_EXPR.
(determine_group_iv_cost_generic): Update for doloop IV cand.
(determine_group_iv_cost_address): Likewise.
(determine_group_iv_cost_cond): Likewise.
(determine_iv_cost): Likewise.
(ivopts_estimate_reg_pressure): Likewise.
(may_eliminate_iv): Likewise.
(add_iv_candidate_for_doloop): New function.
(find_iv_candidates): Call function add_iv_candidate_for_doloop.
(iv_ca_set_no_cp): Update for doloop IV cand.
(iv_ca_set_cp): Likewise.
(iv_ca_dump): Dump register cost.
(find_doloop_use): New function.
(predict_and_process_doloop): Likewise.
(tree_ssa_iv_optimize_loop): Call function predict_and_process_doloop.
gcc/testsuite/ChangeLog
2019-08-22 Kewen Lin <linkw@gcc.gnu.org>
PR middle-end/80791
* gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
* gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
* gcc.dg/tree-ssa/pr32044.c: Likewise.
[-- Attachment #2: doloop_dedicated_iv2.diff --]
[-- Type: text/plain, Size: 21639 bytes --]
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..5eccbdc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,16 @@ static const struct attribute_spec rs6000_attribute_table[] =
#undef TARGET_PREDICT_DOLOOP_P
#define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
+#undef TARGET_HAVE_COUNT_REG_DECR_P
+#define TARGET_HAVE_COUNT_REG_DECR_P true
+
+/* 1000000000 is infinite cost in IVOPTs. */
+#undef TARGET_DOLOOP_COST_FOR_GENERIC
+#define TARGET_DOLOOP_COST_FOR_GENERIC 1000000000
+
+#undef TARGET_DOLOOP_COST_FOR_ADDRESS
+#define TARGET_DOLOOP_COST_FOR_ADDRESS 1000000000
+
#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c2aa4d0..9f3a08a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11618,6 +11618,29 @@ loops, and will help ivopts to make some decisions.
The default version of this hook returns false.
@end deftypefn
+@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P
+Return true if the target supports hardware count register for decrement
+and branch. This count register can't be used as general register since
+moving to/from a general register from/to it is very expensive.
+The default value is false.
+@end deftypevr
+
+@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_GENERIC
+IVOPTs introduces one doloop dedicated IV candidate, this hook offers
+ target owner a way to adjust cost when selecting doloop IV candidate for a
+ generic IV use. At calcuation, this value will be added on normal cost
+ already calculated by current implementation.
+The default value is zero.
+@end deftypevr
+
+@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_ADDRESS
+IVOPTs introduces one doloop dedicated IV candidate, this hook offers
+ target owner a way to adjust cost when selecting doloop IV candidate for an
+ address IV use. At calcuation, this value will be added on normal cost
+ already calculated by current implementation.
+The default value is zero.
+@end deftypevr
+
@deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top})
Return true if it is possible to use low-overhead loops (@code{doloop_end}
and @code{doloop_begin}) for a particular loop. @var{iterations} gives the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b4d57b8..4346773 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7946,6 +7946,12 @@ to by @var{ce_info}.
@hook TARGET_PREDICT_DOLOOP_P
+@hook TARGET_HAVE_COUNT_REG_DECR_P
+
+@hook TARGET_DOLOOP_COST_FOR_GENERIC
+
+@hook TARGET_DOLOOP_COST_FOR_ADDRESS
+
@hook TARGET_CAN_USE_DOLOOP_P
@hook TARGET_INVALID_WITHIN_DOLOOP
diff --git a/gcc/target.def b/gcc/target.def
index 71b6972..69e2844 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4246,6 +4246,32 @@ The default version of this hook returns false.",
bool, (struct loop *loop),
default_predict_doloop_p)
+DEFHOOKPOD
+(have_count_reg_decr_p,
+ "Return true if the target supports hardware count register for decrement\n\
+and branch. This count register can't be used as general register since\n\
+moving to/from a general register from/to it is very expensive.\n\
+The default value is false.",
+ bool, false)
+
+DEFHOOKPOD
+(doloop_cost_for_generic,
+ "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\
+ target owner a way to adjust cost when selecting doloop IV candidate for a\n\
+ generic IV use. At calcuation, this value will be added on normal cost\n\
+ already calculated by current implementation.\n\
+The default value is zero.",
+ int64_t, 0)
+
+DEFHOOKPOD
+(doloop_cost_for_address,
+ "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\
+ target owner a way to adjust cost when selecting doloop IV candidate for an\n\
+ address IV use. At calcuation, this value will be added on normal cost\n\
+ already calculated by current implementation.\n\
+The default value is zero.",
+ int64_t, 0)
+
DEFHOOK
(can_use_doloop_p,
"Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c
index 214e6a7..ce4b1d0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c
@@ -10,4 +10,6 @@ int main (void)
f2 ();
}
-/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" { target { ! powerpc*-*-* } } } } */
+/* More debug information emitted for doloop on powerpc. */
+/* { dg-final { scan-tree-dump-times "!= 0" 6 "ivopts" { target { powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
index 7d5859b..71d7f67 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
@@ -17,6 +17,7 @@ f1 (char *p, uintptr_t i, uintptr_t n)
while (i < n);
}
-/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */
-/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts"} } */
-/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "PHI" 2 "ivopts" { target { powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" { target { ! powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c b/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c
index 8a8977a..06c27b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c
@@ -1,6 +1,10 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* For powerpc, disable doloop IV cand generation in IVOPTs to avoid unexpected
+ division operation for its base setup. */
+/* { dg-additional-options "-fno-branch-count-reg" { target { powerpc*-*-* } } } */
+
int foo (int n)
{
while (n >= 45)
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 530ea4a..be3b0b5 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -275,6 +275,9 @@ comp_cost::operator+= (comp_cost cost)
comp_cost
comp_cost::operator+= (HOST_WIDE_INT c)
{
+ if (c >= INFTY)
+ this->cost = INFTY;
+
if (infinite_cost_p ())
return *this;
@@ -399,6 +402,8 @@ struct iv_group
struct cost_pair *cost_map;
/* The selected candidate for the group. */
struct iv_cand *selected;
+ /* To indicate this is a doloop use group. */
+ bool doloop_p;
/* Uses in the group. */
vec<struct iv_use *> vuses;
};
@@ -439,6 +444,7 @@ struct iv_cand
be hoisted out of loop. */
struct iv *orig_iv; /* The original iv if this cand is added from biv with
smaller type. */
+ bool doloop_p; /* Whether this is a doloop candidate. */
};
/* Hashtable entry for common candidate derived from iv uses. */
@@ -612,6 +618,9 @@ struct ivopts_data
/* Whether the loop body can only be exited via single exit. */
bool loop_single_exit_p;
+
+ /* Whether the loop has doloop comparison use. */
+ bool doloop_use_p;
};
/* An assignment of iv candidates to uses. */
@@ -1528,6 +1537,7 @@ record_group (struct ivopts_data *data, enum use_type type)
group->type = type;
group->related_cands = BITMAP_ALLOC (NULL);
group->vuses.create (1);
+ group->doloop_p = false;
data->vgroups.safe_push (group);
return group;
@@ -3017,9 +3027,9 @@ get_loop_invariant_expr (struct ivopts_data *data, tree inv_expr)
replacement of the final value of the iv by a direct computation. */
static struct iv_cand *
-add_candidate_1 (struct ivopts_data *data,
- tree base, tree step, bool important, enum iv_position pos,
- struct iv_use *use, gimple *incremented_at,
+add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important,
+ enum iv_position pos, struct iv_use *use,
+ gimple *incremented_at, bool doloop = false,
struct iv *orig_iv = NULL)
{
unsigned i;
@@ -3079,11 +3089,15 @@ add_candidate_1 (struct ivopts_data *data,
cand->pos = pos;
if (pos != IP_ORIGINAL)
{
- cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp");
+ if (doloop)
+ cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "doloop");
+ else
+ cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp");
cand->var_after = cand->var_before;
}
cand->important = important;
cand->incremented_at = incremented_at;
+ cand->doloop_p = doloop;
data->vcands.safe_push (cand);
if (!poly_int_tree_p (step))
@@ -3116,6 +3130,7 @@ add_candidate_1 (struct ivopts_data *data,
}
cand->important |= important;
+ cand->doloop_p |= doloop;
/* Relate candidate to the group for which it is added. */
if (use)
@@ -3209,16 +3224,17 @@ add_autoinc_candidates (struct ivopts_data *data, tree base, tree step,
the end of loop. */
static void
-add_candidate (struct ivopts_data *data,
- tree base, tree step, bool important, struct iv_use *use,
+add_candidate (struct ivopts_data *data, tree base, tree step, bool important,
+ struct iv_use *use, bool doloop = false,
struct iv *orig_iv = NULL)
{
if (ip_normal_pos (data->current_loop))
- add_candidate_1 (data, base, step, important,
- IP_NORMAL, use, NULL, orig_iv);
- if (ip_end_pos (data->current_loop)
+ add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop,
+ orig_iv);
+ if (!doloop && ip_end_pos (data->current_loop)
&& allow_ip_end_pos_p (data->current_loop))
- add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
+ add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop,
+ orig_iv);
}
/* Adds standard iv candidates. */
@@ -3262,7 +3278,7 @@ add_iv_candidate_for_biv (struct ivopts_data *data, struct iv *iv)
tree step = fold_convert (sizetype, iv->step);
/* Add iv cand of same precision as index part in TARGET_MEM_REF. */
- add_candidate (data, base, step, true, NULL, iv);
+ add_candidate (data, base, step, true, NULL, false, iv);
/* Add iv cand of the original type only if it has nonlinear use. */
if (iv->nonlin_use)
add_candidate (data, iv->base, iv->step, true, NULL);
@@ -3724,7 +3740,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data)
Some RTL specific checks seems unable to be checked in gimple, if any new
checks or easy checks _are_ missing here, please add them. */
-static bool ATTRIBUTE_UNUSED
+static bool
generic_predict_doloop_p (struct ivopts_data *data)
{
struct loop *loop = data->current_loop;
@@ -4177,6 +4193,36 @@ force_expr_to_var_cost (tree expr, bool speed)
STRIP_NOPS (op0);
op1 = NULL_TREE;
break;
+ /* See add_iv_candidate_for_doloop, for doloop may_be_zero case, we
+ introduce COND_EXPR for IV base, need to support better cost estimation
+ for this COND_EXPR and tcc_comparison. */
+ case COND_EXPR:
+ op0 = TREE_OPERAND (expr, 1);
+ STRIP_NOPS (op0);
+ op1 = TREE_OPERAND (expr, 2);
+ STRIP_NOPS (op1);
+ break;
+ case LT_EXPR:
+ case LE_EXPR:
+ case GT_EXPR:
+ case GE_EXPR:
+ case EQ_EXPR:
+ case NE_EXPR:
+ case UNORDERED_EXPR:
+ case ORDERED_EXPR:
+ case UNLT_EXPR:
+ case UNLE_EXPR:
+ case UNGT_EXPR:
+ case UNGE_EXPR:
+ case UNEQ_EXPR:
+ case LTGT_EXPR:
+ case MAX_EXPR:
+ case MIN_EXPR:
+ op0 = TREE_OPERAND (expr, 0);
+ STRIP_NOPS (op0);
+ op1 = TREE_OPERAND (expr, 1);
+ STRIP_NOPS (op1);
+ break;
default:
/* Just an arbitrary value, FIXME. */
@@ -4258,6 +4304,35 @@ force_expr_to_var_cost (tree expr, bool speed)
case RSHIFT_EXPR:
cost = comp_cost (add_cost (speed, mode), 0);
break;
+ case COND_EXPR:
+ op0 = TREE_OPERAND (expr, 0);
+ STRIP_NOPS (op0);
+ if (op0 == NULL_TREE || TREE_CODE (op0) == SSA_NAME
+ || CONSTANT_CLASS_P (op0))
+ cost = no_cost;
+ else
+ cost = force_expr_to_var_cost (op0, speed);
+ break;
+ case LT_EXPR:
+ case LE_EXPR:
+ case GT_EXPR:
+ case GE_EXPR:
+ case EQ_EXPR:
+ case NE_EXPR:
+ case UNORDERED_EXPR:
+ case ORDERED_EXPR:
+ case UNLT_EXPR:
+ case UNLE_EXPR:
+ case UNGT_EXPR:
+ case UNGE_EXPR:
+ case UNEQ_EXPR:
+ case LTGT_EXPR:
+ case MAX_EXPR:
+ case MIN_EXPR:
+ /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate
+ cost evaluation way. */
+ cost = comp_cost (1.5 * add_cost (speed, mode), 0);
+ break;
default:
gcc_unreachable ();
@@ -4706,8 +4781,12 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt)
cost = no_cost;
else
- cost = get_computation_cost (data, use, cand, false,
- &inv_vars, NULL, &inv_expr);
+ {
+ cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL,
+ &inv_expr);
+ if (cand->doloop_p)
+ cost += targetm.doloop_cost_for_generic;
+ }
if (inv_expr)
{
@@ -4735,6 +4814,9 @@ determine_group_iv_cost_address (struct ivopts_data *data,
cost = get_computation_cost (data, use, cand, true,
&inv_vars, &can_autoinc, &inv_expr);
+ if (cand->doloop_p)
+ cost += targetm.doloop_cost_for_address;
+
if (inv_expr)
{
inv_exprs = BITMAP_ALLOC (NULL);
@@ -5142,6 +5224,15 @@ may_eliminate_iv (struct ivopts_data *data,
}
}
+ /* For doloop IV cand, the bound would be zero. It's safe whether
+ may_be_zero set or not. */
+ if (cand->doloop_p)
+ {
+ *bound = build_int_cst (TREE_TYPE (cand->iv->base), 0);
+ *comp = iv_elimination_compare (data, use);
+ return true;
+ }
+
cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
*bound = fold_convert (TREE_TYPE (cand->iv->base),
@@ -5264,6 +5355,9 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
inv_vars = inv_vars_elim;
inv_vars_elim = NULL;
inv_expr = inv_expr_elim;
+ /* For doloop candidate/use pair, adjust to zero cost. */
+ if (group->doloop_p && cand->doloop_p)
+ cost = no_cost;
}
else
{
@@ -5390,6 +5484,42 @@ relate_compare_use_with_all_cands (struct ivopts_data *data)
}
}
+/* Add one doloop dedicated IV candidate:
+ - Base is (may_be_zero ? 1 : (niter + 1)).
+ - Step is -1. */
+
+static void
+add_iv_candidate_for_doloop (struct ivopts_data *data)
+{
+ tree_niter_desc *niter_desc = niter_for_single_dom_exit (data);
+ gcc_assert (niter_desc && niter_desc->assumptions);
+
+ tree niter = niter_desc->niter;
+ tree ntype = TREE_TYPE (niter);
+ gcc_assert (TREE_CODE (ntype) == INTEGER_TYPE);
+
+ tree may_be_zero = niter_desc->may_be_zero;
+ if (may_be_zero && integer_zerop (may_be_zero))
+ may_be_zero = NULL_TREE;
+ if (may_be_zero)
+ {
+ if (COMPARISON_CLASS_P (may_be_zero))
+ {
+ niter = fold_build3 (COND_EXPR, ntype, may_be_zero,
+ build_int_cst (ntype, 0),
+ rewrite_to_non_trapping_overflow (niter));
+ }
+ /* Don't try to obtain the iteration count expression when may_be_zero is
+ integer_nonzerop (actually iteration count is one) or else. */
+ else
+ return;
+ }
+
+ tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
+ build_int_cst (ntype, 1));
+ add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, true);
+}
+
/* Finds the candidates for the induction variables. */
static void
@@ -5398,6 +5528,10 @@ find_iv_candidates (struct ivopts_data *data)
/* Add commonly used ivs. */
add_standard_iv_candidates (data);
+ /* Add doloop dedicate ivs. */
+ if (data->doloop_use_p)
+ add_iv_candidate_for_doloop (data);
+
/* Add old induction variables. */
add_iv_candidate_for_bivs (data);
@@ -5578,16 +5712,21 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
or a const set. */
if (cost_base.cost == 0)
cost_base.cost = COSTS_N_INSNS (1);
- cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
-
+ /* Doloop decrement should be considered as zero cost. */
+ if (cand->doloop_p)
+ cost_step = 0;
+ else
+ cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
cost = cost_step + adjust_setup_cost (data, cost_base.cost);
/* Prefer the original ivs unless we may gain something by replacing it.
The reason is to make debugging simpler; so this is not relevant for
artificial ivs created by other optimization passes. */
- if (cand->pos != IP_ORIGINAL
- || !SSA_NAME_VAR (cand->var_before)
- || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
+ if ((cand->pos != IP_ORIGINAL
+ || !SSA_NAME_VAR (cand->var_before)
+ || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
+ /* Prefer doloop as well. */
+ && !cand->doloop_p)
cost++;
/* Prefer not to insert statements into latch unless there are some
@@ -5832,7 +5971,8 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
if (ivs->n_cand_uses[cid] == 0)
{
bitmap_clear_bit (ivs->cands, cid);
- ivs->n_cands--;
+ if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p)
+ ivs->n_cands--;
ivs->cand_cost -= cp->cand->cost;
iv_ca_set_remove_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
iv_ca_set_remove_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses);
@@ -5889,7 +6029,8 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
if (ivs->n_cand_uses[cid] == 1)
{
bitmap_set_bit (ivs->cands, cid);
- ivs->n_cands++;
+ if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p)
+ ivs->n_cands++;
ivs->cand_cost += cp->cand->cost;
iv_ca_set_add_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses);
iv_ca_set_add_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses);
@@ -6134,6 +6275,8 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
fprintf (file, " cost: %" PRId64 " (complexity %d)\n", cost.cost,
cost.complexity);
+ fprintf (file, " reg_cost: %d\n",
+ ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands));
fprintf (file, " cand_cost: %" PRId64 "\n cand_group_cost: "
"%" PRId64 " (complexity %d)\n", ivs->cand_cost,
ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
@@ -7568,6 +7711,75 @@ determine_scaling_factor (struct ivopts_data *data, basic_block *body)
}
}
+/* Find doloop comparison use and set its doloop_p on if found. */
+
+static bool
+find_doloop_use (struct ivopts_data *data)
+{
+ struct loop *loop = data->current_loop;
+
+ for (unsigned i = 0; i < data->vgroups.length (); i++)
+ {
+ struct iv_group *group = data->vgroups[i];
+ if (group->type == USE_COMPARE)
+ {
+ gcc_assert (group->vuses.length () == 1);
+ struct iv_use *use = group->vuses[0];
+ gimple *stmt = use->stmt;
+ if (gimple_code (stmt) == GIMPLE_COND)
+ {
+ basic_block bb = gimple_bb (stmt);
+ edge true_edge, false_edge;
+ extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+ /* This comparison is used for loop latch. Require latch is empty
+ for now. */
+ if ((loop->latch == true_edge->dest
+ || loop->latch == false_edge->dest)
+ && empty_block_p (loop->latch))
+ {
+ group->doloop_p = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Doloop cmp iv use: ");
+ print_gimple_stmt (dump_file, stmt, TDF_DETAILS);
+ }
+ return true;
+ }
+ }
+ }
+ }
+
+ return false;
+}
+
+/* For the targets which support doloop, to predict whether later RTL doloop
+ transformation will perform on this loop, further detect the doloop use and
+ mark the flag doloop_use_p if predicted. */
+
+void
+predict_and_process_doloop (struct ivopts_data *data)
+{
+ if (!flag_branch_on_count_reg)
+ return;
+
+ if (!generic_predict_doloop_p (data))
+ return;
+
+ if (find_doloop_use (data))
+ {
+ data->doloop_use_p = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ struct loop *loop = data->current_loop;
+ fprintf (dump_file,
+ "Predict loop %d can perform"
+ " doloop optimization later.\n",
+ loop->num);
+ flow_loop_dump (loop, dump_file, NULL, 1);
+ }
+ }
+}
+
/* Optimizes the LOOP. Returns true if anything changed. */
static bool
@@ -7580,6 +7792,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop,
basic_block *body;
gcc_assert (!data->niters);
+ data->doloop_use_p = false;
data->current_loop = loop;
data->loop_loc = find_loop_location (loop).get_location_t ();
data->speed = optimize_loop_for_speed_p (loop);
@@ -7622,6 +7835,9 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop,
/* Determine cost scaling factor for basic blocks in loop. */
determine_scaling_factor (data, body);
+ /* Predict doloop and find the doloop use if predicted. */
+ predict_and_process_doloop (data);
+
/* Finds candidates for the induction variables (item 2). */
find_iv_candidates (data);
next prev parent reply other threads:[~2019-08-22 7:09 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-14 3:10 [PATCH v2 3/3] " linkw
2019-05-14 7:26 ` Richard Biener
2019-05-15 5:03 ` Kewen.Lin
2019-05-15 8:47 ` Richard Biener
2019-05-15 16:17 ` Segher Boessenkool
2019-05-16 7:25 ` Richard Biener
2019-05-16 17:35 ` Segher Boessenkool
2019-05-16 3:53 ` Kewen.Lin
2019-05-16 18:41 ` Jeff Law
2019-05-16 21:42 ` Segher Boessenkool
2019-06-19 11:47 ` [PATCH v3 3/3] PR80791 " Kewen.Lin
2019-06-20 9:09 ` Segher Boessenkool
2019-06-20 12:08 ` Kewen.Lin
2019-06-20 12:17 ` Kewen.Lin
2019-07-10 2:31 ` [PING^1][PATCH v4 " Kewen.Lin
2019-07-12 12:40 ` Richard Biener
2019-07-12 14:10 ` Segher Boessenkool
2019-07-15 6:40 ` Kewen.Lin
2019-07-15 6:50 ` Bin.Cheng
2019-07-21 9:06 ` [PATCH v3 " Bin.Cheng
2019-07-22 5:42 ` Kewen.Lin
2019-07-22 6:53 ` Segher Boessenkool
2019-07-22 7:18 ` Kewen.Lin
2019-07-22 8:02 ` Richard Biener
2019-07-22 21:47 ` Segher Boessenkool
2019-07-23 6:14 ` Kewen.Lin
2019-07-23 7:38 ` Richard Biener
2019-07-23 6:09 ` Kewen.Lin
2019-07-23 8:05 ` Richard Biener
2019-07-23 6:28 ` [PATCH v5 " Kewen.Lin
2019-08-14 7:48 ` [PATCH v6 " Kewen.Lin
2019-08-21 13:42 ` Bin.Cheng
2019-08-22 7:09 ` Kewen.Lin
2019-08-22 8:07 ` Bin.Cheng
2019-08-22 9:16 ` Kewen.Lin [this message]
2019-08-23 5:31 ` Bin.Cheng
2019-08-23 9:57 ` Kewen.Lin
2019-08-23 10:43 ` Bin.Cheng
2019-08-23 11:02 ` Segher Boessenkool
2019-09-11 6:18 ` Kewen.Lin
2019-09-12 8:14 ` Richard Biener
2019-09-14 9:35 ` Kewen.Lin
2019-08-24 22:43 ` Kewen.Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43a3b3c1-9c41-82fa-a5db-1a7a1a5ceae1@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=amker.cheng@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).