* [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes.
@ 2022-10-21 13:52 Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure Dimitrije Milosevic
0 siblings, 2 replies; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-21 13:52 UTC (permalink / raw)
To: gcc-patches; +Cc: djordje.todorovic
Architectures like Mips are very limited when it comes to addressing modes. The expected behavior
is therefore that complexity is lower for the supported BASE + OFFSET addressing mode and higher
for more complex addressing modes (e.g. BASE + INDEX << SCALE), which are not supported.
Currently, the complexity calculation algorithm bails out if the BASE + INDEX addressing mode
is not supported by the target architecture, resulting in 0-complexities for all candidates, which
leads to non-optimal candidate selection, especially in scenarios where there are multiple nested
loops.
Additionally, when bumping up the register pressure cost, the number of invariants should also be
considered, in addition to the number of candidates.
Dimitrije Milosevic (2):
ivopts: Revert computation of address cost complexity.
ivopts: Consider number of invariants when calculating register pressure.
gcc/tree-ssa-address.cc | 2 +-
gcc/tree-ssa-address.h | 2 +
gcc/tree-ssa-loop-ivopts.cc | 220 +++++++++++++++++++++++++++++++++++++++++---
3 files changed, 210 insertions(+), 14 deletions(-)
---
2.25.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
@ 2022-10-21 13:52 ` Dimitrije Milosevic
2022-10-25 11:08 ` Richard Biener
2022-10-27 23:02 ` Jeff Law
2022-10-21 13:52 ` [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure Dimitrije Milosevic
1 sibling, 2 replies; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-21 13:52 UTC (permalink / raw)
To: gcc-patches; +Cc: djordje.todorovic, Dimitrije Milošević
From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
This patch reverts the computation of address cost complexity
to the legacy one. After f9f69dd, complexity is calculated
using the valid_mem_ref_p target hook. Architectures like
Mips only allow BASE + OFFSET addressing modes, which in turn
prevents the calculation of complexity for other addressing
modes, resulting in non-optimal candidate selection.
gcc/ChangeLog:
* tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
to non-static.
* tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
* tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
(compute_min_and_max_offset): Likewise.
(get_address_cost): Revert
complexity calculation.
Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
---
gcc/tree-ssa-address.cc | 2 +-
gcc/tree-ssa-address.h | 2 +
gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
3 files changed, 207 insertions(+), 11 deletions(-)
diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
index ba7b7c93162..442f54f0165 100644
--- a/gcc/tree-ssa-address.cc
+++ b/gcc/tree-ssa-address.cc
@@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
validity for a memory reference accessing memory of mode MODE in address
space AS. */
-static bool
+bool
multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
addr_space_t as)
{
diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
index 95143a099b9..09f36ee2f19 100644
--- a/gcc/tree-ssa-address.h
+++ b/gcc/tree-ssa-address.h
@@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
class aff_tree *, tree, tree, tree, bool);
extern void copy_ref_info (tree, tree);
tree maybe_fold_tmr (tree);
+bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
+ addr_space_t as);
extern unsigned int preferred_mem_scale_factor (tree base,
machine_mode mem_mode,
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index a6f926a68ef..d53ba05a4f6 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
return infinite_cost;
}
+static void
+compute_symbol_and_var_present (tree e1, tree e2,
+ bool *symbol_present, bool *var_present)
+{
+ poly_uint64_pod off1, off2;
+
+ e1 = strip_offset (e1, &off1);
+ e2 = strip_offset (e2, &off2);
+
+ STRIP_NOPS (e1);
+ STRIP_NOPS (e2);
+
+ if (TREE_CODE (e1) == ADDR_EXPR)
+ {
+ poly_int64_pod diff;
+ if (ptr_difference_const (e1, e2, &diff))
+ {
+ *symbol_present = false;
+ *var_present = false;
+ return;
+ }
+
+ if (integer_zerop (e2))
+ {
+ tree core;
+ poly_int64_pod bitsize;
+ poly_int64_pod bitpos;
+ widest_int mul;
+ tree toffset;
+ machine_mode mode;
+ int unsignedp, reversep, volatilep;
+
+ core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
+ &toffset, &mode, &unsignedp, &reversep, &volatilep);
+
+ if (toffset != 0
+ || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
+ || reversep
+ || !VAR_P (core))
+ {
+ *symbol_present = false;
+ *var_present = true;
+ return;
+ }
+
+ if (TREE_STATIC (core)
+ || DECL_EXTERNAL (core))
+ {
+ *symbol_present = true;
+ *var_present = false;
+ return;
+ }
+
+ *symbol_present = false;
+ *var_present = true;
+ return;
+ }
+
+ *symbol_present = false;
+ *var_present = true;
+ }
+ *symbol_present = false;
+
+ if (operand_equal_p (e1, e2, 0))
+ {
+ *var_present = false;
+ return;
+ }
+
+ *var_present = true;
+}
+
+static void
+compute_min_and_max_offset (addr_space_t as,
+ machine_mode mem_mode, poly_int64_pod *min_offset,
+ poly_int64_pod *max_offset)
+{
+ machine_mode address_mode = targetm.addr_space.address_mode (as);
+ HOST_WIDE_INT i;
+ poly_int64_pod off, width;
+ rtx addr;
+ rtx reg1;
+
+ reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
+
+ width = GET_MODE_BITSIZE (address_mode) - 1;
+ if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
+ width = HOST_BITS_PER_WIDE_INT - 1;
+ gcc_assert (width.is_constant ());
+ addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
+
+ off = 0;
+ for (i = width.to_constant (); i >= 0; i--)
+ {
+ off = -(HOST_WIDE_INT_1U << i);
+ XEXP (addr, 1) = gen_int_mode (off, address_mode);
+ if (memory_address_addr_space_p (mem_mode, addr, as))
+ break;
+ }
+ if (i == -1)
+ *min_offset = 0;
+ else
+ *min_offset = off;
+ // *min_offset = (i == -1? 0 : off);
+
+ for (i = width.to_constant (); i >= 0; i--)
+ {
+ off = (HOST_WIDE_INT_1U << i) - 1;
+ XEXP (addr, 1) = gen_int_mode (off, address_mode);
+ if (memory_address_addr_space_p (mem_mode, addr, as))
+ break;
+ /* For some strict-alignment targets, the offset must be naturally
+ aligned. Try an aligned offset if mem_mode is not QImode. */
+ off = mem_mode != QImode
+ ? (HOST_WIDE_INT_1U << i)
+ - (GET_MODE_SIZE (mem_mode))
+ : 0;
+ if (known_gt (off, 0))
+ {
+ XEXP (addr, 1) = gen_int_mode (off, address_mode);
+ if (memory_address_addr_space_p (mem_mode, addr, as))
+ break;
+ }
+ }
+ if (i == -1)
+ off = 0;
+ *max_offset = off;
+}
+
/* Return cost of computing USE's address expression by using CAND.
AFF_INV and AFF_VAR represent invariant and variant parts of the
address expression, respectively. If AFF_INV is simple, store
@@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
/* Only true if ratio != 1. */
bool ok_with_ratio_p = false;
bool ok_without_ratio_p = false;
+ tree ubase = use->iv->base;
+ tree cbase = cand->iv->base, cstep = cand->iv->step;
+ tree utype = TREE_TYPE (ubase), ctype;
+ unsigned HOST_WIDE_INT cstepi;
+ bool symbol_present = false, var_present = false, stmt_is_after_increment;
+ poly_int64_pod min_offset, max_offset;
+ bool offset_p, ratio_p;
if (!aff_combination_const_p (aff_inv))
{
@@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
cost += address_cost (addr, mem_mode, as, speed);
- if (parts.symbol != NULL_TREE)
- cost.complexity += 1;
- /* Don't increase the complexity of adding a scaled index if it's
- the only kind of index that the target allows. */
- if (parts.step != NULL_TREE && ok_without_ratio_p)
- cost.complexity += 1;
- if (parts.base != NULL_TREE && parts.index != NULL_TREE)
- cost.complexity += 1;
- if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
- cost.complexity += 1;
+ if (cst_and_fits_in_hwi (cstep))
+ cstepi = int_cst_value (cstep);
+ else
+ cstepi = 0;
+
+ STRIP_NOPS (cbase);
+ ctype = TREE_TYPE (cbase);
+
+ stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
+ use->stmt);
+
+ if (cst_and_fits_in_hwi (cbase))
+ compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
+ &symbol_present, &var_present);
+ else if (ratio == 1)
+ {
+ tree real_cbase = cbase;
+
+ /* Check to see if any adjustment is needed. */
+ if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
+ {
+ aff_tree real_cbase_aff;
+ aff_tree cstep_aff;
+
+ tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
+ &real_cbase_aff);
+ tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
+
+ aff_combination_add (&real_cbase_aff, &cstep_aff);
+ real_cbase = aff_combination_to_tree (&real_cbase_aff);
+ }
+ compute_symbol_and_var_present (ubase, real_cbase,
+ &symbol_present, &var_present);
+ }
+ else if (!POINTER_TYPE_P (ctype)
+ && multiplier_allowed_in_address_p
+ (ratio, mem_mode,
+ TYPE_ADDR_SPACE (TREE_TYPE (utype))))
+ {
+ tree real_cbase = cbase;
+
+ if (cstepi == 0 && stmt_is_after_increment)
+ {
+ if (POINTER_TYPE_P (ctype))
+ real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
+ else
+ real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
+ }
+ real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
+ build_int_cst (ctype, ratio));
+ compute_symbol_and_var_present (ubase, real_cbase,
+ &symbol_present, &var_present);
+ }
+ else
+ {
+ compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
+ &symbol_present, &var_present);
+ }
+
+ compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
+ offset_p = maybe_ne (aff_inv->offset, 0)
+ && known_le (min_offset, aff_inv->offset)
+ && known_le (aff_inv->offset, max_offset);
+ ratio_p = (ratio != 1
+ && multiplier_allowed_in_address_p (ratio, mem_mode, as));
+
+ cost.complexity = (symbol_present != 0) + (var_present != 0)
+ + offset_p + ratio_p;
return cost;
}
--
2.25.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
@ 2022-10-21 13:52 ` Dimitrije Milosevic
2022-10-25 11:07 ` Richard Biener
1 sibling, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-21 13:52 UTC (permalink / raw)
To: gcc-patches; +Cc: djordje.todorovic, Dimitrije Milošević
From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
This patch slightly modifies the register pressure model function to consider
both the number of invariants and the number of candidates, rather than
just the number of candidates. This used to be the case before c18101f.
gcc/ChangeLog:
* tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
---
gcc/tree-ssa-loop-ivopts.cc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index d53ba05a4f6..9d0b669d671 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
+ target_spill_cost [speed] * (n_cands - available_regs) * 2
+ target_spill_cost [speed] * (regs_needed - n_cands);
- /* Finally, add the number of candidates, so that we prefer eliminating
- induction variables if possible. */
- return cost + n_cands;
+ /* Finally, add the number of invariants and the number of candidates,
+ so that we prefer eliminating induction variables if possible. */
+ return cost + n_invs + n_cands;
}
/* For each size of the induction variable set determine the penalty. */
--
2.25.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-21 13:52 ` [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure Dimitrije Milosevic
@ 2022-10-25 11:07 ` Richard Biener
2022-10-25 13:00 ` Dimitrije Milosevic
0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-10-25 11:07 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: gcc-patches, djordje.todorovic
On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch slightly modifies register pressure model function to consider
> both the number of invariants and the number of candidates, rather than
> just the number of candidates. This used to be the case before c18101f.
Don't you add n_invs twice now, given
unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
?
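To make the question concrete, here is a minimal C sketch of the double counting. It is not the GCC function: toy_reg_pressure and its patched flag are hypothetical, and the base cost is assumed to be n_new, i.e. only the case where registers are plentiful is modelled (the spill branches are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the register pressure estimate.
   Assumption: registers are plentiful, so the base cost is just n_new.
   The spill-related branches of the real function are not modelled.  */
static unsigned
toy_reg_pressure (unsigned n_invs, unsigned n_cands, bool patched)
{
  unsigned n_new = n_invs + n_cands;  /* n_invs is counted here ...  */
  unsigned cost = n_new;              /* assumed base cost  */

  if (patched)
    return cost + n_invs + n_cands;   /* ... and a second time here.  */

  return cost + n_cands;              /* only n_cands is added again.  */
}
```

Under these assumptions, with 2 invariants and 3 candidates the unpatched estimate is 8 and the patched one is 10, so each invariant contributes twice after the change.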
> gcc/ChangeLog:
>
> * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
> gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index d53ba05a4f6..9d0b669d671 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> + target_spill_cost [speed] * (n_cands - available_regs) * 2
> + target_spill_cost [speed] * (regs_needed - n_cands);
>
> - /* Finally, add the number of candidates, so that we prefer eliminating
> - induction variables if possible. */
> - return cost + n_cands;
> + /* Finally, add the number of invariants and the number of candidates,
> + so that we prefer eliminating induction variables if possible. */
> + return cost + n_invs + n_cands;
> }
>
> /* For each size of the induction variable set determine the penalty. */
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
@ 2022-10-25 11:08 ` Richard Biener
2022-10-25 13:00 ` Dimitrije Milosevic
2022-10-27 23:02 ` Jeff Law
1 sibling, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-10-25 11:08 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: gcc-patches, djordje.todorovic
On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
I don't follow how only having BASE + OFFSET addressing prevents
calculation of complexity for other addressing modes? Can you explain?
Do you have a testcase that shows how both changes improve IV selection
for MIPS?
>
> gcc/ChangeLog:
>
> * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> to non-static.
> * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> (compute_min_and_max_offset): Likewise.
> (get_address_cost): Revert
> complexity calculation.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
> gcc/tree-ssa-address.cc | 2 +-
> gcc/tree-ssa-address.h | 2 +
> gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
> 3 files changed, 207 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
> index ba7b7c93162..442f54f0165 100644
> --- a/gcc/tree-ssa-address.cc
> +++ b/gcc/tree-ssa-address.cc
> @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
> validity for a memory reference accessing memory of mode MODE in address
> space AS. */
>
> -static bool
> +bool
> multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> addr_space_t as)
> {
> diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
> index 95143a099b9..09f36ee2f19 100644
> --- a/gcc/tree-ssa-address.h
> +++ b/gcc/tree-ssa-address.h
> @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
> class aff_tree *, tree, tree, tree, bool);
> extern void copy_ref_info (tree, tree);
> tree maybe_fold_tmr (tree);
> +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> + addr_space_t as);
>
> extern unsigned int preferred_mem_scale_factor (tree base,
> machine_mode mem_mode,
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index a6f926a68ef..d53ba05a4f6 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
> return infinite_cost;
> }
>
> +static void
> +compute_symbol_and_var_present (tree e1, tree e2,
> + bool *symbol_present, bool *var_present)
> +{
> + poly_uint64_pod off1, off2;
> +
> + e1 = strip_offset (e1, &off1);
> + e2 = strip_offset (e2, &off2);
> +
> + STRIP_NOPS (e1);
> + STRIP_NOPS (e2);
> +
> + if (TREE_CODE (e1) == ADDR_EXPR)
> + {
> + poly_int64_pod diff;
> + if (ptr_difference_const (e1, e2, &diff))
> + {
> + *symbol_present = false;
> + *var_present = false;
> + return;
> + }
> +
> + if (integer_zerop (e2))
> + {
> + tree core;
> + poly_int64_pod bitsize;
> + poly_int64_pod bitpos;
> + widest_int mul;
> + tree toffset;
> + machine_mode mode;
> + int unsignedp, reversep, volatilep;
> +
> + core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
> + &toffset, &mode, &unsignedp, &reversep, &volatilep);
> +
> + if (toffset != 0
> + || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
> + || reversep
> + || !VAR_P (core))
> + {
> + *symbol_present = false;
> + *var_present = true;
> + return;
> + }
> +
> + if (TREE_STATIC (core)
> + || DECL_EXTERNAL (core))
> + {
> + *symbol_present = true;
> + *var_present = false;
> + return;
> + }
> +
> + *symbol_present = false;
> + *var_present = true;
> + return;
> + }
> +
> + *symbol_present = false;
> + *var_present = true;
> + }
> + *symbol_present = false;
> +
> + if (operand_equal_p (e1, e2, 0))
> + {
> + *var_present = false;
> + return;
> + }
> +
> + *var_present = true;
> +}
> +
> +static void
> +compute_min_and_max_offset (addr_space_t as,
> + machine_mode mem_mode, poly_int64_pod *min_offset,
> + poly_int64_pod *max_offset)
> +{
> + machine_mode address_mode = targetm.addr_space.address_mode (as);
> + HOST_WIDE_INT i;
> + poly_int64_pod off, width;
> + rtx addr;
> + rtx reg1;
> +
> + reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
> +
> + width = GET_MODE_BITSIZE (address_mode) - 1;
> + if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
> + width = HOST_BITS_PER_WIDE_INT - 1;
> + gcc_assert (width.is_constant ());
> + addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> +
> + off = 0;
> + for (i = width.to_constant (); i >= 0; i--)
> + {
> + off = -(HOST_WIDE_INT_1U << i);
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + }
> + if (i == -1)
> + *min_offset = 0;
> + else
> + *min_offset = off;
> + // *min_offset = (i == -1? 0 : off);
> +
> + for (i = width.to_constant (); i >= 0; i--)
> + {
> + off = (HOST_WIDE_INT_1U << i) - 1;
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + /* For some strict-alignment targets, the offset must be naturally
> + aligned. Try an aligned offset if mem_mode is not QImode. */
> + off = mem_mode != QImode
> + ? (HOST_WIDE_INT_1U << i)
> + - (GET_MODE_SIZE (mem_mode))
> + : 0;
> + if (known_gt (off, 0))
> + {
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + }
> + }
> + if (i == -1)
> + off = 0;
> + *max_offset = off;
> +}
> +
> /* Return cost of computing USE's address expression by using CAND.
> AFF_INV and AFF_VAR represent invariant and variant parts of the
> address expression, respectively. If AFF_INV is simple, store
> @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
> /* Only true if ratio != 1. */
> bool ok_with_ratio_p = false;
> bool ok_without_ratio_p = false;
> + tree ubase = use->iv->base;
> + tree cbase = cand->iv->base, cstep = cand->iv->step;
> + tree utype = TREE_TYPE (ubase), ctype;
> + unsigned HOST_WIDE_INT cstepi;
> + bool symbol_present = false, var_present = false, stmt_is_after_increment;
> + poly_int64_pod min_offset, max_offset;
> + bool offset_p, ratio_p;
>
> if (!aff_combination_const_p (aff_inv))
> {
> @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
> gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
> cost += address_cost (addr, mem_mode, as, speed);
>
> - if (parts.symbol != NULL_TREE)
> - cost.complexity += 1;
> - /* Don't increase the complexity of adding a scaled index if it's
> - the only kind of index that the target allows. */
> - if (parts.step != NULL_TREE && ok_without_ratio_p)
> - cost.complexity += 1;
> - if (parts.base != NULL_TREE && parts.index != NULL_TREE)
> - cost.complexity += 1;
> - if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
> - cost.complexity += 1;
> + if (cst_and_fits_in_hwi (cstep))
> + cstepi = int_cst_value (cstep);
> + else
> + cstepi = 0;
> +
> + STRIP_NOPS (cbase);
> + ctype = TREE_TYPE (cbase);
> +
> + stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
> + use->stmt);
> +
> + if (cst_and_fits_in_hwi (cbase))
> + compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> + &symbol_present, &var_present);
> + else if (ratio == 1)
> + {
> + tree real_cbase = cbase;
> +
> + /* Check to see if any adjustment is needed. */
> + if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
> + {
> + aff_tree real_cbase_aff;
> + aff_tree cstep_aff;
> +
> + tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
> + &real_cbase_aff);
> + tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
> +
> + aff_combination_add (&real_cbase_aff, &cstep_aff);
> + real_cbase = aff_combination_to_tree (&real_cbase_aff);
> + }
> + compute_symbol_and_var_present (ubase, real_cbase,
> + &symbol_present, &var_present);
> + }
> + else if (!POINTER_TYPE_P (ctype)
> + && multiplier_allowed_in_address_p
> + (ratio, mem_mode,
> + TYPE_ADDR_SPACE (TREE_TYPE (utype))))
> + {
> + tree real_cbase = cbase;
> +
> + if (cstepi == 0 && stmt_is_after_increment)
> + {
> + if (POINTER_TYPE_P (ctype))
> + real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
> + else
> + real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
> + }
> + real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
> + build_int_cst (ctype, ratio));
> + compute_symbol_and_var_present (ubase, real_cbase,
> + &symbol_present, &var_present);
> + }
> + else
> + {
> + compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> + &symbol_present, &var_present);
> + }
> +
> + compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
> + offset_p = maybe_ne (aff_inv->offset, 0)
> + && known_le (min_offset, aff_inv->offset)
> + && known_le (aff_inv->offset, max_offset);
> + ratio_p = (ratio != 1
> + && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> +
> + cost.complexity = (symbol_present != 0) + (var_present != 0)
> + + offset_p + ratio_p;
>
> return cost;
> }
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-25 11:08 ` Richard Biener
@ 2022-10-25 13:00 ` Dimitrije Milosevic
0 siblings, 0 replies; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-25 13:00 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, Djordje Todorovic
Hi Richard,
> I don't follow how only having BASE + OFFSET addressing prevents
> calculation of complexity for other addressing modes? Can you explain?
It's the valid_mem_ref_p target hook that prevents complexity calculation
for other addressing modes (for Mips and RISC-V).
Here's the snippet of the algorithm (after f9f69dd) for the complexity
calculation, which is located at the beginning of the get_address_cost
function in tree-ssa-loop-ivopts.cc:
if (!aff_combination_const_p (aff_inv))
{
parts.index = integer_one_node;
/* Addressing mode "base + index". */
ok_without_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
if (ratio != 1)
{
parts.step = wide_int_to_tree (type, ratio);
/* Addressing mode "base + index << scale". */
ok_with_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
if (!ok_with_ratio_p)
parts.step = NULL_TREE;
}
...
The algorithm "builds up" the actual addressing mode step-by-step,
starting from BASE + INDEX. However, if valid_mem_ref_p returns
false, parts.* is set to NULL_TREE and we bail out. For Mips (or
RISC-V), it always returns false (given we are building the addressing
mode up from BASE + INDEX), since Mips allows BASE + OFFSET only
(see the case PLUS in mips_classify_address in config/mips/mips.cc).
The result is that, for Mips (or RISC-V), all addressing modes besides
BASE + OFFSET have complexities of 0. f9f69dd introduced calls to the
valid_mem_ref_p target hook, which were not there before, and I'm not
sure exactly why.
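As a rough illustration of the bail-out described above, here is a minimal, self-contained C sketch. It is not GCC code: toy_parts, the two toy_valid_* predicates and toy_complexity are hypothetical stand-ins for struct mem_address, the valid_mem_ref_p hook and the counting at the end of get_address_cost, and the symbol part is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the address parts that get counted.  */
struct toy_parts { bool base, index, step, offset; };

typedef bool (*toy_valid_fn) (const struct toy_parts *);

/* Toy target that accepts BASE + OFFSET only (the Mips-like case).  */
static bool
toy_valid_base_offset_only (const struct toy_parts *p)
{
  return !p->index && !p->step;
}

/* Toy target that accepts every combination.  */
static bool
toy_valid_anything (const struct toy_parts *p)
{
  (void) p;
  return true;
}

/* Build the address up from BASE + INDEX, then count the surviving parts.  */
static int
toy_complexity (toy_valid_fn valid, int ratio, bool want_offset)
{
  struct toy_parts parts = { .base = true, .index = true };
  bool ok_without_ratio = valid (&parts);
  bool ok_with_ratio = false;

  if (ratio != 1)
    {
      parts.step = true;
      ok_with_ratio = valid (&parts);
      if (!ok_with_ratio)
        parts.step = false;
    }

  if (!ok_with_ratio && !ok_without_ratio)
    parts.index = false;        /* Bail out: nothing is left to count.  */
  else if (want_offset)
    {
      parts.offset = true;
      if (!valid (&parts))
        parts.offset = false;
    }

  int complexity = 0;
  if (parts.step && ok_without_ratio)
    complexity++;
  if (parts.base && parts.index)
    complexity++;
  if (parts.offset)
    complexity++;
  return complexity;
}
```

With the BASE + OFFSET-only predicate, every call returns 0 regardless of ratio or offset, whereas the permissive predicate yields 1 for a plain index and 3 for a scaled index plus offset.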
> Do you have a testcase that shows how both changes improve IV selection
> for MIPS?
Certainly, consider the following test case:
void daxpy(float *vector1, float *vector2, int n, float fp_const)
{
for (int i = 0; i < n; ++i)
vector1[i] += fp_const * vector2[i];
}
void dgefa(float *vector, int m, int n, int l)
{
for (int i = 0; i < n - 1; ++i)
{
for (int j = i + 1; j < n; ++j)
{
float t = vector[m * j + l];
daxpy(&vector[m * i + i + 1], &vector[m * j + i + 1], n - (i + 1), t);
}
}
}
In the third (innermost) loop, which gets inlined from daxpy, a suboptimal candidate
selection takes place.
It is worth noting that f9f69dd doesn't change the ordering of the costs (they are multiplied by
a factor, but what was lesser/greater before is lesser/greater after). Here's how the complexities
stand:
===== Before f9f69dd =====
Group 1:
cand  cost  compl.  inv.expr.  inv.vars
1     13    1       3;         NIL;
2     13    2       4;         NIL;
4     9     1       5;         NIL;
5     1     0       NIL;       NIL;
7     9     1       3;         NIL;
===== Before f9f69dd =====
===== After f9f69dd =====
Group 1:
cand  cost  compl.  inv.expr.  inv.vars
1     10    0       4;         NIL;
2     10    0       5;         NIL;
4     6     0       6;         NIL;
5     1     0       NIL;       NIL;
7     6     0       4;         NIL;
===== After f9f69dd =====
Notice how all complexities are zero, even though the candidates have
different addressing modes.
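To show why the lost complexities matter, here is a small C sketch of a cost comparison in which complexity only breaks ties. The struct toy_cost type and toy_cost_less are hypothetical names, and the tie-breaking rule is an assumption about how ivopts orders costs, not a copy of its code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of (cost, complexity), as in the tables above.  */
struct toy_cost { long cost; unsigned complexity; };

/* Assumed ordering: the lower cost wins, and complexity is consulted
   only when the two costs are equal.  */
static bool
toy_cost_less (struct toy_cost a, struct toy_cost b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}
```

Under that assumption, candidates 1 and 2 from the tables tie on cost both before (13 vs 13) and after (10 vs 10) f9f69dd. Before, the complexities 1 and 2 still rank candidate 1 first; after, both are 0 and the comparison can no longer tell them apart.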
For this particular example, this leads to a different candidate selection:
===== Before f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
Candidate 4:
Var befor: ivtmp.17_52
Var after: ivtmp.17_103
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) (vector_27(D) + _10)
Step: 4
Object: (void *) vector_27(D)
Biv: N
Overflowness wrto loop niter: Overflow
Candidate 5:
Var befor: ivtmp.18_99
Var after: ivtmp.18_98
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) (vector_27(D) + _14)
Step: 4
Object: (void *) vector_27(D)
Biv: N
Overflowness wrto loop niter: Overflow
===== Before f9f69dd =====
===== After f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 1 IVs:
Candidate 4:
Var befor: ivtmp.17_52
Var after: ivtmp.17_103
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) (vector_27(D) + _10)
Step: 4
Object: (void *) vector_27(D)
Biv: N
Overflowness wrto loop niter: Overflow
===== After f9f69dd =====
which, in turn, leads to the following assembly sequence:
===== Before f9f69dd =====
.L83:
lwc1 $f5,0($3)
lwc1 $f8,0($2)
lwc1 $f7,4($2)
lwc1 $f6,8($2)
lwc1 $f9,12($2)
lwc1 $f10,16($2)
maddf.s $f8,$f0,$f5
lwc1 $f11,20($2)
lwc1 $f12,24($2)
lwc1 $f13,28($2)
ld $12,72($sp)
swc1 $f8,0($2)
lwc1 $f14,4($3)
maddf.s $f7,$f0,$f14
swc1 $f7,4($2)
lwc1 $f15,8($3)
maddf.s $f6,$f0,$f15
swc1 $f6,8($2)
lwc1 $f16,12($3)
maddf.s $f9,$f0,$f16
swc1 $f9,12($2)
lwc1 $f17,16($3)
maddf.s $f10,$f0,$f17
swc1 $f10,16($2)
lwc1 $f18,20($3)
maddf.s $f11,$f0,$f18
swc1 $f11,20($2)
lwc1 $f19,24($3)
maddf.s $f12,$f0,$f19
swc1 $f12,24($2)
lwc1 $f20,28($3)
maddf.s $f13,$f0,$f20
swc1 $f13,28($2)
daddiu $2,$2,32
bne $2,$12,.L83
daddiu $3,$3,32
...
===== Before f9f69dd =====
===== After f9f69dd =====
.L93:
dsubu $18,$2,$4
lwc1 $f13,0($2)
daddu $19,$18,$5
daddiu $16,$2,4
lwc1 $f14,0($19)
dsubu $17,$16,$4
daddu $25,$17,$5
lwc1 $f15,4($2)
daddiu $19,$2,12
daddiu $20,$2,8
maddf.s $f13,$f1,$f14
dsubu $16,$19,$4
daddiu $17,$2,16
dsubu $18,$20,$4
daddu $19,$16,$5
daddiu $16,$2,20
lwc1 $f10,8($2)
daddu $20,$18,$5
lwc1 $f16,12($2)
dsubu $18,$17,$4
lwc1 $f17,16($2)
dsubu $17,$16,$4
lwc1 $f18,20($2)
daddiu $16,$2,24
lwc1 $f20,24($2)
daddu $18,$18,$5
swc1 $f13,0($2)
daddu $17,$17,$5
lwc1 $f19,0($25)
daddiu $25,$2,28
lwc1 $f11,28($2)
daddiu $2,$2,32
dsubu $16,$16,$4
dsubu $25,$25,$4
maddf.s $f15,$f1,$f19
daddu $16,$16,$5
daddu $25,$25,$5
swc1 $f15,-28($2)
lwc1 $f21,0($20)
ld $20,48($sp)
maddf.s $f10,$f1,$f21
swc1 $f10,-24($2)
lwc1 $f22,0($19)
maddf.s $f16,$f1,$f22
swc1 $f16,-20($2)
lwc1 $f23,0($18)
maddf.s $f17,$f1,$f23
swc1 $f17,-16($2)
lwc1 $f0,0($17)
maddf.s $f18,$f1,$f0
swc1 $f18,-12($2)
lwc1 $f7,0($16)
maddf.s $f20,$f1,$f7
swc1 $f20,-8($2)
lwc1 $f12,0($25)
maddf.s $f11,$f1,$f12
bne $2,$20,.L93
swc1 $f11,-4($2)
...
===== After f9f69dd =====
Notice the additional instructions used for index calculation, due to
suboptimal candidate selection.
Regards,
Dimitrije
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, October 25, 2022 1:08 PM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
I don't follow how only having BASE + OFFSET addressing prevents
calculation of complexity for other addressing modes? Can you explain?
Do you have a testcase that shows how both changes improve IV selection
for MIPS?
>
> gcc/ChangeLog:
>
> * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> to non-static.
> * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> (compute_min_and_max_offset): Likewise.
> (get_address_cost): Revert
> complexity calculation.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
> gcc/tree-ssa-address.cc | 2 +-
> gcc/tree-ssa-address.h | 2 +
> gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
> 3 files changed, 207 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
> index ba7b7c93162..442f54f0165 100644
> --- a/gcc/tree-ssa-address.cc
> +++ b/gcc/tree-ssa-address.cc
> @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
> validity for a memory reference accessing memory of mode MODE in address
> space AS. */
>
> -static bool
> +bool
> multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> addr_space_t as)
> {
> diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
> index 95143a099b9..09f36ee2f19 100644
> --- a/gcc/tree-ssa-address.h
> +++ b/gcc/tree-ssa-address.h
> @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
> class aff_tree *, tree, tree, tree, bool);
> extern void copy_ref_info (tree, tree);
> tree maybe_fold_tmr (tree);
> +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> + addr_space_t as);
>
> extern unsigned int preferred_mem_scale_factor (tree base,
> machine_mode mem_mode,
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index a6f926a68ef..d53ba05a4f6 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
> return infinite_cost;
> }
>
> +static void
> +compute_symbol_and_var_present (tree e1, tree e2,
> + bool *symbol_present, bool *var_present)
> +{
> + poly_uint64_pod off1, off2;
> +
> + e1 = strip_offset (e1, &off1);
> + e2 = strip_offset (e2, &off2);
> +
> + STRIP_NOPS (e1);
> + STRIP_NOPS (e2);
> +
> + if (TREE_CODE (e1) == ADDR_EXPR)
> + {
> + poly_int64_pod diff;
> + if (ptr_difference_const (e1, e2, &diff))
> + {
> + *symbol_present = false;
> + *var_present = false;
> + return;
> + }
> +
> + if (integer_zerop (e2))
> + {
> + tree core;
> + poly_int64_pod bitsize;
> + poly_int64_pod bitpos;
> + widest_int mul;
> + tree toffset;
> + machine_mode mode;
> + int unsignedp, reversep, volatilep;
> +
> + core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
> + &toffset, &mode, &unsignedp, &reversep, &volatilep);
> +
> + if (toffset != 0
> + || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
> + || reversep
> + || !VAR_P (core))
> + {
> + *symbol_present = false;
> + *var_present = true;
> + return;
> + }
> +
> + if (TREE_STATIC (core)
> + || DECL_EXTERNAL (core))
> + {
> + *symbol_present = true;
> + *var_present = false;
> + return;
> + }
> +
> + *symbol_present = false;
> + *var_present = true;
> + return;
> + }
> +
> + *symbol_present = false;
> + *var_present = true;
> + }
> + *symbol_present = false;
> +
> + if (operand_equal_p (e1, e2, 0))
> + {
> + *var_present = false;
> + return;
> + }
> +
> + *var_present = true;
> +}
> +
> +static void
> +compute_min_and_max_offset (addr_space_t as,
> + machine_mode mem_mode, poly_int64_pod *min_offset,
> + poly_int64_pod *max_offset)
> +{
> + machine_mode address_mode = targetm.addr_space.address_mode (as);
> + HOST_WIDE_INT i;
> + poly_int64_pod off, width;
> + rtx addr;
> + rtx reg1;
> +
> + reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
> +
> + width = GET_MODE_BITSIZE (address_mode) - 1;
> + if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
> + width = HOST_BITS_PER_WIDE_INT - 1;
> + gcc_assert (width.is_constant ());
> + addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> +
> + off = 0;
> + for (i = width.to_constant (); i >= 0; i--)
> + {
> + off = -(HOST_WIDE_INT_1U << i);
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + }
> + if (i == -1)
> + *min_offset = 0;
> + else
> + *min_offset = off;
> + // *min_offset = (i == -1? 0 : off);
> +
> + for (i = width.to_constant (); i >= 0; i--)
> + {
> + off = (HOST_WIDE_INT_1U << i) - 1;
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + /* For some strict-alignment targets, the offset must be naturally
> + aligned. Try an aligned offset if mem_mode is not QImode. */
> + off = mem_mode != QImode
> + ? (HOST_WIDE_INT_1U << i)
> + - (GET_MODE_SIZE (mem_mode))
> + : 0;
> + if (known_gt (off, 0))
> + {
> + XEXP (addr, 1) = gen_int_mode (off, address_mode);
> + if (memory_address_addr_space_p (mem_mode, addr, as))
> + break;
> + }
> + }
> + if (i == -1)
> + off = 0;
> + *max_offset = off;
> +}
> +
> /* Return cost of computing USE's address expression by using CAND.
> AFF_INV and AFF_VAR represent invariant and variant parts of the
> address expression, respectively. If AFF_INV is simple, store
> @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
> /* Only true if ratio != 1. */
> bool ok_with_ratio_p = false;
> bool ok_without_ratio_p = false;
> + tree ubase = use->iv->base;
> + tree cbase = cand->iv->base, cstep = cand->iv->step;
> + tree utype = TREE_TYPE (ubase), ctype;
> + unsigned HOST_WIDE_INT cstepi;
> + bool symbol_present = false, var_present = false, stmt_is_after_increment;
> + poly_int64_pod min_offset, max_offset;
> + bool offset_p, ratio_p;
>
> if (!aff_combination_const_p (aff_inv))
> {
> @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
> gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
> cost += address_cost (addr, mem_mode, as, speed);
>
> - if (parts.symbol != NULL_TREE)
> - cost.complexity += 1;
> - /* Don't increase the complexity of adding a scaled index if it's
> - the only kind of index that the target allows. */
> - if (parts.step != NULL_TREE && ok_without_ratio_p)
> - cost.complexity += 1;
> - if (parts.base != NULL_TREE && parts.index != NULL_TREE)
> - cost.complexity += 1;
> - if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
> - cost.complexity += 1;
> + if (cst_and_fits_in_hwi (cstep))
> + cstepi = int_cst_value (cstep);
> + else
> + cstepi = 0;
> +
> + STRIP_NOPS (cbase);
> + ctype = TREE_TYPE (cbase);
> +
> + stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
> + use->stmt);
> +
> + if (cst_and_fits_in_hwi (cbase))
> + compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> + &symbol_present, &var_present);
> + else if (ratio == 1)
> + {
> + tree real_cbase = cbase;
> +
> + /* Check to see if any adjustment is needed. */
> + if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
> + {
> + aff_tree real_cbase_aff;
> + aff_tree cstep_aff;
> +
> + tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
> + &real_cbase_aff);
> + tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
> +
> + aff_combination_add (&real_cbase_aff, &cstep_aff);
> + real_cbase = aff_combination_to_tree (&real_cbase_aff);
> + }
> + compute_symbol_and_var_present (ubase, real_cbase,
> + &symbol_present, &var_present);
> + }
> + else if (!POINTER_TYPE_P (ctype)
> + && multiplier_allowed_in_address_p
> + (ratio, mem_mode,
> + TYPE_ADDR_SPACE (TREE_TYPE (utype))))
> + {
> + tree real_cbase = cbase;
> +
> + if (cstepi == 0 && stmt_is_after_increment)
> + {
> + if (POINTER_TYPE_P (ctype))
> + real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
> + else
> + real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
> + }
> + real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
> + build_int_cst (ctype, ratio));
> + compute_symbol_and_var_present (ubase, real_cbase,
> + &symbol_present, &var_present);
> + }
> + else
> + {
> + compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> + &symbol_present, &var_present);
> + }
> +
> + compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
> + offset_p = maybe_ne (aff_inv->offset, 0)
> + && known_le (min_offset, aff_inv->offset)
> + && known_le (aff_inv->offset, max_offset);
> + ratio_p = (ratio != 1
> + && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> +
> + cost.complexity = (symbol_present != 0) + (var_present != 0)
> + + offset_p + ratio_p;
>
> return cost;
> }
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-25 11:07 ` Richard Biener
@ 2022-10-25 13:00 ` Dimitrije Milosevic
2022-10-28 7:38 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-25 13:00 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, Djordje Todorovic
Hi Richard,
> don't you add n_invs twice now given
>
> unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>
> ?
If you are referring to the "If we have enough registers." case, correct. After c18101f,
for that case, the returned cost is equal to 2 * n_invs + n_cands. Before c18101f, for
that case, the returned cost is equal to n_invs + n_cands. Another solution would be
to just return n_invs + n_cands if we have enough registers.
Regards,
Dimitrije
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, October 25, 2022 1:07 PM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch slightly modifies register pressure model function to consider
> both the number of invariants and the number of candidates, rather than
> just the number of candidates. This used to be the case before c18101f.
don't you add n_invs twice now given
unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
?
> gcc/ChangeLog:
>
> * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
> gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index d53ba05a4f6..9d0b669d671 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> + target_spill_cost [speed] * (n_cands - available_regs) * 2
> + target_spill_cost [speed] * (regs_needed - n_cands);
>
> - /* Finally, add the number of candidates, so that we prefer eliminating
> - induction variables if possible. */
> - return cost + n_cands;
> + /* Finally, add the number of invariants and the number of candidates,
> + so that we prefer eliminating induction variables if possible. */
> + return cost + n_invs + n_cands;
> }
>
> /* For each size of the induction variable set determine the penalty. */
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
2022-10-25 11:08 ` Richard Biener
@ 2022-10-27 23:02 ` Jeff Law
2022-10-28 6:43 ` Dimitrije Milosevic
1 sibling, 1 reply; 24+ messages in thread
From: Jeff Law @ 2022-10-27 23:02 UTC (permalink / raw)
To: Dimitrije Milosevic, gcc-patches; +Cc: djordje.todorovic
On 10/21/22 07:52, Dimitrije Milosevic wrote:
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
>
> gcc/ChangeLog:
>
> * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> to non-static.
> * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> (compute_min_and_max_offset): Likewise.
> (get_address_cost): Revert
> complexity calculation.
The part I don't understand is, if you only have BASE+OFF, why does
preventing the calculation of more complex addressing modes matter? I.e.,
what's the point of computing the cost of something like base + off +
scaled index when the target can't utilize it?
jeff
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-27 23:02 ` Jeff Law
@ 2022-10-28 6:43 ` Dimitrije Milosevic
2022-10-28 7:00 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-28 6:43 UTC (permalink / raw)
To: Jeff Law, gcc-patches; +Cc: Djordje Todorovic, richard.guenther
Hi Jeff,
> THe part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter? ie,
> what's the point of computing the cost of something like base + off +
> scaled index when the target can't utilize it?
Well, after f9f69dd the complexities of all addressing modes other than
BASE + OFFSET are equal to 0. For targets like MIPS, which only have
BASE + OFFSET, it would still be more complex to use a candidate with
BASE + INDEX << SCALE + OFFSET than a candidate with BASE + INDEX, for
example, as the target has to compensate for the lack of other addressing
modes somehow. If the complexities for both of those are equal to 0, then in
cases where complexity decides which candidate is chosen, a more complex
candidate may be picked.
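To make the argument concrete, the legacy formula that the patch restores
(the sum at the end of get_address_cost) can be modelled outside of GCC. This
is only a sketch under stated assumptions: addr_shape and legacy_complexity
are hypothetical names, ratio_allowed stands in for the real
multiplier_allowed_in_address_p () query, and the offset range is hard-coded
to a signed 16-bit displacement (as on MIPS loads and stores) instead of being
probed by compute_min_and_max_offset ().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one address expression; not a GCC data structure.
struct addr_shape
{
  bool symbol_present;  // address refers to a static or external symbol
  bool var_present;     // address has a variable part besides the candidate
  std::int64_t offset;  // constant displacement
  std::int64_t ratio;   // use step divided by candidate step
  bool ratio_allowed;   // stand-in for multiplier_allowed_in_address_p ()
};

// Assumed offset range: signed 16-bit immediate, as on MIPS loads/stores.
constexpr std::int64_t min_offset = -32768;
constexpr std::int64_t max_offset = 32767;

// Mirrors the sum used by the patched get_address_cost ():
//   (symbol_present != 0) + (var_present != 0) + offset_p + ratio_p
inline unsigned
legacy_complexity (const addr_shape &a)
{
  assert (a.ratio != 0);
  bool offset_p = a.offset != 0
                  && min_offset <= a.offset
                  && a.offset <= max_offset;
  bool ratio_p = a.ratio != 1 && a.ratio_allowed;
  return a.symbol_present + a.var_present + offset_p + ratio_p;
}
```

In this model an address with only a variable part scores 1, and adding an
in-range constant offset raises it to 2, so two candidates whose cost.cost
values tie can still be told apart.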
Regards,
Dimitrije
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-28 6:43 ` Dimitrije Milosevic
@ 2022-10-28 7:00 ` Richard Biener
2022-10-28 13:39 ` Dimitrije Milosevic
2022-11-01 18:46 ` Jeff Law
0 siblings, 2 replies; 24+ messages in thread
From: Richard Biener @ 2022-10-28 7:00 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > THe part I don't understand is, if you only have BASE+OFF, why does
> > preventing the calculation of more complex addressing modes matter? ie,
> > what's the point of computing the cost of something like base + off +
> > scaled index when the target can't utilize it?
>
> Well, the complexities of all addressing modes other than BASE + OFFSET are
> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> than a candidate with BASE + INDEX, for example, as it has to compensate
> the lack of other addressing modes somehow. If complexities for both of
> those are equal to 0, in cases where complexities decide which candidate is
> to be chosen, a more complex candidate may be picked.
But something is wrong then - it shouldn't ever pick a candidate with an
addressing mode that isn't supported? So you say that the cost of expressing
'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
accurately?
The function tries to compensate for that, maybe you can point out
where it goes wrong? That is, at the end it adjusts cost and complexity
based on what it scrapped before, maybe that is just a bit incomplete?
Note the original author of this is not available so it would help
(maybe also yourself) to walk through the function with a specific
candidate / use where you think the complexity (or cost) is wrong?
> Regards,
> Dimitrije
>
>
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Friday, October 28, 2022 1:02 AM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Cc: Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
>
> On 10/21/22 07:52, Dimitrije Milosevic wrote:
> > From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
> >
> > This patch reverts the computation of address cost complexity
> > to the legacy one. After f9f69dd, complexity is calculated
> > using the valid_mem_ref_p target hook. Architectures like
> > Mips only allow BASE + OFFSET addressing modes, which in turn
> > prevents the calculation of complexity for other addressing
> > modes, resulting in non-optimal candidate selection.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> > to non-static.
> > * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> > * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> > (compute_min_and_max_offset): Likewise.
> > (get_address_cost): Revert
> > complexity calculation.
>
> THe part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter? ie,
> what's the point of computing the cost of something like base + off +
> scaled index when the target can't utilize it?
>
>
> jeff
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-25 13:00 ` Dimitrije Milosevic
@ 2022-10-28 7:38 ` Richard Biener
2022-10-28 13:39 ` Dimitrije Milosevic
0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-10-28 7:38 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: gcc-patches, Djordje Todorovic
On Tue, Oct 25, 2022 at 3:00 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> > don't you add n_invs twice now given
> >
> > unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> > unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> >
> > ?
>
> If you are referring to the "If we have enough registers." case, correct. After c18101f,
> for that case, the returned cost is equal to 2 * n_invs + n_cands.
It's n_invs + 2 * n_cands? And the comment states the reasoning.
> Before c18101f, for
> that case, the returned cost is equal to n_invs + n_cands. Another solution would be
> to just return n_invs + n_cands if we have enough registers.
The comment says we want to prefer eliminating IVs over invariants. Your patch
undoes that by weighting invariants the same, so it no longer has the effect
described in the comment.
> Regards,
> Dimitrije
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, October 25, 2022 1:07 PM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
>
> On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
> <dimitrije.milosevic@syrmia.com> wrote:
> >
> > From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
> >
> > This patch slightly modifies register pressure model function to consider
> > both the number of invariants and the number of candidates, rather than
> > just the number of candidates. This used to be the case before c18101f.
>
> don't you add n_invs twice now given
>
> unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>
> ?
>
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> > ---
> > gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index d53ba05a4f6..9d0b669d671 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> > + target_spill_cost [speed] * (n_cands - available_regs) * 2
> > + target_spill_cost [speed] * (regs_needed - n_cands);
> >
> > - /* Finally, add the number of candidates, so that we prefer eliminating
> > - induction variables if possible. */
> > - return cost + n_cands;
> > + /* Finally, add the number of invariants and the number of candidates,
> > + so that we prefer eliminating induction variables if possible. */
> > + return cost + n_invs + n_cands;
> > }
> >
> > /* For each size of the induction variable set determine the penalty. */
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-28 7:00 ` Richard Biener
@ 2022-10-28 13:39 ` Dimitrije Milosevic
2022-11-01 18:46 ` Jeff Law
1 sibling, 0 replies; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-28 13:39 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
Hi Richard,
> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?
The test case I presented in [0] only has non-"BASE + OFFSET" candidates. Correct me
if I'm wrong, but the candidate selection algorithm doesn't take into account
which addressing modes are supported by the target?
> So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?
I just took it as an example, but yes.
> The function tries to compensate for that, maybe you can point out
> where it goes wrong?
> That is, at the end it adjusts cost and complexity based on what it
> scrapped before, maybe
> that is just a bit incomplete?
I think the cost.cost part is mostly okay, as the costs are just scaled
(what was lower/higher before f9f69dd is still lower/higher after f9f69dd).
As far as the adjustments go, I don't think they are complete.
On the other hand, as complexity is a valid part of address costs, and
it can be used as a tie-breaker, I feel like it should serve a purpose,
even for targets like Mips which are limited when it comes to
addressing modes, rather than being equal to 0.
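As a rough illustration of the tie-breaker role mentioned above, here is a
standalone sketch that orders cost pairs by cost first and by complexity only
when the costs are equal. It assumes that ordering; comp_cost_model, cheaper_p
and pick_candidate are hypothetical names, not the ivopts implementation.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the (cost.cost, cost.complexity) pair.
struct comp_cost_model
{
  std::int64_t cost;
  unsigned complexity;
};

// True if A is strictly preferable to B: the lower cost wins, and
// complexity is consulted only when the costs are equal.
inline bool
cheaper_p (const comp_cost_model &a, const comp_cost_model &b)
{
  if (a.cost != b.cost)
    return a.cost < b.cost;
  return a.complexity < b.complexity;
}

// Hypothetical helper: index of the preferred entry among N candidates.
// On a full tie the earliest entry is kept.
inline int
pick_candidate (const comp_cost_model *cands, int n)
{
  assert (n > 0);
  int best = 0;
  for (int i = 1; i < n; i++)
    if (cheaper_p (cands[i], cands[best]))
      best = i;
  return best;
}
```

If every complexity is 0, equal-cost candidates are indistinguishable and the
choice falls to iteration order. With non-zero complexities the simpler
address wins the tie.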
I guess an alternative would be to fully cover cost.cost adjustments, and
leave the complexities at 0 for non-supported addressing modes.
Currently, the adjustments are implemented as follows:
  if (simple_inv)
    simple_inv = (aff_inv == NULL
                  || aff_combination_const_p (aff_inv)
                  || aff_combination_singleton_var_p (aff_inv));
  if (!aff_combination_zero_p (aff_inv))
    comp_inv = aff_combination_to_tree (aff_inv);
  if (comp_inv != NULL_TREE)
    cost = force_var_cost (data, comp_inv, inv_vars);
  if (ratio != 1 && parts.step == NULL_TREE)
    var_cost += mult_by_coeff_cost (ratio, addr_mode, speed);
  if (comp_inv != NULL_TREE && parts.index == NULL_TREE)
    var_cost += add_cost (speed, addr_mode);
> Note the original author of this is not available so it would help
> (maybe also yourself) to
> walk through the function with a specific candidate / use where you
> think the complexity
> (or cost) is wrong?
I'd like to refer to [0] where candidate costs didn't get adjusted to
cover the lack of complexity calculation.
Would love to hear your thoughts.
[0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html
Regards,
Dimitrije
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-28 7:38 ` Richard Biener
@ 2022-10-28 13:39 ` Dimitrije Milosevic
2022-11-07 12:56 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-10-28 13:39 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, Djordje Todorovic
Hi Richard,
> It's n_invs + 2 * n_cands?
Correct, n_invs + 2 * n_cands, my apologies.
> The comment says we want to prefer eliminating IVs over invariants. Your patch
> undoes that by weighting invariants the same so it does no longer have
> the effect
> of the comment.
I see how my patch may have confused you.
My concern is the "If we have enough registers." case - if we do have
enough registers to store both the invariants and induction variables, I think the cost
should be equal to the sum of those.
I understand that adding another n_cands could be used as a tie-breaker between two
cases where we do have enough registers and the sums of n_invs and n_cands are equal;
however, I think there are two problems with that:
- How often does it happen that we have two cases where we do have enough registers,
n_invs + n_cands sums are equal, and n_cands differ? I think that's pretty rare.
- Bumping up the cost by another n_cands may make the cost for the "If we have
enough registers." case higher than for other cases, which doesn't make sense.
I can refer to the test case that I presented in [0] for the second point.
Also worth noting is that the estimate_reg_pressure_cost function (used before c18101f)
follows this:
  /* If we have enough registers, we should use them and not restrict
     the transformations unnecessarily. */
  if (regs_needed + target_res_regs <= available_regs)
    return 0;
As for preferring to eliminate induction variables if possible, don't we already do that?
For example:
  /* If the number of candidates runs out available registers, we penalize
     extra candidate registers using target_spill_cost * 2. Because it is
     more expensive to spill induction variable than invariant. */
  else
    cost = target_reg_cost [speed] * available_regs
           + target_spill_cost [speed] * (n_cands - available_regs) * 2
           + target_spill_cost [speed] * (regs_needed - n_cands);
To clarify, what my patch did was that it gave every case a base cost of
n_invs + n_cands. This base cost gets bumped up accordingly, for each
one of the cases (by the amount equal to "cost = ..." statement prior to
the return statement in the ivopts_estimate_reg_pressure function).
I agree that my patch isn't clear on my intention, and that it also does
not correspond to the comment.
What I could do is just return n_new as the cost for the
"If we have enough registers." case, but I would love to hear your
thoughts, now that I have hopefully clarified my intention a little.
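For reference, the three variants discussed for the "If we have enough
registers." case can be compared with a small standalone model. It is only a
sketch: reg_model and enough_regs_cost are hypothetical names, it assumes that
this case starts from cost = n_new as implied earlier in the thread, and it
ignores every other branch of ivopts_estimate_reg_pressure.

```cpp
#include <cassert>

// Which final "return" statement to model.
enum class reg_model { current, patched, alternative };

// Returned cost for the "If we have enough registers." case only.
inline unsigned
enough_regs_cost (unsigned n_invs, unsigned n_cands, reg_model m)
{
  unsigned n_new = n_invs + n_cands;  // assumed base cost in this case
  switch (m)
    {
    case reg_model::current:      // return cost + n_cands;
      return n_new + n_cands;     // n_invs + 2 * n_cands
    case reg_model::patched:      // return cost + n_invs + n_cands;
      return n_new + n_invs + n_cands;
    case reg_model::alternative:  // return n_new directly
      return n_new;
    }
  assert (false && "unknown model");
  return 0;
}
```

Taking (n_invs, n_cands) = (3, 1) and (1, 3) as two sets of equal size, the
current model gives 5 and 7, so the set with fewer induction variables is
preferred. The patched model gives 8 for both and the alternative gives 4 for
both, so neither expresses that preference in this case.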
[0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html
Regards,
Dimitrije
From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, October 28, 2022 9:38 AM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
On Tue, Oct 25, 2022 at 3:00 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> > don't you add n_invs twice now given
> >
> > unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> > unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> >
> > ?
>
> If you are referring to the "If we have enough registers." case, correct. After c18101f,
> for that case, the returned cost is equal to 2 * n_invs + n_cands.
It's n_invs + 2 * n_cands? And the comment states the reasoning.
Before c18101f, for
> that case, the returned cost is equal to n_invs + n_cands. Another solution would be
> to just return n_invs + n_cands if we have enough registers.
The comment says we want to prefer eliminating IVs over invariants. Your patch
undoes that by weighting invariants the same so it does no longer have
the effect
of the comment.
> Regards,
> Dimitrije
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Tuesday, October 25, 2022 1:07 PM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
>
> On Fri, Oct 21, 2022 at 3:57 PM Dimitrije Milosevic
> <dimitrije.milosevic@syrmia.com> wrote:
> >
> > From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
> >
> > This patch slightly modifies register pressure model function to consider
> > both the number of invariants and the number of candidates, rather than
> > just the number of candidates. This used to be the case before c18101f.
>
> don't you add n_invs twice now given
>
> unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>
> ?
>
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> > ---
> > gcc/tree-ssa-loop-ivopts.cc | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index d53ba05a4f6..9d0b669d671 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6409,9 +6409,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> > + target_spill_cost [speed] * (n_cands - available_regs) * 2
> > + target_spill_cost [speed] * (regs_needed - n_cands);
> >
> > - /* Finally, add the number of candidates, so that we prefer eliminating
> > - induction variables if possible. */
> > - return cost + n_cands;
> > + /* Finally, add the number of invariants and the number of candidates,
> > + so that we prefer eliminating induction variables if possible. */
> > + return cost + n_invs + n_cands;
> > }
> >
> > /* For each size of the induction variable set determine the penalty. */
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-10-28 7:00 ` Richard Biener
2022-10-28 13:39 ` Dimitrije Milosevic
@ 2022-11-01 18:46 ` Jeff Law
2022-11-02 8:40 ` Dimitrije Milosevic
1 sibling, 1 reply; 24+ messages in thread
From: Jeff Law @ 2022-11-01 18:46 UTC (permalink / raw)
To: Richard Biener, Dimitrije Milosevic; +Cc: gcc-patches, Djordje Todorovic
On 10/28/22 01:00, Richard Biener wrote:
> On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
>> Hi Jeff,
>>
>>> The part I don't understand is, if you only have BASE+OFF, why does
>>> preventing the calculation of more complex addressing modes matter? i.e.,
>>> what's the point of computing the cost of something like base + off +
>>> scaled index when the target can't utilize it?
>> Well, the complexities of all addressing modes other than BASE + OFFSET are
>> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
>> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
>> than a candidate with BASE + INDEX, for example, as it has to compensate
>> the lack of other addressing modes somehow. If complexities for both of
>> those are equal to 0, in cases where complexities decide which candidate is
>> to be chosen, a more complex candidate may be picked.
> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported? So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?
This is exactly what I was trying to get to. If the addressing mode
isn't supported, then we shouldn't be picking it as a candidate. If it
is, then we've probably got a problem somewhere else in this code and
this patch is likely papering over it.
Jeff
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-11-01 18:46 ` Jeff Law
@ 2022-11-02 8:40 ` Dimitrije Milosevic
2022-11-07 13:35 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-11-02 8:40 UTC (permalink / raw)
To: Jeff Law, Richard Biener; +Cc: gcc-patches, Djordje Todorovic
Hi Jeff,
> This is exactly what I was trying to get to. If the addressing mode
> isn't supported, then we shouldn't be picking it as a candidate. If it
> is, then we've probably got a problem somewhere else in this code and
> this patch is likely papering over it.
I'll take a deeper look into the candidate selection algorithm then. Will
get back to you.
Regards,
Dimitrije
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure.
2022-10-28 13:39 ` Dimitrije Milosevic
@ 2022-11-07 12:56 ` Richard Biener
0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2022-11-07 12:56 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: gcc-patches, Djordje Todorovic
On Fri, Oct 28, 2022 at 3:39 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> > It's n_invs + 2 * n_cands?
>
> Correct, n_invs + 2 * n_cands, my apologies.
>
> > The comment says we want to prefer eliminating IVs over invariants. Your patch
> > undoes that by weighting invariants the same so it does no longer have
> > the effect
> > of the comment.
>
> I see how my patch may have confused you.
> My concern is the "If we have enough registers." case - if we do have
> enough registers to store both the invariants and induction variables, I think the cost
> should be equal to the sum of those.
>
> I understand that adding another n_cands could be used as a tie-breaker for the two
> cases where we do have enough registers and the sum of n_invs and n_cands is equal,
> however I think there are two problems with that:
> - How often does it happen that we have two cases where we do have enough registers,
> n_invs + n_cands sums are equal, and n_cands differ? I think that's pretty rare.
> - Bumping up the cost by another n_cands may lead to cost for the "If we do have
> enough registers." case to be higher than for other cases, which doesn't make sense.
> I can refer to the test case that I presented in [0] for the second point.
The odd thing I notice is that for all cases but the "we have enough
registers" case we scale cost by target_reg_cost[speed] or
target_spill_cost[speed], but in that special case we use the plain sum
of n_invs + n_cands.
That makes this case "extra cheap", which is possibly OK.
Instead of adding another n_invs it would have been clearer to remove the
add of n_cands, no? Or indeed, as you say, return n_new from the first case.
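The difference in units between the cases can be seen in a small stand-alone model. This is a sketch under assumptions, not the GCC function: the struct reg_model and its values are hypothetical stand-ins for the target-dependent globals, the adjustment for calls in the loop body is left out, and the conditions of the first three cases are written from memory and should be checked against tree-ssa-loop-ivopts.cc. Only the last case and the final "+ n_cands" are quoted in this thread.

```c
/* Simplified model of the four cases in ivopts_estimate_reg_pressure.
   struct reg_model is an assumption for illustration only; GCC reads
   the corresponding numbers from target-dependent globals.  */
struct reg_model
{
  unsigned avail_regs;   /* Registers available in the loop.  */
  unsigned res_regs;     /* Registers kept in reserve.  */
  unsigned reg_cost;     /* Cost of keeping a value in a register.  */
  unsigned spill_cost;   /* Cost of spilling a value.  */
};

unsigned
model_reg_pressure (const struct reg_model *m, unsigned n_old,
                    unsigned n_invs, unsigned n_cands)
{
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  unsigned cost;

  if (regs_needed + m->res_regs < m->avail_regs)
    /* Enough registers: an unscaled register count.  */
    cost = n_new;
  else if (regs_needed <= m->avail_regs)
    /* Close to running out: scaled by reg_cost.  */
    cost = m->reg_cost * regs_needed;
  else if (n_cands <= m->avail_regs)
    /* Out of registers, but the candidates alone still fit.  */
    cost = m->reg_cost * m->avail_regs
           + m->spill_cost * (regs_needed - m->avail_regs);
  else
    /* Even the candidates do not fit: a spilled IV counts double.  */
    cost = m->reg_cost * m->avail_regs
           + m->spill_cost * (n_cands - m->avail_regs) * 2
           + m->spill_cost * (regs_needed - n_cands);

  /* Bias toward fewer candidates, as in the current code.  */
  return cost + n_cands;
}
```

With the assumed values avail_regs = 10, res_regs = 3, reg_cost = 2 and spill_cost = 5, a set of 2 invariants and 2 candidates costs 6, while a set of 4 invariants and 4 candidates costs 20. The jump comes from the switch between an unscaled and a scaled case rather than from the extra registers alone.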
> Also worth noting is that the estimate_reg_pressure_cost function (used before c18101f)
> follows this:
>
> /* If we have enough registers, we should use them and not restrict
> the transformations unnecessarily. */
> if (regs_needed + target_res_regs <= available_regs)
> return 0;
>
> As far as preferring to eliminate induction variables if possible, don't we already do that,
> for example:
>
> /* If the number of candidates runs out available registers, we penalize
> extra candidate registers using target_spill_cost * 2. Because it is
> more expensive to spill induction variable than invariant. */
> else
> cost = target_reg_cost [speed] * available_regs
> + target_spill_cost [speed] * (n_cands - available_regs) * 2
> + target_spill_cost [speed] * (regs_needed - n_cands);
>
> To clarify, what my patch did was that it gave every case a base cost of
> n_invs + n_cands. This base cost gets bumped up accordingly, for each
> one of the cases (by the amount equal to "cost = ..." statement prior to
> the return statement in the ivopts_estimate_reg_pressure function).
> I agree that my patch isn't clear on my intention, and that it also does
> not correspond to the comment.
> What I could do is just return n_new as the cost for the
> "If we do have enough registers." case, but I would love to hear your
> thoughts, if I clarified my intention a little bit.
You did. Note IVOPTs costing is tricky at best and I don't have a very good
feeling why biasing things with n_invs should be a general improvement.
The situation where it makes a difference should be as rare as the
situation you describe above.
I'd love to see the function simplified a bit. It seems we are going to compare
'cost' as computed by this function from different cases, so making them
less apples vs. oranges would be good - I just don't see how your patch
does that. It might be that the bias should be applied when computing
iv_ca_delta or when comparing two iv_ca rather than trying to magically
adjust 'cost', if that is really the problem you are trying to solve.
>
> [0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html
>
> Regards,
> Dimitrije
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-11-02 8:40 ` Dimitrije Milosevic
@ 2022-11-07 13:35 ` Richard Biener
2022-12-15 15:26 ` Dimitrije Milosevic
0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-11-07 13:35 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > This is exactly what I was trying to get to. If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate. If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.
I'm not sure this is accurate, but the cost of using an unsupported
addressing mode should be at least that of the compensating code needed to
mangle it to a supported form.
> I'll take a deeper look into the candidate selection algorithm then. Will
> get back to you.
Thanks - as I said, the unfortunate situation is that both the original author and
the one who did the last bigger reworks of the code are gone.
Richard.
> Regards,
> Dimitrije
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-11-07 13:35 ` Richard Biener
@ 2022-12-15 15:26 ` Dimitrije Milosevic
2022-12-16 9:58 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-12-15 15:26 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
Hi Richard,
Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
> I'm not sure this is accurate but at least the cost of using an unsupported
> addressing mode should be at least that of the compensating code to
> mangle it to a supported form.
I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
the target architecture. It does, however, adjust the cost for a subset of those.
The adjustment code modifies only the cost part of the address cost (which
consists of a cost and a complexity).
Having said this, I'd propose three approaches:
1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
sure they aren't already covered), leaving complexity for unsupported
addressing modes zero.
2. Revert the complexity calculation (which my initial patch does), leaving
everything else as it is.
3. A combination of both - if the control path gets into the adjustment code, we
use the reverted complexity calculation.
I'd love to get feedback regarding this, so I could focus on a concrete approach.
Kind regards,
Dimitrije
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-12-15 15:26 ` Dimitrije Milosevic
@ 2022-12-16 9:58 ` Richard Biener
2022-12-16 11:37 ` Dimitrije Milosevic
0 siblings, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-12-16 9:58 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
On Thu, Dec 15, 2022 at 4:26 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
>
> > I'm not sure this is accurate but at least the cost of using an unsupported
> > addressing mode should be at least that of the compensating code to
> > mangle it to a supported form.
>
> I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
> the target architecture. It does, however, adjust the cost for a subset of those.
> The adjustment code modifies only the cost part of the address cost (which
> consists of a cost and a complexity).
> Having said this, I'd propose three approaches:
> 1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
> sure they aren't already covered), leaving complexity for unsupported
> addressing modes zero.
The only documentation on complexity I find is

  int64_t cost;         /* The runtime cost.  */
  unsigned complexity;  /* The estimate of the complexity of the code for
                           the computation (in no concrete units --
                           complexity field should be larger for more
                           complex expressions and addressing modes).  */

and complexity is used as a tie-breaker only when cost is equal. Given that,
shouldn't unsupported addressing modes have higher complexity? I'll note
that there's nothing truly "unsupported": each "unsupported" address computation
is lowered into supported pieces. "Unsupported" maybe means that
"cost" isn't fully covered by address-cost, and compensation stmts might
be costed in quantities not fully compatible with that?
That said, "complexity" seems to only complicate things :/ We do have the
tie-breaker on preferring less IVs. complexity was added in
r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
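The tie-breaking rule described above amounts to a lexicographic comparison on (cost, complexity). A minimal sketch follows; the struct and function names are hypothetical and only mirror the two fields quoted above, they are not the GCC declarations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the two documented fields; the names here are hypothetical.  */
struct addr_cost
{
  int64_t cost;          /* The runtime cost.  */
  unsigned complexity;   /* Tie-breaker, in no concrete units.  */
};

/* Return true if A is strictly preferable to B: lower cost wins, and
   complexity is consulted only when the costs are equal.  */
bool
addr_cost_less (struct addr_cost a, struct addr_cost b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}
```

If every candidate ends up with complexity 0, two candidates with equal cost compare as equivalent here, so the choice between them falls to whatever other ordering applies.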
> 2. Revert the complexity calculation (which my initial patch does), leaving
> everything else as it is.
> 3. A combination of both - if the control path gets into the adjustment code, we
> use the reverted complexity calculation.
If it's really only about the "complexity" value, then shouldn't each
compensation step add to the complexity?
> I'd love to get feedback regarding this, so I could focus on a concrete approach.
>
> Kind regards,
> Dimitrije
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-12-16 9:58 ` Richard Biener
@ 2022-12-16 11:37 ` Dimitrije Milosevic
2022-12-16 11:58 ` Richard Biener
0 siblings, 1 reply; 24+ messages in thread
From: Dimitrije Milosevic @ 2022-12-16 11:37 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
Hi Richard,
> The only documentation on complexity I find is
>
> int64_t cost; /* The runtime cost. */
> unsigned complexity; /* The estimate of the complexity of the code for
> the computation (in no concrete units --
> complexity field should be larger for more
> complex expressions and addressing modes). */
>
> and complexity is used as tie-breaker only when cost is equal. Given that
> shouldn't unsupported addressing modes have higher complexity? I'll note
> that there's nothing "unsupported", each "unsupported" address computation
> is lowered into supported pieces. "unsupported" maybe means that
> "cost" isn't fully covered by address-cost and compensation stmts might
> be costed in quantities not fully compatible with that?
Correct, that's what I was aiming for initially - before f9f69dd that was the case,
"unsupported" addressing modes had higher complexities.
Also, that's what I meant by "unsupported" as well, thanks.
> That said, "complexity" seems to only complicate things :/ We do have the
> tie-breaker on preferring less IVs. complexity was added in
> r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
I agree that the complexity part is just (kind of) out there, not really strongly
defined. I'm not sure how to feel about merging complexity into the cost part
of an address cost, though.
> If it's really only about the "complexity" value then each
> compensation step should
> add to the complexity?
That could be the way to go. Also worth verifying is that we compensate for
each case of an unsupported addressing mode.
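One possible reading of "each compensation step adds to the complexity" is sketched below. This is purely illustrative and not how GCC computes complexity: struct addr_parts and compensation_complexity are hypothetical, and the model assumes one compensating statement per address part that the target cannot encode.

```c
#include <stdbool.h>

/* Which parts an address expression uses, or which parts a target can
   encode directly in a memory operand.  Hypothetical type.  */
struct addr_parts
{
  bool index;    /* BASE + INDEX */
  bool scale;    /* INDEX << SCALE */
  bool offset;   /* + OFFSET */
};

/* Count the parts of USED that TARGET cannot encode.  In this model each
   such part needs one compensating statement, so it adds one to the
   complexity.  */
unsigned
compensation_complexity (struct addr_parts used, struct addr_parts target)
{
  unsigned complexity = 0;

  if (used.index && !target.index)
    complexity++;
  if (used.scale && !target.scale)
    complexity++;
  if (used.offset && !target.offset)
    complexity++;

  return complexity;
}
```

For a target that only has BASE + OFFSET, this model gives BASE + INDEX << SCALE + OFFSET a complexity of 2, BASE + INDEX a complexity of 1, and BASE + OFFSET a complexity of 0.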
Kind regards,
Dimitrije
From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, December 16, 2022 10:58 AM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
On Thu, Dec 15, 2022 at 4:26 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
>
> > I'm not sure this is accurate but at least the cost of using an unsupported
> > addressing mode should be at least that of the compensating code to
> > mangle it to a supported form.
>
> I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
> the target architecture. It does, however, adjust the cost for a subset of those.
> The adjustment code modifies only the cost part of the address cost (which
> consists of a cost and a complexity).
> Having said this, I'd propose three approaches:
> 1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
> sure they aren't already covered), leaving complexity for unsupported
> addressing modes zero.
The only documentation on complexity I find is
  int64_t cost;         /* The runtime cost.  */
  unsigned complexity;  /* The estimate of the complexity of the code for
                           the computation (in no concrete units --
                           complexity field should be larger for more
                           complex expressions and addressing modes).  */
and complexity is used as tie-breaker only when cost is equal. Given that,
shouldn't unsupported addressing modes have higher complexity? I'll note
that there's nothing "unsupported", each "unsupported" address computation
is lowered into supported pieces. "unsupported" maybe means that
"cost" isn't fully covered by address-cost and compensation stmts might
be costed in quantities not fully compatible with that?
That said, "complexity" seems to only complicate things :/ We do have the
tie-breaker on preferring fewer IVs. complexity was added in
r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
> 2. Revert the complexity calculation (which my initial patch does), leaving
> everything else as it is.
> 3. A combination of both - if the control path gets into the adjustment code, we
> use the reverted complexity calculation.
If it's really only about the "complexity" value then each compensation step
should add to the complexity?
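The tie-breaking role of complexity described above can be illustrated with a minimal sketch. It assumes a simplified stand-in for the (cost, complexity) pair; the names `addr_cost` and `cheaper` are hypothetical and are not GCC's actual definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for the (cost, complexity) pair discussed above;
// this is an illustration, not the definition used in GCC.
struct addr_cost
{
  int64_t cost;         // The runtime cost.
  unsigned complexity;  // Tie-breaker: larger for more complex addressing.
};

// Returns true if A is preferred over B. The lower cost wins, and
// complexity is consulted only when the two costs are equal.
bool cheaper (const addr_cost &a, const addr_cost &b)
{
  if (a.cost != b.cost)
    return a.cost < b.cost;
  return a.complexity < b.complexity;
}
```

Under this ordering, two candidates with equal cost and complexity 0 are indistinguishable: neither is preferred over the other, so complexity cannot steer the choice between them.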
> I'd love to get feedback regarding this, so I could focus on a concrete approach.
>
> Kind regards,
> Dimitrije
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, November 7, 2022 2:35 PM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
> On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
> >
> > Hi Jeff,
> >
> > > This is exactly what I was trying to get to. If the addressing mode
> > > isn't supported, then we shouldn't be picking it as a candidate. If it
> > > is, then we've probably got a problem somewhere else in this code and
> > > this patch is likely papering over it.
>
> I'm not sure this is accurate but at least the cost of using an unsupported
> addressing mode should be at least that of the compensating code to
> mangle it to a supported form.
>
> > I'll take a deeper look into the candidate selection algorithm then. Will
> > get back to you.
>
> Thanks - as said the unfortunate situation is that both the original author and
> the one who did the last bigger reworks of the code are gone.
>
> Richard.
>
> > Regards,
> > Dimitrije
> >
> > ________________________________________
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Tuesday, November 1, 2022 7:46 PM
> > To: Richard Biener; Dimitrije Milosevic
> > Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> > Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
> >
> >
> > On 10/28/22 01:00, Richard Biener wrote:
> > > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > > <Dimitrije.Milosevic@syrmia.com> wrote:
> > >> Hi Jeff,
> > >>
> > >>> The part I don't understand is, if you only have BASE+OFF, why does
> > >>> preventing the calculation of more complex addressing modes matter? I.e.,
> > >>> what's the point of computing the cost of something like base + off +
> > >>> scaled index when the target can't utilize it?
> > >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> > >> equal to 0. For targets like MIPS, which only have BASE + OFFSET, it would still
> > >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> > >> than a candidate with BASE + INDEX, for example, as it has to compensate
> > >> for the lack of other addressing modes somehow. If complexities for both of
> > >> those are equal to 0, in cases where complexities decide which candidate is
> > >> to be chosen, a more complex candidate may be picked.
> > > But something is wrong then - it shouldn't ever pick a candidate with
> > > an addressing mode that isn't supported? So you say that the cost of
> > > expressing 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not
> > > computed accurately?
> >
> > This is exactly what I was trying to get to. If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate. If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.
> >
> >
> > Jeff
> >
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2022-12-16 11:37 ` Dimitrije Milosevic
@ 2022-12-16 11:58 ` Richard Biener
0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2022-12-16 11:58 UTC (permalink / raw)
To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic
On Fri, Dec 16, 2022 at 12:37 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
>
> Hi Richard,
>
> > The only documentation on complexity I find is
> >
> > int64_t cost; /* The runtime cost. */
> > unsigned complexity; /* The estimate of the complexity of the code for
> > the computation (in no concrete units --
> > complexity field should be larger for more
> > complex expressions and addressing modes). */
> >
> > and complexity is used as tie-breaker only when cost is equal. Given that
> > shouldn't unsupported addressing modes have higher complexity? I'll note
> > that there's nothing "unsupported", each "unsupported" address computation
> > is lowered into supported pieces. "unsupported" maybe means that
> > "cost" isn't fully covered by address-cost and compensation stmts might
> > be costed in quantities not fully compatible with that?
>
> Correct, that's what I was aiming for initially - before f9f69dd that was the case,
> "unsupported" addressing modes had higher complexities.
> Also, that's what I meant by "unsupported" as well, thanks.
>
> > That said, "complexity" seems to only complicate things :/ We do have the
> > tie-breaker on preferring less IVs. complexity was added in
> > r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
>
> I agree that the complexity part is just (kind of) out there, not really strongly
> defined. I'm not sure how to feel about merging complexity into the cost part
> of an address cost, though.
>
> > If it's really only about the "complexity" value then each
> > compensation step should
> > add to the complexity?
>
> That could be the way to go. Also worth verifying is that we compensate for
> each case of an unsupported addressing mode.
Yes. Also, given that complexity is only a tie-breaker, we should cost the
compensation somehow, but then complexity doesn't look necessary ...
Meh.
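One possible reading of "each compensation step should add to the complexity" is sketched below. It assumes a target that supports only BASE + OFFSET; the types `addr_parts` and `addr_cost`, the function `cost_on_base_offset_target`, and the per-step cost `step_cost` are all hypothetical and are not taken from tree-ssa-loop-ivopts.cc.

```cpp
#include <cstdint>

// Hypothetical description of an address BASE + INDEX * SCALE + OFFSET.
struct addr_parts
{
  bool has_index;
  unsigned scale;   // 1 means the index is not scaled.
  bool has_offset;
};

// Hypothetical (cost, complexity) pair.
struct addr_cost
{
  int64_t cost;
  unsigned complexity;
};

// Assumes a target that supports only BASE + OFFSET. Every statement
// needed to lower the address into that form is one compensation step,
// and each step adds STEP_COST to the cost and 1 to the complexity.
addr_cost cost_on_base_offset_target (const addr_parts &a, int64_t step_cost)
{
  addr_cost c = {0, 0};
  if (a.has_index)
    {
      if (a.scale != 1)
        {
          // Compensation step: compute INDEX * SCALE separately.
          c.cost += step_cost;
          c.complexity++;
        }
      // Compensation step: fold the (scaled) index into the base.
      c.cost += step_cost;
      c.complexity++;
    }
  // BASE + OFFSET itself is supported, so the offset needs no step.
  return c;
}
```

In this sketch BASE + INDEX << SCALE + OFFSET gets complexity 2 and BASE + INDEX gets complexity 1, so the two no longer tie at 0. Since each step also raises the cost, the sketch shows the concern as well: once compensation is costed, the complexity value adds little.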
>
> Kind regards,
> Dimitrije
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
2024-03-18 20:27 Aleksandar Rakic
@ 2024-04-15 13:30 ` Aleksandar Rakic
0 siblings, 0 replies; 24+ messages in thread
From: Aleksandar Rakic @ 2024-04-15 13:30 UTC (permalink / raw)
To: gcc-patches
Cc: richard.guenther, jeffreyalaw, Djordje Todorovic, Jovan Dmitrovic
PING: This is a reminder about the patch for the computation of complexity for unsupported addressing modes, sent on March 18.
________________________________________
From: Aleksandar Rakic
Sent: Monday, March 18, 2024 9:27 PM
To: gcc-patches@gcc.gnu.org
Cc: Jovan Dmitrovic; richard.guenther@gmail.com; Djordje Todorovic; jeffreyalaw@gmail.com; Uros Beric
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
From dbf49f2872efcc14d2ea41eb7d616498dca9789f Mon Sep 17 00:00:00 2001
From: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
Date: Tue, 5 Mar 2024 11:55:01 +0100
Subject: [PATCH] ivopts: Fixed bug 109429
This patch changes the order in which the complexity is calculated. Fixing the
complexities also fixes the candidate selection, which leads to smaller
code size.
The patch also fixes the complexity when a variable is present in
the address expression, similarly to the variable 'var_present' in
commit c2b64ce.
It also separates adding the autoinc_cost from adding the address
cost (acost) to the cost, similarly to commit c2b64ce.
The patch also contains the C test and the script that generates the
assembly file and the compiler dump output. The assembly code
obtained after modifying tree-ssa-loop-ivopts.cc is smaller than the
assembly code obtained before the change. The compiler dump shows the
difference in complexities for group 1 of loop 3 in the function dgefa.
The patch is available in the GCC fork at the following address:
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429.
Bug 109429 is described at the following address:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429.
gcc/ChangeLog:
* tree-ssa-loop-ivopts.cc (get_address_cost): Fixed the
complexities calculation.
gcc/testsuite/ChangeLog:
* after.s: The assembly file obtained by compiling the fp_foo.c
file after modification of the tree-ssa-loop-ivopts.cc file.
* after.txt: The compiler-generated output obtained by compiling
the fp_foo.c file after modification of the
tree-ssa-loop-ivopts.cc file.
* before.s: The assembly file obtained by compiling the fp_foo.c
file before modification of the tree-ssa-loop-ivopts.cc file.
* before.txt: The compiler-generated output obtained by compiling
the fp_foo.c file before modification of the
tree-ssa-loop-ivopts.cc file.
* fp_foo.c: The C test.
* test_script.sh: The script used for compiling the fp_foo.c file.
Signed-off-by: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
---
gcc/testsuite/after.s | 148 ++
gcc/testsuite/after.txt | 2792 ++++++++++++++++++++++++++++++++++
gcc/testsuite/before.s | 152 ++
gcc/testsuite/before.txt | 2694 ++++++++++++++++++++++++++++++++
gcc/testsuite/fp_foo.c | 19 +
gcc/testsuite/test_script.sh | 10 +
gcc/tree-ssa-loop-ivopts.cc | 75 +-
7 files changed, 5853 insertions(+), 37 deletions(-)
create mode 100644 gcc/testsuite/after.s
create mode 100644 gcc/testsuite/after.txt
create mode 100644 gcc/testsuite/before.s
create mode 100644 gcc/testsuite/before.txt
create mode 100644 gcc/testsuite/fp_foo.c
create mode 100644 gcc/testsuite/test_script.sh
diff --git a/gcc/testsuite/after.s b/gcc/testsuite/after.s
new file mode 100644
index 00000000000..a32bb8b3614
--- /dev/null
+++ b/gcc/testsuite/after.s
@@ -0,0 +1,148 @@
+ .file 1 "fp_foo.c"
+ .section .mdebug.abi64
+ .previous
+ .nan 2008
+ .module fp=64
+ .module oddspreg
+ .module arch=mips64r6
+ .abicalls
+ .text
+ .align 2
+ .align 3
+ .globl daxpy
+ .set nomips16
+ .set nomicromips
+ .ent daxpy
+ .type daxpy, @function
+daxpy:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ blezc $6,.L7
+ dlsa $6,$6,$4,2
+ .align 3
+.L3:
+ lwc1 $f1,0($5)
+ daddiu $4,$4,4
+ lwc1 $f0,-4($4)
+ daddiu $5,$5,4
+ maddf.s $f0,$f1,$f15
+ bne $4,$6,.L3
+ swc1 $f0,-4($4)
+
+.L7:
+ jrc $31
+ .set macro
+ .set reorder
+ .end daxpy
+ .size daxpy, .-daxpy
+ .align 2
+ .align 3
+ .globl dgefa
+ .set nomips16
+ .set nomicromips
+ .ent dgefa
+ .type dgefa, @function
+dgefa:
+ .frame $sp,48,$31 # vars= 0, regs= 5/0, args= 0, gp= 0
+ .mask 0x100f0000,-8
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ li $2,1 # 0x1
+ bgec $2,$6,.L23
+ daddiu $sp,$sp,-48
+ addiu $14,$6,-1
+ move $10,$6
+ sd $19,32($sp)
+ sd $18,24($sp)
+ move $11,$4
+ sd $17,16($sp)
+ move $17,$5
+ sd $16,8($sp)
+ dlsa $9,$7,$4,2
+ addiu $19,$5,1
+ dsll $12,$5,2
+ move $25,$5
+ move $24,$0
+ move $13,$0
+ move $15,$0
+ move $18,$14
+ .align 3
+.L11:
+ addiu $7,$15,1
+ addiu $16,$15,1
+ daddiu $13,$13,1
+ move $15,$7
+ bgec $7,$10,.L15
+ daddiu $8,$24,1
+ daddu $6,$13,$25
+ dlsa $8,$8,$11,2
+ dsll $6,$6,2
+ move $5,$14
+ .align 3
+.L14:
+ daddu $2,$9,$6
+ daddu $4,$11,$6
+ lwc1 $f2,-4($2)
+ move $3,$0
+ move $2,$8
+ .align 3
+.L13:
+ lwc1 $f1,0($4)
+ daddiu $2,$2,4
+ lwc1 $f0,-4($2)
+ addiu $3,$3,1
+ daddiu $4,$4,4
+ maddf.s $f0,$f2,$f1
+ swc1 $f0,-4($2)
+ bltc $3,$5,.L13
+ addiu $7,$7,1
+ bne $10,$7,.L14
+ daddu $6,$6,$12
+
+.L15:
+ addiu $14,$14,-1
+ daddiu $9,$9,-4
+ addu $24,$24,$19
+ bne $18,$16,.L11
+ addu $25,$17,$25
+
+ ld $19,32($sp)
+ ld $18,24($sp)
+ ld $17,16($sp)
+ ld $16,8($sp)
+ jr $31
+ daddiu $sp,$sp,48
+
+.L23:
+ jrc $31
+ .set macro
+ .set reorder
+ .end dgefa
+ .size dgefa, .-dgefa
+ .section .text.startup,"ax",@progbits
+ .align 2
+ .align 3
+ .globl main
+ .set nomips16
+ .set nomicromips
+ .ent main
+ .type main, @function
+main:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ jr $31
+ move $2,$0
+
+ .set macro
+ .set reorder
+ .end main
+ .size main, .-main
+ .ident "GCC: (GNU) 14.0.1 20240214 (experimental)"
+ .section .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/after.txt b/gcc/testsuite/after.txt
new file mode 100644
index 00000000000..772f92d2b20
--- /dev/null
+++ b/gcc/testsuite/after.txt
@@ -0,0 +1,2792 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;; header 3, latch 6
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_12(D) + 4294967295
+;; upper_bound 2147483646
+;; likely_upper_bound 2147483646
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;; nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+ single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+ {
+ <bb 2> [local count: 118111600]:
+ if (n_12(D) > 0)
+ goto <bb 5>; [89.00%]
+ else
+ goto <bb 4>; [11.00%]
+
+ }
+ bb_5 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 5> [local count: 105119324]:
+
+ }
+ bb_7 (preds = {bb_3 }, succs = {bb_4 })
+ {
+ <bb 7> [local count: 105119324]:
+ # .MEM_22 = PHI <.MEM_16(3)>
+
+ }
+ bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+ {
+ <bb 4> [local count: 118111600]:
+ # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+ # VUSE <.MEM_29>
+ return;
+
+ }
+ loop_1 (header = 3, latch = 6, finite_p
+ niter (unsigned int) n_12(D) + 4294967295
+ upper_bound 2147483646
+ likely_upper_bound 2147483646
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+ {
+ bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+ {
+ <bb 3> [local count: 955630224]:
+ # i_20 = PHI <i_17(6), 0(5)>
+ # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+ _1 = (long unsigned int) i_20;
+ _2 = _1 * 4;
+ _3 = vector1_13(D) + _2;
+ # VUSE <.MEM_21>
+ _4 = *_3;
+ _5 = vector2_14(D) + _2;
+ # VUSE <.MEM_21>
+ _6 = *_5;
+ _7 = _6 * fp_const_15(D);
+ _8 = _4 + _7;
+ # .MEM_16 = VDEF <.MEM_21>
+ *_3 = _8;
+ i_17 = i_20 + 1;
+ if (n_12(D) > i_17)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 7>; [11.00%]
+
+ }
+ bb_6 (preds = {bb_3 }, succs = {bb_3 })
+ {
+ <bb 6> [local count: 850510900]:
+ goto <bb 3>; [100.00%]
+
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_12(D)
+ bounds on difference of bases: 0 ... 2147483646
+ result:
+ # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_17
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_20
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _4 = *_3;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_3 = _8;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _6 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (n_12(D) > i_17)
+ At pos: i_17
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.6
+ Var after: ivtmp.6
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.7
+ Var after: ivtmp.7
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.8
+ Var after: ivtmp.8
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.9
+ Var after: ivtmp.9
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10
+ Var after: ivtmp.10
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.11
+ Var after: ivtmp.11
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.12
+ Var after: ivtmp.12
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 7
+ Group 1: 0, 1, 2, 3, 5, 7
+ Group 2: 0, 1, 2, 3, 6
+
+<Candidate Costs>:
+ cand cost
+force_expr_to_var_cost size costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+force_expr_to_var_cost speed costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D) (eliminable)
+Inv 2: vector2_14(D) (eliminable)
+Inv 3: fp_const_15(D) (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 2: (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 18 2 NIL; 1
+ 2 18 4 NIL; 1
+ 4 2 0 NIL; NIL;
+ 7 10 2 NIL; 1
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 9 1 NIL; 2
+ 2 9 2 NIL; 2
+ 5 1 0 NIL; NIL;
+ 7 5 1 NIL; 2
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 4
+ 1 0 0 NIL; 4
+ 2 1 0 NIL; 4
+ 3 0 0 NIL; 4
+ 4 1 0 1; NIL;
+ 5 1 0 2; NIL;
+ 6 0 0 NIL; 4
+ 7 1 0 NIL; 4
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 37 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 27 (complexity 3)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(18,2)
+ group:1 --> iv_cand:1, cost=(9,1)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 26 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 3)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,2)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 1)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 26 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 3)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,2)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 1)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: ivtmp.9_28
+ Var after: ivtmp.9_27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10_25
+ Var after: ivtmp.10_24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;; header 8, latch 13
+;; depth 3, outer 2, finite_p
+;; niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;; nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+ single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ # VUSE <.MEM_57>
+ _35 = *_34;
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ # VUSE <.MEM_57>
+ _37 = *_36;
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ # .MEM_42 = VDEF <.MEM_57>
+ *_34 = _39;
+ i_40 = i_56 + 1;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 3
+ exit condition [1, + , 1](no_overflow) < _87
+ bounds on difference of bases: -2147483649 ... 2147483646
+ result:
+ zero if _87 <= 0
+ # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _21
+ Type: sizetype
+ Base: ((sizetype) _7 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _29
+ Type: sizetype
+ Base: ((sizetype) _11 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _32
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _33
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _34
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _36
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_40
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_56
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _35 = *_34;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_34 = _39;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _37 = *_36;
+ At pos: *_36
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: i_40
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.20
+ Var after: ivtmp.20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.21
+ Var after: ivtmp.21
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.22
+ Var after: ivtmp.22
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23
+ Var after: ivtmp.23
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.24
+ Var after: ivtmp.24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25
+ Var after: ivtmp.25
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.26
+ Var after: ivtmp.26
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.27
+ Var after: ivtmp.27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 9:
+ Var befor: ivtmp.28
+ Var after: ivtmp.28
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 9
+ Group 1: 0, 1, 2, 3, 6, 7, 9
+ Group 2: 0, 1, 2, 3, 8
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 6
+ 5 6
+ 6 6
+ 7 6
+ 8 5
+ 9 5
+
+
+<Invariant Vars>:
+Inv 6: _7 (eliminable)
+Inv 1: _10 (eliminable)
+Inv 7: _11 (eliminable)
+Inv 3: _14 (eliminable)
+Inv 2: vector_27(D) (eliminable)
+Inv 4: t_28 (eliminable)
+Inv 5: _87 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 2: ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 3: (unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4
+inv_expr 4: (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 5: ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 6: (unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 22 4 1; NIL;
+ 2 22 2 1; NIL;
+ 4 2 0 NIL; NIL;
+ 5 2 2 NIL; NIL;
+ 6 16 2 2; NIL;
+ 7 16 4 3; NIL;
+ 9 14 4 1; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 11 2 4; NIL;
+ 2 11 1 4; NIL;
+ 4 8 1 5; NIL;
+ 5 8 2 6; NIL;
+ 6 1 0 NIL; NIL;
+ 7 1 1 NIL; NIL;
+ 9 7 2 4; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 5
+ 1 0 0 NIL; 5
+ 2 4 0 NIL; 5
+ 3 0 0 NIL; 5
+ 8 4 0 NIL; 5
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 47 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 37 (complexity 3)
+ candidates: 2
+ group:0 --> iv_cand:2, cost=(22,2)
+ group:1 --> iv_cand:2, cost=(11,1)
+ group:2 --> iv_cand:2, cost=(4,0)
+ invariant variables: 5
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 31 (complexity 1)
+ reg_cost: 6
+ cand_cost: 11
+ cand_group_cost: 14 (complexity 1)
+ candidates: 2, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:2, cost=(4,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 1)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 7
+ cand_cost: 16
+ cand_group_cost: 3 (complexity 0)
+ candidates: 3, 4, 6
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:6, cost=(1,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions:
+
+Initial set of candidates:
+ cost: 37 (complexity 6)
+ reg_cost: 7
+ cand_cost: 9
+ cand_group_cost: 21 (complexity 6)
+ candidates: 3, 9
+ group:0 --> iv_cand:9, cost=(14,4)
+ group:1 --> iv_cand:9, cost=(7,2)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 26 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 1)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 7
+ cand_cost: 16
+ cand_group_cost: 3 (complexity 0)
+ candidates: 3, 4, 6
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:6, cost=(1,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions:
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 3 IVs:
+Candidate 3:
+ Var befor: i_56
+ Var after: i_40
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23_85
+ Var after: ivtmp.23_84
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25_78
+ Var after: ivtmp.25_77
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+;;
+;; Loop 2
+;; header 7, latch 12
+;; depth 2, outer 1, finite_p
+;; niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;; nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+ single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+ _75 = (sizetype) _11;
+ _74 = _75 + 1;
+ _73 = _74 * 4;
+ _72 = vector_27(D) + _73;
+ ivtmp.25_76 = (unsigned long) _72;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _71 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_71];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _69 = (void *) ivtmp.25_78;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_69];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _70 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_70] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ ivtmp.25_77 = ivtmp.25_78 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 2
+ exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+ number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: int
+ Base: (i_50 + 1) * m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + l_26(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: long unsigned int
+ Base: (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+ Step: (long unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _4
+ Type: long unsigned int
+ Base: ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _11
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + i_50
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _12
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _13
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _14
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_30
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: j_51
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _72
+ Type: float *
+ Base: vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _73
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _74
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _75
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: ivtmp.25_76
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: t_28 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (n_23(D) > j_30)
+ At pos: j_30
+ IV struct:
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: ivtmp.25_76 = (unsigned long) _72;
+ At pos:
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: _14 = _13 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.29
+ Var after: ivtmp.29
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.30
+ Var after: ivtmp.30
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.31
+ Var after: ivtmp.31
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: ivtmp.32
+ Var after: ivtmp.32
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 1)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.33
+ Var after: ivtmp.33
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+ Step: (unsigned long) ((long unsigned int) m_25(D) * 4)
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.34
+ Var after: ivtmp.34
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.35
+ Var after: ivtmp.35
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) i_50
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 8:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.36
+ Var after: ivtmp.36
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37
+ Var after: ivtmp.37
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4) + (unsigned long) vector_27(D)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38
+ Var after: ivtmp.38
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 11:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.39
+ Var after: ivtmp.39
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 12:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.40
+ Var after: ivtmp.40
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 12
+ Group 1: 0, 1, 2, 3, 4, 6, 7
+ Group 2: 0, 1, 2, 3, 4, 8, 9, 10, 11, 12
+ Group 3: 0, 1, 2, 3, 4, 10, 11, 12
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 6
+ 3 6
+ 4 4
+ 5 9
+ 6 5
+ 7 5
+ 8 10
+ 9 9
+ 10 10
+ 11 9
+ 12 5
+
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24 (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50 (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1: (long unsigned int) m_25(D) * 4
+inv_expr 2: ((unsigned long) l_26(D) - (unsigned long) i_50) * 4
+inv_expr 3: (unsigned long) i_50 * 18446744073709551612 + (unsigned long) l_26(D) * 4
+inv_expr 4: ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 5: ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 6: ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 7: (signed int) i_50 + 1
+inv_expr 8: (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 9: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 10: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 11: (((signed long) i_50 - (signed long) l_26(D)) + 1) * 4
+inv_expr 12: (signed long) vector_27(D) + 4
+inv_expr 13: (((signed long) ((i_50 + 1) * m_25(D)) * 4 + (signed long) vector_27(D)) + (signed long) i_50 * 4) + 4
+inv_expr 14: (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 15: 4 - (signed long) vector_27(D)
+inv_expr 16: (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 5 1 0 NIL; NIL;
+ 8 8 2 2; NIL;
+ 9 8 1 3; NIL;
+ 10 8 2 4; NIL;
+ 11 8 1 4; NIL;
+ 12 10 1 5; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 6; NIL;
+ 1 2 0 8; NIL;
+ 2 3 0 9; NIL;
+ 3 0 0 NIL; 7
+ 4 0 0 NIL; 7
+ 6 0 0 NIL; 7
+ 7 0 0 NIL; 7
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 5 6 0 11; NIL;
+ 8 0 0 NIL; NIL;
+ 9 4 0 NIL; NIL;
+ 10 4 0 NIL; NIL;
+ 11 4 0 12; NIL;
+ 12 9 0 13; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 5 7 0 14; NIL;
+ 8 8 0 NIL; NIL;
+ 9 4 0 15; NIL;
+ 10 0 0 NIL; NIL;
+ 11 4 0 NIL; NIL;
+ 12 9 0 16; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 35 (complexity 0)
+ reg_cost: 8
+ cand_cost: 13
+ cand_group_cost: 14 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:5, cost=(1,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:5, cost=(6,0)
+ group:3 --> iv_cand:5, cost=(7,0)
+ invariant variables: 7
+ invariant expressions: 1, 11, 14
+
+Improved to:
+ cost: 33 (complexity 2)
+ reg_cost: 7
+ cand_cost: 14
+ cand_group_cost: 12 (complexity 2)
+ candidates: 4, 10
+ group:0 --> iv_cand:10, cost=(8,2)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:10, cost=(4,0)
+ group:3 --> iv_cand:10, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 4
+
+Initial set of candidates:
+ cost: 33 (complexity 2)
+ reg_cost: 7
+ cand_cost: 14
+ cand_group_cost: 12 (complexity 2)
+ candidates: 4, 10
+ group:0 --> iv_cand:10, cost=(8,2)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:10, cost=(4,0)
+ group:3 --> iv_cand:10, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 4
+
+Original cost 33 (complexity 2)
+
+Final cost 33 (complexity 2)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: j_51
+ Var after: j_30
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38_68
+ Var after: ivtmp.38_67
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;; header 4, latch 11
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_23(D) + 4294967294
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;; nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+ single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+ _66 = (sizetype) m_25(D);
+ _65 = _66 * 4;
+ _63 = i_50 + 1;
+ _62 = m_25(D) * _63;
+ _61 = (sizetype) _62;
+ _60 = (sizetype) i_50;
+ _59 = _60 + _61;
+ _58 = _59 + 1;
+ ivtmp.38_64 = _58 * 4;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ # ivtmp.38_68 = PHI <ivtmp.38_67(12), ivtmp.38_64(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ _49 = (sizetype) i_50;
+ _48 = _49 * 18446744073709551612;
+ _47 = (sizetype) l_26(D);
+ _46 = _47 * 4;
+ _44 = _46 + _48;
+ _43 = vector_27(D) + _44;
+ _41 = _43 + 18446744073709551612;
+ _31 = _43 + ivtmp.38_68;
+ # VUSE <.MEM_52>
+ t_28 = MEM[(float *)_31 + -4B];
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = ivtmp.38_68;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+ _75 = (sizetype) _11;
+ _74 = _75 + 1;
+ _73 = _74 * 4;
+ _72 = vector_27(D) + _73;
+ _20 = (unsigned long) vector_27(D);
+ _19 = _20 + ivtmp.38_68;
+ ivtmp.25_76 = _19;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ ivtmp.38_67 = ivtmp.38_68 + _65;
+ if (j_30 != n_23(D))
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _71 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_71];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _69 = (void *) ivtmp.25_78;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_69];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _70 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_70] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ ivtmp.25_77 = ivtmp.25_78 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+ number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _6
+ Type: int
+ Base: 0
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _7
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_24
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _41
+ Type: float *
+ Base: vector_27(D) + ((sizetype) l_26(D) * 4 + 18446744073709551612)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _43
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _44
+ Type: sizetype
+ Base: (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _48
+ Type: sizetype
+ Base: 0
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _49
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_50
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _60
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _62
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _63
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _87
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: COMPARE
+ Use 0.0:
+ At stmt: if (n_23(D) > j_24)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (j_24 < _45)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: _87
+ IV struct:
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: j_24 = i_50 + 1;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 4:
+ Type: GENERIC
+ Use 4.0:
+ At stmt: _43 = vector_27(D) + _44;
+ At pos:
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Group 5:
+ Type: GENERIC
+ Use 5.0:
+ At stmt: i_50 = PHI <j_24(11), 0(10)>
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 6:
+ Type: GENERIC
+ Use 6.0:
+ At stmt: _7 = _6 + i_50;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 7:
+ Type: GENERIC
+ Use 7.0:
+ At stmt: _62 = m_25(D) * _63;
+ At pos:
+ IV struct:
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 8:
+ Type: GENERIC
+ Use 8.0:
+ At stmt: _60 = (sizetype) i_50;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.41
+ Var after: ivtmp.41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.42
+ Var after: ivtmp.42
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.43
+ Var after: ivtmp.43
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.44
+ Var after: ivtmp.44
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.45
+ Var after: ivtmp.45
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) n_23(D)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.46
+ Var after: ivtmp.46
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.47
+ Var after: ivtmp.47
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.48
+ Var after: ivtmp.48
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3
+ Group 1: 0, 1, 2, 3
+ Group 2: 0, 1, 2, 3, 4, 5
+ Group 3: 0, 1, 2, 3
+ Group 4: 0, 1, 2, 3, 6
+ Group 5: 0, 1, 2, 3
+ Group 6: 0, 1, 2, 3, 7
+ Group 7: 0, 1, 2, 3, 8
+ Group 8: 0, 1, 2, 3
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 6
+ 7 5
+ 8 5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned int) m_25(D) + 1
+inv_expr 2: (signed int) n_23(D) + 1
+inv_expr 3: (signed int) n_23(D) + -1
+inv_expr 4: (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 NIL; NIL;
+ 5 4 0 2; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; 2
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 0 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 3 0 NIL; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 80 0 3; NIL;
+ 1 80 0 3; NIL;
+ 2 80 0 NIL; NIL;
+ 3 80 0 NIL; NIL;
+ 4 0 0 NIL; NIL;
+ 5 80 0 NIL; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 NIL; NIL;
+ 5 4 0 2; NIL;
+
+Group 4:
+ cand cost compl. inv.expr. inv.vars
+ 1 17 0 4; NIL;
+ 6 0 0 NIL; NIL;
+
+Group 5:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; NIL;
+ 2 4 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 3; NIL;
+ 5 4 0 NIL; NIL;
+
+Group 6:
+ cand cost compl. inv.expr. inv.vars
+ 7 0 0 NIL; NIL;
+
+Group 7:
+ cand cost compl. inv.expr. inv.vars
+ 8 0 0 NIL; NIL;
+
+Group 8:
+ cand cost compl. inv.expr. inv.vars
+ 1 0 0 NIL; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 126 (complexity 0)
+ reg_cost: 10
+ cand_cost: 19
+ cand_group_cost: 97 (complexity 0)
+ candidates: 1, 3, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:3, cost=(80,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 53 (complexity 0)
+ reg_cost: 12
+ cand_cost: 24
+ cand_group_cost: 17 (complexity 0)
+ candidates: 1, 3, 4, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 6, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 55 (complexity 0)
+ reg_cost: 10
+ cand_cost: 20
+ cand_group_cost: 25 (complexity 0)
+ candidates: 1, 4, 7, 8
+ group:0 --> iv_cand:4, cost=(4,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:4, cost=(4,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:1, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 45 (complexity 0)
+ reg_cost: 11
+ cand_cost: 26
+ cand_group_cost: 8 (complexity 0)
+ candidates: 1, 4, 6, 7, 8
+ group:0 --> iv_cand:4, cost=(4,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:4, cost=(4,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 6, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+ Var befor: ivtmp.42_18
+ Var after: ivtmp.42_17
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: i_50
+ Var after: j_24
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.44_16
+ Var after: ivtmp.44_15
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.46_92
+ Var after: ivtmp.46_93
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.47_98
+ Var after: ivtmp.47_99
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.48_102
+ Var after: ivtmp.48_103
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/before.s b/gcc/testsuite/before.s
new file mode 100644
index 00000000000..e13834bdf59
--- /dev/null
+++ b/gcc/testsuite/before.s
@@ -0,0 +1,152 @@
+ .file 1 "fp_foo.c"
+ .section .mdebug.abi64
+ .previous
+ .nan 2008
+ .module fp=64
+ .module oddspreg
+ .module arch=mips64r6
+ .abicalls
+ .text
+ .align 2
+ .align 3
+ .globl daxpy
+ .set nomips16
+ .set nomicromips
+ .ent daxpy
+ .type daxpy, @function
+daxpy:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ blezc $6,.L7
+ dlsa $6,$6,$4,2
+ .align 3
+.L3:
+ lwc1 $f1,0($5)
+ daddiu $4,$4,4
+ lwc1 $f0,-4($4)
+ daddiu $5,$5,4
+ maddf.s $f0,$f1,$f15
+ bne $4,$6,.L3
+ swc1 $f0,-4($4)
+
+.L7:
+ jrc $31
+ .set macro
+ .set reorder
+ .end daxpy
+ .size daxpy, .-daxpy
+ .align 2
+ .align 3
+ .globl dgefa
+ .set nomips16
+ .set nomicromips
+ .ent dgefa
+ .type dgefa, @function
+dgefa:
+ .frame $sp,48,$31 # vars= 0, regs= 6/0, args= 0, gp= 0
+ .mask 0x101f0000,-8
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ li $2,1 # 0x1
+ bgec $2,$6,.L23
+ daddiu $sp,$sp,-48
+ addiu $14,$6,-1
+ move $11,$6
+ sd $20,32($sp)
+ sd $19,24($sp)
+ addiu $20,$5,1
+ sd $18,16($sp)
+ move $18,$4
+ sd $17,8($sp)
+ dlsa $10,$7,$4,2
+ sd $16,0($sp)
+ move $17,$5
+ dsll $12,$5,2
+ move $25,$5
+ move $13,$0
+ move $24,$0
+ move $15,$0
+ move $19,$14
+ .align 3
+.L11:
+ addiu $8,$15,1
+ addiu $16,$15,1
+ move $15,$8
+ bgec $8,$11,.L15
+ daddu $5,$25,$24
+ daddiu $9,$13,1
+ dsubu $6,$0,$13
+ dsll $5,$5,2
+ dlsa $9,$9,$18,2
+ dsll $6,$6,2
+ move $7,$14
+ .align 3
+.L14:
+ daddu $3,$10,$5
+ move $2,$9
+ lwc1 $f2,0($3)
+ move $4,$0
+ .align 3
+.L13:
+ daddu $3,$6,$2
+ lwc1 $f0,0($2)
+ daddu $3,$3,$5
+ daddiu $2,$2,4
+ lwc1 $f1,0($3)
+ addiu $4,$4,1
+ maddf.s $f0,$f2,$f1
+ swc1 $f0,-4($2)
+ bltc $4,$7,.L13
+ addiu $8,$8,1
+ bne $11,$8,.L14
+ daddu $5,$5,$12
+
+.L15:
+ daddiu $24,$24,1
+ addu $13,$20,$13
+ addiu $14,$14,-1
+ daddiu $10,$10,-4
+ bne $19,$16,.L11
+ addu $25,$17,$25
+
+ ld $20,32($sp)
+ ld $19,24($sp)
+ ld $18,16($sp)
+ ld $17,8($sp)
+ ld $16,0($sp)
+ jr $31
+ daddiu $sp,$sp,48
+
+.L23:
+ jrc $31
+ .set macro
+ .set reorder
+ .end dgefa
+ .size dgefa, .-dgefa
+ .section .text.startup,"ax",@progbits
+ .align 2
+ .align 3
+ .globl main
+ .set nomips16
+ .set nomicromips
+ .ent main
+ .type main, @function
+main:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ jr $31
+ move $2,$0
+
+ .set macro
+ .set reorder
+ .end main
+ .size main, .-main
+ .ident "GCC: (GNU) 14.0.1 20240214 (experimental)"
+ .section .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/before.txt b/gcc/testsuite/before.txt
new file mode 100644
index 00000000000..c87764b8ae9
--- /dev/null
+++ b/gcc/testsuite/before.txt
@@ -0,0 +1,2694 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;; header 3, latch 6
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_12(D) + 4294967295
+;; upper_bound 2147483646
+;; likely_upper_bound 2147483646
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;; nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+ single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+ {
+ <bb 2> [local count: 118111600]:
+ if (n_12(D) > 0)
+ goto <bb 5>; [89.00%]
+ else
+ goto <bb 4>; [11.00%]
+
+ }
+ bb_5 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 5> [local count: 105119324]:
+
+ }
+ bb_7 (preds = {bb_3 }, succs = {bb_4 })
+ {
+ <bb 7> [local count: 105119324]:
+ # .MEM_22 = PHI <.MEM_16(3)>
+
+ }
+ bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+ {
+ <bb 4> [local count: 118111600]:
+ # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+ # VUSE <.MEM_29>
+ return;
+
+ }
+ loop_1 (header = 3, latch = 6, finite_p
+ niter (unsigned int) n_12(D) + 4294967295
+ upper_bound 2147483646
+ likely_upper_bound 2147483646
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+ {
+ bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+ {
+ <bb 3> [local count: 955630224]:
+ # i_20 = PHI <i_17(6), 0(5)>
+ # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+ _1 = (long unsigned int) i_20;
+ _2 = _1 * 4;
+ _3 = vector1_13(D) + _2;
+ # VUSE <.MEM_21>
+ _4 = *_3;
+ _5 = vector2_14(D) + _2;
+ # VUSE <.MEM_21>
+ _6 = *_5;
+ _7 = _6 * fp_const_15(D);
+ _8 = _4 + _7;
+ # .MEM_16 = VDEF <.MEM_21>
+ *_3 = _8;
+ i_17 = i_20 + 1;
+ if (n_12(D) > i_17)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 7>; [11.00%]
+
+ }
+ bb_6 (preds = {bb_3 }, succs = {bb_3 })
+ {
+ <bb 6> [local count: 850510900]:
+ goto <bb 3>; [100.00%]
+
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_12(D)
+ bounds on difference of bases: 0 ... 2147483646
+ result:
+ # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_17
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_20
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _4 = *_3;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_3 = _8;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _6 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (n_12(D) > i_17)
+ At pos: i_17
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.6
+ Var after: ivtmp.6
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.7
+ Var after: ivtmp.7
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.8
+ Var after: ivtmp.8
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.9
+ Var after: ivtmp.9
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10
+ Var after: ivtmp.10
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.11
+ Var after: ivtmp.11
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.12
+ Var after: ivtmp.12
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 7
+ Group 1: 0, 1, 2, 3, 5, 7
+ Group 2: 0, 1, 2, 3, 6
+
+<Candidate Costs>:
+ cand cost
+force_expr_to_var_cost size costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+force_expr_to_var_cost speed costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D) (eliminable)
+Inv 2: vector2_14(D) (eliminable)
+Inv 3: fp_const_15(D) (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) vector1_13(D) + 18446744073709551612
+inv_expr 2: (unsigned long) vector2_14(D) + 18446744073709551612
+inv_expr 3: (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 4: (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 18 0 NIL; 1
+ 2 20 0 1; NIL;
+ 4 2 0 NIL; NIL;
+ 7 10 0 NIL; 1
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 9 0 NIL; 2
+ 2 10 0 2; NIL;
+ 5 1 0 NIL; NIL;
+ 7 5 0 NIL; 2
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 4
+ 1 0 0 NIL; 4
+ 2 1 0 NIL; 4
+ 3 0 0 NIL; 4
+ 4 1 0 3; NIL;
+ 5 1 0 4; NIL;
+ 6 0 0 NIL; 4
+ 7 1 0 NIL; 4
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 37 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 27 (complexity 0)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(18,0)
+ group:1 --> iv_cand:1, cost=(9,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 0)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 0)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 3
+
+Initial set of candidates:
+ cost: 26 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 0)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 0)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 3
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: ivtmp.9_28
+ Var after: ivtmp.9_27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10_25
+ Var after: ivtmp.10_24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;; header 8, latch 13
+;; depth 3, outer 2, finite_p
+;; niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;; nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+ single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ # VUSE <.MEM_57>
+ _35 = *_34;
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ # VUSE <.MEM_57>
+ _37 = *_36;
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ # .MEM_42 = VDEF <.MEM_57>
+ *_34 = _39;
+ i_40 = i_56 + 1;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 3
+ exit condition [1, + , 1](no_overflow) < _87
+ bounds on difference of bases: -2147483649 ... 2147483646
+ result:
+ zero if _87 <= 0
+ # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _21
+ Type: sizetype
+ Base: ((sizetype) _7 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _29
+ Type: sizetype
+ Base: ((sizetype) _11 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _32
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _33
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _34
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _36
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_40
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_56
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _35 = *_34;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_34 = _39;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _37 = *_36;
+ At pos: *_36
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: i_40
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.20
+ Var after: ivtmp.20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.21
+ Var after: ivtmp.21
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.22
+ Var after: ivtmp.22
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23
+ Var after: ivtmp.23
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.24
+ Var after: ivtmp.24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25
+ Var after: ivtmp.25
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.26
+ Var after: ivtmp.26
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.27
+ Var after: ivtmp.27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 9:
+ Var befor: ivtmp.28
+ Var after: ivtmp.28
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 9
+ Group 1: 0, 1, 2, 3, 6, 7, 9
+ Group 2: 0, 1, 2, 3, 8
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 6
+ 5 6
+ 6 6
+ 7 6
+ 8 5
+ 9 5
+
+
+<Invariant Vars>:
+Inv 6: _7 (eliminable)
+Inv 1: _10 (eliminable)
+Inv 7: _11 (eliminable)
+Inv 3: _14 (eliminable)
+Inv 2: vector_27(D) (eliminable)
+Inv 4: t_28 (eliminable)
+Inv 5: _87 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: ((unsigned long) _7 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 2: (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 3: ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 4: ((unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4) + 4
+inv_expr 5: ((unsigned long) _11 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 6: (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 7: ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 8: ((unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4) + 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 22 0 1; NIL;
+ 2 22 0 2; NIL;
+ 4 2 0 NIL; NIL;
+ 5 2 2 NIL; NIL;
+ 6 16 0 3; NIL;
+ 7 18 0 4; NIL;
+ 9 14 0 1; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 11 0 5; NIL;
+ 2 11 0 6; NIL;
+ 4 8 0 7; NIL;
+ 5 9 0 8; NIL;
+ 6 1 0 NIL; NIL;
+ 7 1 1 NIL; NIL;
+ 9 7 0 5; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 5
+ 1 0 0 NIL; 5
+ 2 4 0 NIL; 5
+ 3 0 0 NIL; 5
+ 8 4 0 NIL; 5
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 43 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 33 (complexity 0)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(22,0)
+ group:1 --> iv_cand:1, cost=(11,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 5
+
+Improved to:
+ cost: 27 (complexity 0)
+ reg_cost: 6
+ cand_cost: 11
+ cand_group_cost: 10 (complexity 0)
+ candidates: 1, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 0)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Initial set of candidates:
+ cost: 37 (complexity 0)
+ reg_cost: 7
+ cand_cost: 9
+ cand_group_cost: 21 (complexity 0)
+ candidates: 3, 9
+ group:0 --> iv_cand:9, cost=(14,0)
+ group:1 --> iv_cand:9, cost=(7,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 0)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 3:
+ Var befor: i_56
+ Var after: i_40
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23_85
+ Var after: ivtmp.23_84
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+ allowed multipliers:
+
+;;
+;; Loop 2
+;; header 7, latch 12
+;; depth 2, outer 1, finite_p
+;; niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;; nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+ single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _78 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_78];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _76 = (sizetype) _7;
+ _75 = _76 * 18446744073709551612;
+ _74 = _75 + ivtmp.23_85;
+ _73 = (void *) _74;
+ _72 = (sizetype) _11;
+ _71 = _72 * 4;
+ _70 = _73 + _71;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_70];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _77 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_77] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 2
+ exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+ number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: int
+ Base: (i_50 + 1) * m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + l_26(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: long unsigned int
+ Base: (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+ Step: (long unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _4
+ Type: long unsigned int
+ Base: ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _11
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + i_50
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _12
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _13
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _14
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_30
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: j_51
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _71
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _72
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: t_28 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (n_23(D) > j_30)
+ At pos: j_30
+ IV struct:
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: _14 = _13 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: _71 = _72 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.29
+ Var after: ivtmp.29
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.30
+ Var after: ivtmp.30
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.31
+ Var after: ivtmp.31
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: ivtmp.32
+ Var after: ivtmp.32
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 1)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.33
+ Var after: ivtmp.33
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+ Step: (unsigned long) ((long unsigned int) m_25(D) * 4)
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.34
+ Var after: ivtmp.34
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.35
+ Var after: ivtmp.35
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) i_50
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 8:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.36
+ Var after: ivtmp.36
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37
+ Var after: ivtmp.37
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38
+ Var after: ivtmp.38
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 10
+ Group 1: 0, 1, 2, 3, 4, 6, 7
+ Group 2: 0, 1, 2, 3, 4, 8, 9, 10
+ Group 3: 0, 1, 2, 3, 4, 9, 10
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 6
+ 3 6
+ 4 4
+ 5 9
+ 6 5
+ 7 5
+ 8 10
+ 9 9
+ 10 5
+
+Scaling cost based on bb prob by 8.00: 6 (scratch: 2) -> 34
+Scaling cost based on bb prob by 8.00: 4 (scratch: 0) -> 32
+Scaling cost based on bb prob by 8.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 8.00: 8 (scratch: 4) -> 36
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24 (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50 (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1: (long unsigned int) m_25(D) * 4
+inv_expr 2: (((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4) + 18446744073709551612
+inv_expr 3: ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 4: ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 5: ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 6: (signed int) i_50 + 1
+inv_expr 7: (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 8: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 9: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 10: (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 11: (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+inv_expr 12: ((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4
+inv_expr 13: ((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 5 1 0 NIL; NIL;
+ 8 9 0 2; NIL;
+ 9 8 0 3; NIL;
+ 10 10 0 4; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 5; NIL;
+ 1 2 0 7; NIL;
+ 2 3 0 8; NIL;
+ 3 0 0 NIL; 7
+ 4 0 0 NIL; 7
+ 6 0 0 NIL; 7
+ 7 0 0 NIL; 7
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 5 7 0 10; NIL;
+ 8 0 0 NIL; NIL;
+ 9 4 0 NIL; NIL;
+ 10 9 0 11; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 5 34 0 12; NIL;
+ 8 32 0 NIL; NIL;
+ 9 0 0 NIL; NIL;
+ 10 36 0 13; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 63 (complexity 0)
+ reg_cost: 8
+ cand_cost: 13
+ cand_group_cost: 42 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:5, cost=(1,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:5, cost=(7,0)
+ group:3 --> iv_cand:5, cost=(34,0)
+ invariant variables: 7
+ invariant expressions: 1, 10, 12
+
+Improved to:
+ cost: 32 (complexity 0)
+ reg_cost: 7
+ cand_cost: 13
+ cand_group_cost: 12 (complexity 0)
+ candidates: 4, 9
+ group:0 --> iv_cand:9, cost=(8,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:9, cost=(4,0)
+ group:3 --> iv_cand:9, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 3
+
+Initial set of candidates:
+ cost: 32 (complexity 0)
+ reg_cost: 7
+ cand_cost: 13
+ cand_group_cost: 12 (complexity 0)
+ candidates: 4, 9
+ group:0 --> iv_cand:9, cost=(8,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:9, cost=(4,0)
+ group:3 --> iv_cand:9, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 3
+
+Original cost 32 (complexity 0)
+
+Final cost 32 (complexity 0)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: j_51
+ Var after: j_30
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37_69
+ Var after: ivtmp.37_68
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;; header 4, latch 11
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_23(D) + 4294967294
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;; nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+ single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+ _67 = (sizetype) m_25(D);
+ _66 = _67 * 4;
+ _64 = i_50 + 1;
+ _63 = m_25(D) * _64;
+ _62 = (sizetype) _63;
+ _61 = (sizetype) i_50;
+ _60 = _61 + _62;
+ ivtmp.37_65 = _60 * 4;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ # ivtmp.37_69 = PHI <ivtmp.37_68(12), ivtmp.37_65(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ _59 = (sizetype) i_50;
+ _58 = _59 * 18446744073709551612;
+ _49 = (sizetype) l_26(D);
+ _48 = _49 * 4;
+ _47 = _48 + _58;
+ _46 = vector_27(D) + _47;
+ _44 = _46 + ivtmp.37_69;
+ # VUSE <.MEM_52>
+ t_28 = MEM[(float *)_44];
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = ivtmp.37_69 + 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ ivtmp.37_68 = ivtmp.37_69 + _66;
+ if (j_30 != n_23(D))
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _78 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_78];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _76 = (sizetype) _7;
+ _75 = _76 * 18446744073709551612;
+ _74 = _75 + ivtmp.23_85;
+ _73 = (void *) _74;
+ _72 = (sizetype) _11;
+ _71 = ivtmp.37_69;
+ _70 = _73 + _71;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_70];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _77 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_77] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+ number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _6
+ Type: int
+ Base: 0
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _7
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_24
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _46
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _47
+ Type: sizetype
+ Base: (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: i_50
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _58
+ Type: sizetype
+ Base: 0
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _59
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _61
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _63
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _64
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _87
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: COMPARE
+ Use 0.0:
+ At stmt: if (n_23(D) > j_24)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (j_24 < _45)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: _7 = _6 + i_50;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: COMPARE
+ Use 3.0:
+ At stmt: if (i_40 < _87)
+ At pos: _87
+ IV struct:
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 4:
+ Type: GENERIC
+ Use 4.0:
+ At stmt: j_24 = i_50 + 1;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 5:
+ Type: GENERIC
+ Use 5.0:
+ At stmt: _46 = vector_27(D) + _47;
+ At pos:
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Group 6:
+ Type: GENERIC
+ Use 6.0:
+ At stmt: i_50 = PHI <j_24(11), 0(10)>
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 7:
+ Type: GENERIC
+ Use 7.0:
+ At stmt: _63 = m_25(D) * _64;
+ At pos:
+ IV struct:
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 8:
+ Type: GENERIC
+ Use 8.0:
+ At stmt: _61 = (sizetype) i_50;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.39
+ Var after: ivtmp.39
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.40
+ Var after: ivtmp.40
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.41
+ Var after: ivtmp.41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.42
+ Var after: ivtmp.42
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.43
+ Var after: ivtmp.43
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.44
+ Var after: ivtmp.44
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) n_23(D)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.45
+ Var after: ivtmp.45
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.46
+ Var after: ivtmp.46
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3
+ Group 1: 0, 1, 2, 3
+ Group 2: 0, 1, 2, 3, 4
+ Group 3: 0, 1, 2, 3, 5, 6
+ Group 4: 0, 1, 2, 3
+ Group 5: 0, 1, 2, 3, 7
+ Group 6: 0, 1, 2, 3
+ Group 7: 0, 1, 2, 3, 8
+ Group 8: 0, 1, 2, 3
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 6
+ 8 5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned int) m_25(D) + 1
+inv_expr 2: (signed int) n_23(D) + 1
+inv_expr 3: (signed int) n_23(D) + -1
+inv_expr 4: (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 NIL; NIL;
+ 6 4 0 2; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; 2
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 0 0 NIL; NIL;
+ 7 3 0 NIL; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 4 0 0 NIL; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 0 80 0 3; NIL;
+ 1 80 0 3; NIL;
+ 2 80 0 NIL; NIL;
+ 3 80 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 80 0 NIL; NIL;
+
+Group 4:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 NIL; NIL;
+ 6 4 0 2; NIL;
+
+Group 5:
+ cand cost compl. inv.expr. inv.vars
+ 1 17 0 4; NIL;
+ 7 0 0 NIL; NIL;
+
+Group 6:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; NIL;
+ 2 4 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 3; NIL;
+ 6 4 0 NIL; NIL;
+
+Group 7:
+ cand cost compl. inv.expr. inv.vars
+ 8 0 0 NIL; NIL;
+
+Group 8:
+ cand cost compl. inv.expr. inv.vars
+ 1 0 0 NIL; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 126 (complexity 0)
+ reg_cost: 10
+ cand_cost: 19
+ cand_group_cost: 97 (complexity 0)
+ candidates: 1, 3, 4, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(80,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 53 (complexity 0)
+ reg_cost: 12
+ cand_cost: 24
+ cand_group_cost: 17 (complexity 0)
+ candidates: 1, 3, 4, 5, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 5, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 55 (complexity 0)
+ reg_cost: 10
+ cand_cost: 20
+ cand_group_cost: 25 (complexity 0)
+ candidates: 1, 4, 5, 8
+ group:0 --> iv_cand:5, cost=(4,0)
+ group:1 --> iv_cand:5, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:5, cost=(4,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:1, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 45 (complexity 0)
+ reg_cost: 11
+ cand_cost: 26
+ cand_group_cost: 8 (complexity 0)
+ candidates: 1, 4, 5, 7, 8
+ group:0 --> iv_cand:5, cost=(4,0)
+ group:1 --> iv_cand:5, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:5, cost=(4,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:1, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 5, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+ Var befor: ivtmp.40_43
+ Var after: ivtmp.40_41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: i_50
+ Var after: j_24
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.42_31
+ Var after: ivtmp.42_20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.43_17
+ Var after: ivtmp.43_16
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.45_91
+ Var after: ivtmp.45_92
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.46_97
+ Var after: ivtmp.46_98
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/fp_foo.c b/gcc/testsuite/fp_foo.c
new file mode 100644
index 00000000000..f65f43d6435
--- /dev/null
+++ b/gcc/testsuite/fp_foo.c
@@ -0,0 +1,19 @@
+
+void daxpy(float *vector1, float *vector2, int n, float fp_const){
+ for (int i = 0; i < n; ++i)
+ vector1[i] += fp_const * vector2[i];
+}
+
+void dgefa(float *vector, int m, int n, int l){
+ for (int i = 0; i < n - 1; ++i){
+ for (int j = i + 1; j < n; ++j){
+ float t = vector[m * j + l];
+ daxpy(&vector[m * i + i + 1],
+ &vector[m * j + i + 1], n - (i + 1), t);
+ }
+ }
+}
+
+int main(){
+ return 0;
+}
diff --git a/gcc/testsuite/test_script.sh b/gcc/testsuite/test_script.sh
new file mode 100644
index 00000000000..4f19d248efe
--- /dev/null
+++ b/gcc/testsuite/test_script.sh
@@ -0,0 +1,10 @@
+export PREFIX="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install"
+export SOURCE_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/source"
+export BUILD_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/build"
+export SYSROOT="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install/sys_root"
+export PATH=$PREFIX/bin:$PATH
+export TARGET=mips64-r6-linux-gnu
+
+
+$PREFIX/bin/mips64-r6-linux-gnu-gcc fp_foo.c -O2 >out.txt -S -o fp_foo.s -march=mips64r6 -mabi=64
+
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7cae5bdefea..2dec5001dca 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4724,7 +4724,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
rtx addr;
bool simple_inv = true;
tree comp_inv = NULL_TREE, type = aff_var->type;
- comp_cost var_cost = no_cost, cost = no_cost;
+ comp_cost var_cost = no_cost, cost = no_cost, autoinc_cost = no_cost;
+ comp_cost acost = no_cost;
struct mem_address parts = {NULL_TREE, integer_one_node,
NULL_TREE, NULL_TREE, NULL_TREE};
machine_mode addr_mode = TYPE_MODE (type);
@@ -4755,38 +4756,36 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (!ok_with_ratio_p)
parts.step = NULL_TREE;
}
- if (ok_with_ratio_p || ok_without_ratio_p)
+ if (!(ok_with_ratio_p || ok_without_ratio_p))
+ parts.index = NULL_TREE;
+
+ if (maybe_ne (aff_inv->offset, 0))
{
- if (maybe_ne (aff_inv->offset, 0))
- {
- parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
- /* Addressing mode "base + index [<< scale] + offset". */
- if (!valid_mem_ref_p (mem_mode, as, &parts, code))
- parts.offset = NULL_TREE;
- else
- aff_inv->offset = 0;
- }
+ parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
+ /* Addressing mode "base + index[<< scale] + offset". */
+ if (!valid_mem_ref_p (mem_mode, as, &parts, code))
+ parts.offset = NULL_TREE;
+ else
+ aff_inv->offset = 0;
+ }
- move_fixed_address_to_symbol (&parts, aff_inv);
- /* Base is fixed address and is moved to symbol part. */
- if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
- parts.base = NULL_TREE;
+ move_fixed_address_to_symbol (&parts, aff_inv);
+ /* Base is fixed address and is moved to symbol part. */
+ if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
+ parts.base = NULL_TREE;
- /* Addressing mode "symbol + base + index [<< scale] [+ offset]". */
- if (parts.symbol != NULL_TREE
- && !valid_mem_ref_p (mem_mode, as, &parts, code))
- {
- aff_combination_add_elt (aff_inv, parts.symbol, 1);
- parts.symbol = NULL_TREE;
- /* Reset SIMPLE_INV since symbol address needs to be computed
- outside of address expression in this case. */
- simple_inv = false;
- /* Symbol part is moved back to base part, it can't be NULL. */
- parts.base = integer_one_node;
- }
+ /* Addressing mode "symbol + base + index[<< scale] [+ offset]". */
+ if (parts.symbol != NULL_TREE
+ && !valid_mem_ref_p (mem_mode, as, &parts, code))
+ {
+ aff_combination_add_elt (aff_inv, parts.symbol, 1);
+ parts.symbol = NULL_TREE;
+ /* Reset SIMPLE_INV since symbol address needs to be computed
+ outside of address expression in this case. */
+ simple_inv = false;
+ /* Symbol part is moved back to base part, it can't be NULL. */
+ parts.base = integer_one_node;
}
- else
- parts.index = NULL_TREE;
}
else
{
@@ -4799,14 +4798,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (stmt_after_increment (data->current_loop, cand, use->stmt))
ainc_offset += ainc_step;
- cost = get_address_cost_ainc (ainc_step, ainc_offset,
+ autoinc_cost = get_address_cost_ainc (ainc_step, ainc_offset,
addr_mode, mem_mode, as, speed);
- if (!cost.infinite_cost_p ())
- {
- *can_autoinc = true;
- return cost;
- }
- cost = no_cost;
+ if (!autoinc_cost.infinite_cost_p ())
+ *can_autoinc = true;
+ else
+ autoinc_cost = no_cost;
}
if (!aff_combination_zero_p (aff_inv))
{
@@ -4852,10 +4849,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
cost += var_cost;
addr = addr_for_mem_ref (&parts, as, false);
gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
- cost += address_cost (addr, mem_mode, as, speed);
+ acost += address_cost (addr, mem_mode, as, speed);
if (parts.symbol != NULL_TREE)
cost.complexity += 1;
+ /* var_present. */
+ else if (!aff_combination_const_p (aff_inv))
+ cost.complexity += 1;
/* Don't increase the complexity of adding a scaled index if it's
the only kind of index that the target allows. */
if (parts.step != NULL_TREE && ok_without_ratio_p)
@@ -4865,6 +4865,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
cost.complexity += 1;
+ cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
return cost;
}
--
2.34.1
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
@ 2024-03-18 20:27 Aleksandar Rakic
2024-04-15 13:30 ` Aleksandar Rakic
0 siblings, 1 reply; 24+ messages in thread
From: Aleksandar Rakic @ 2024-03-18 20:27 UTC (permalink / raw)
To: gcc-patches
Cc: Jovan Dmitrovic, richard.guenther, Djordje Todorovic,
jeffreyalaw, Uros Beric
From dbf49f2872efcc14d2ea41eb7d616498dca9789f Mon Sep 17 00:00:00 2001
From: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
Date: Tue, 5 Mar 2024 11:55:01 +0100
Subject: [PATCH] ivopts: Fixed bug 109429
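This patch changes the order in which get_address_cost computes the
complexity of an address expression. The validity checks for the offset
and symbol parts are now performed even when the target does not support
the base + index addressing mode; in that case only the index part is
dropped. Correcting the complexities also corrects the candidate
selection, which leads to smaller code size.
The patch also increments the complexity when a variable is present in
the address expression, similarly to the 'var_present' variable in
commit c2b64ce.
It also keeps the autoincrement cost (autoinc_cost) separate from the
address cost (acost) and adds only one of them to the final cost,
similarly to commit c2b64ce.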
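The complexity counting and the final cost selection described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: the struct, its field names and both functions are hypothetical stand-ins rather than GCC's mem_address or comp_cost, the increment for a scaled step is assumed to be 1, and any rules in the lines elided between the hunks are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the address parts inspected by
   get_address_cost.  This is not GCC's struct mem_address.  */
struct addr_parts_model
{
  bool has_symbol;      /* models parts.symbol != NULL_TREE */
  bool has_variable;    /* models !aff_combination_const_p (aff_inv) */
  bool has_step;        /* models parts.step != NULL_TREE */
  bool nonzero_offset;  /* models a present, non-zero parts.offset */
};

/* Count the complexity following the rules visible in the patch hunks:
   one point for a symbol, otherwise one point for a variable, one point
   for a scaled step when unscaled indexing is also valid (assumed
   increment), and one point for a non-zero offset.  */
unsigned
model_complexity (const struct addr_parts_model *p, bool ok_without_ratio_p)
{
  assert (p != NULL);
  unsigned complexity = 0;

  if (p->has_symbol)
    complexity += 1;
  else if (p->has_variable)
    complexity += 1;

  if (p->has_step && ok_without_ratio_p)
    complexity += 1;

  if (p->nonzero_offset)
    complexity += 1;

  return complexity;
}

/* Add exactly one of the two address-related costs to the base cost:
   the autoincrement cost when autoincrement is usable, otherwise the
   plain address cost.  */
int
model_final_cost (int base_cost, bool can_autoinc, int autoinc_cost,
                  int acost)
{
  return base_cost + (can_autoinc ? autoinc_cost : acost);
}
```

In this model an address made of a variable base and a non-zero offset gets complexity 2, so it no longer ties at 0 with every other candidate on a target that rejects base + index.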
It also adds the C test and the script that generates the assembly file
and the compiler dump output. The assembly generated after the change to
tree-ssa-loop-ivopts.cc is smaller than the assembly generated before it
(after.s has 148 lines, before.s has 152). The dump output shows the
difference in complexities for group 1 of loop 3 in the function dgefa.
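When comparing the two dump files, it may help to know how the per-set figures relate. In the records included with this patch, the reported cost of a candidate set equals the sum of its reg_cost, cand_cost and cand_group_cost lines. The sketch below only restates that observation; the struct and function names are hypothetical, and it is not meant to describe the complete ivopts cost function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one "Initial set of candidates" or
   "Improved to" record from the ivopts dump.  */
struct iv_set_cost_model
{
  int reg_cost;         /* the "reg_cost:" line */
  int cand_cost;        /* the "cand_cost:" line */
  int cand_group_cost;  /* the "cand_group_cost:" line */
};

/* Total cost as it appears on the "cost:" line of the records in the
   attached dumps: the plain sum of the three components.  */
int
model_total_cost (const struct iv_set_cost_model *s)
{
  assert (s != NULL);
  return s->reg_cost + s->cand_cost + s->cand_group_cost;
}
```

For the dgefa loop at fp_foo.c:8, the initial set has components 10, 19 and 97 and a reported cost of 126, and the final set has 13, 30 and 0 with a reported cost of 43. For the daxpy loop at fp_foo.c:3, the final set has 5, 10 and 4 with a reported cost of 19.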
This patch is available in the gcc fork at the following address:
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429
Bug 109429 is described at the following address:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429
gcc/ChangeLog:
* tree-ssa-loop-ivopts.cc (get_address_cost): Fix the
complexity calculation.
gcc/testsuite/ChangeLog:
* after.s: The assembly file obtained by compiling the fp_foo.c
file after modification of the tree-ssa-loop-ivopts.cc file.
* after.txt: The compiler-generated output obtained by compiling
the fp_foo.c file after modification of the
tree-ssa-loop-ivopts.cc file.
* before.s: The assembly file obtained by compiling the fp_foo.c
file before modification of the tree-ssa-loop-ivopts.cc file.
* before.txt: The compiler-generated output obtained by compiling
the fp_foo.c file before modification of the
tree-ssa-loop-ivopts.cc file.
* fp_foo.c: The C test.
* test_script.sh: The script used for compiling the fp_foo.c file.
Signed-off-by: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
---
gcc/testsuite/after.s | 148 ++
gcc/testsuite/after.txt | 2792 ++++++++++++++++++++++++++++++++++
gcc/testsuite/before.s | 152 ++
gcc/testsuite/before.txt | 2694 ++++++++++++++++++++++++++++++++
gcc/testsuite/fp_foo.c | 19 +
gcc/testsuite/test_script.sh | 10 +
gcc/tree-ssa-loop-ivopts.cc | 75 +-
7 files changed, 5853 insertions(+), 37 deletions(-)
create mode 100644 gcc/testsuite/after.s
create mode 100644 gcc/testsuite/after.txt
create mode 100644 gcc/testsuite/before.s
create mode 100644 gcc/testsuite/before.txt
create mode 100644 gcc/testsuite/fp_foo.c
create mode 100644 gcc/testsuite/test_script.sh
diff --git a/gcc/testsuite/after.s b/gcc/testsuite/after.s
new file mode 100644
index 00000000000..a32bb8b3614
--- /dev/null
+++ b/gcc/testsuite/after.s
@@ -0,0 +1,148 @@
+ .file 1 "fp_foo.c"
+ .section .mdebug.abi64
+ .previous
+ .nan 2008
+ .module fp=64
+ .module oddspreg
+ .module arch=mips64r6
+ .abicalls
+ .text
+ .align 2
+ .align 3
+ .globl daxpy
+ .set nomips16
+ .set nomicromips
+ .ent daxpy
+ .type daxpy, @function
+daxpy:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ blezc $6,.L7
+ dlsa $6,$6,$4,2
+ .align 3
+.L3:
+ lwc1 $f1,0($5)
+ daddiu $4,$4,4
+ lwc1 $f0,-4($4)
+ daddiu $5,$5,4
+ maddf.s $f0,$f1,$f15
+ bne $4,$6,.L3
+ swc1 $f0,-4($4)
+
+.L7:
+ jrc $31
+ .set macro
+ .set reorder
+ .end daxpy
+ .size daxpy, .-daxpy
+ .align 2
+ .align 3
+ .globl dgefa
+ .set nomips16
+ .set nomicromips
+ .ent dgefa
+ .type dgefa, @function
+dgefa:
+ .frame $sp,48,$31 # vars= 0, regs= 5/0, args= 0, gp= 0
+ .mask 0x100f0000,-8
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ li $2,1 # 0x1
+ bgec $2,$6,.L23
+ daddiu $sp,$sp,-48
+ addiu $14,$6,-1
+ move $10,$6
+ sd $19,32($sp)
+ sd $18,24($sp)
+ move $11,$4
+ sd $17,16($sp)
+ move $17,$5
+ sd $16,8($sp)
+ dlsa $9,$7,$4,2
+ addiu $19,$5,1
+ dsll $12,$5,2
+ move $25,$5
+ move $24,$0
+ move $13,$0
+ move $15,$0
+ move $18,$14
+ .align 3
+.L11:
+ addiu $7,$15,1
+ addiu $16,$15,1
+ daddiu $13,$13,1
+ move $15,$7
+ bgec $7,$10,.L15
+ daddiu $8,$24,1
+ daddu $6,$13,$25
+ dlsa $8,$8,$11,2
+ dsll $6,$6,2
+ move $5,$14
+ .align 3
+.L14:
+ daddu $2,$9,$6
+ daddu $4,$11,$6
+ lwc1 $f2,-4($2)
+ move $3,$0
+ move $2,$8
+ .align 3
+.L13:
+ lwc1 $f1,0($4)
+ daddiu $2,$2,4
+ lwc1 $f0,-4($2)
+ addiu $3,$3,1
+ daddiu $4,$4,4
+ maddf.s $f0,$f2,$f1
+ swc1 $f0,-4($2)
+ bltc $3,$5,.L13
+ addiu $7,$7,1
+ bne $10,$7,.L14
+ daddu $6,$6,$12
+
+.L15:
+ addiu $14,$14,-1
+ daddiu $9,$9,-4
+ addu $24,$24,$19
+ bne $18,$16,.L11
+ addu $25,$17,$25
+
+ ld $19,32($sp)
+ ld $18,24($sp)
+ ld $17,16($sp)
+ ld $16,8($sp)
+ jr $31
+ daddiu $sp,$sp,48
+
+.L23:
+ jrc $31
+ .set macro
+ .set reorder
+ .end dgefa
+ .size dgefa, .-dgefa
+ .section .text.startup,"ax",@progbits
+ .align 2
+ .align 3
+ .globl main
+ .set nomips16
+ .set nomicromips
+ .ent main
+ .type main, @function
+main:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ jr $31
+ move $2,$0
+
+ .set macro
+ .set reorder
+ .end main
+ .size main, .-main
+ .ident "GCC: (GNU) 14.0.1 20240214 (experimental)"
+ .section .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/after.txt b/gcc/testsuite/after.txt
new file mode 100644
index 00000000000..772f92d2b20
--- /dev/null
+++ b/gcc/testsuite/after.txt
@@ -0,0 +1,2792 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;; header 3, latch 6
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_12(D) + 4294967295
+;; upper_bound 2147483646
+;; likely_upper_bound 2147483646
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;; nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+ single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+ {
+ <bb 2> [local count: 118111600]:
+ if (n_12(D) > 0)
+ goto <bb 5>; [89.00%]
+ else
+ goto <bb 4>; [11.00%]
+
+ }
+ bb_5 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 5> [local count: 105119324]:
+
+ }
+ bb_7 (preds = {bb_3 }, succs = {bb_4 })
+ {
+ <bb 7> [local count: 105119324]:
+ # .MEM_22 = PHI <.MEM_16(3)>
+
+ }
+ bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+ {
+ <bb 4> [local count: 118111600]:
+ # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+ # VUSE <.MEM_29>
+ return;
+
+ }
+ loop_1 (header = 3, latch = 6, finite_p
+ niter (unsigned int) n_12(D) + 4294967295
+ upper_bound 2147483646
+ likely_upper_bound 2147483646
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+ {
+ bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+ {
+ <bb 3> [local count: 955630224]:
+ # i_20 = PHI <i_17(6), 0(5)>
+ # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+ _1 = (long unsigned int) i_20;
+ _2 = _1 * 4;
+ _3 = vector1_13(D) + _2;
+ # VUSE <.MEM_21>
+ _4 = *_3;
+ _5 = vector2_14(D) + _2;
+ # VUSE <.MEM_21>
+ _6 = *_5;
+ _7 = _6 * fp_const_15(D);
+ _8 = _4 + _7;
+ # .MEM_16 = VDEF <.MEM_21>
+ *_3 = _8;
+ i_17 = i_20 + 1;
+ if (n_12(D) > i_17)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 7>; [11.00%]
+
+ }
+ bb_6 (preds = {bb_3 }, succs = {bb_3 })
+ {
+ <bb 6> [local count: 850510900]:
+ goto <bb 3>; [100.00%]
+
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_12(D)
+ bounds on difference of bases: 0 ... 2147483646
+ result:
+ # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_17
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_20
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _4 = *_3;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_3 = _8;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _6 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (n_12(D) > i_17)
+ At pos: i_17
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.6
+ Var after: ivtmp.6
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.7
+ Var after: ivtmp.7
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.8
+ Var after: ivtmp.8
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.9
+ Var after: ivtmp.9
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10
+ Var after: ivtmp.10
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.11
+ Var after: ivtmp.11
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.12
+ Var after: ivtmp.12
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 7
+ Group 1: 0, 1, 2, 3, 5, 7
+ Group 2: 0, 1, 2, 3, 6
+
+<Candidate Costs>:
+ cand cost
+force_expr_to_var_cost size costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+force_expr_to_var_cost speed costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D) (eliminable)
+Inv 2: vector2_14(D) (eliminable)
+Inv 3: fp_const_15(D) (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 2: (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 18 2 NIL; 1
+ 2 18 4 NIL; 1
+ 4 2 0 NIL; NIL;
+ 7 10 2 NIL; 1
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 9 1 NIL; 2
+ 2 9 2 NIL; 2
+ 5 1 0 NIL; NIL;
+ 7 5 1 NIL; 2
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 4
+ 1 0 0 NIL; 4
+ 2 1 0 NIL; 4
+ 3 0 0 NIL; 4
+ 4 1 0 1; NIL;
+ 5 1 0 2; NIL;
+ 6 0 0 NIL; 4
+ 7 1 0 NIL; 4
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 37 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 27 (complexity 3)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(18,2)
+ group:1 --> iv_cand:1, cost=(9,1)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 26 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 3)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,2)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 1)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 26 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 3)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,2)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 1)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,1)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: ivtmp.9_28
+ Var after: ivtmp.9_27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10_25
+ Var after: ivtmp.10_24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;; header 8, latch 13
+;; depth 3, outer 2, finite_p
+;; niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;; nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+ single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ # VUSE <.MEM_57>
+ _35 = *_34;
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ # VUSE <.MEM_57>
+ _37 = *_36;
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ # .MEM_42 = VDEF <.MEM_57>
+ *_34 = _39;
+ i_40 = i_56 + 1;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 3
+ exit condition [1, + , 1](no_overflow) < _87
+ bounds on difference of bases: -2147483649 ... 2147483646
+ result:
+ zero if _87 <= 0
+ # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _21
+ Type: sizetype
+ Base: ((sizetype) _7 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _29
+ Type: sizetype
+ Base: ((sizetype) _11 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _32
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _33
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _34
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _36
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_40
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_56
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _35 = *_34;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_34 = _39;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _37 = *_36;
+ At pos: *_36
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: i_40
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.20
+ Var after: ivtmp.20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.21
+ Var after: ivtmp.21
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.22
+ Var after: ivtmp.22
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23
+ Var after: ivtmp.23
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.24
+ Var after: ivtmp.24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25
+ Var after: ivtmp.25
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.26
+ Var after: ivtmp.26
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.27
+ Var after: ivtmp.27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 9:
+ Var befor: ivtmp.28
+ Var after: ivtmp.28
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 9
+ Group 1: 0, 1, 2, 3, 6, 7, 9
+ Group 2: 0, 1, 2, 3, 8
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 6
+ 5 6
+ 6 6
+ 7 6
+ 8 5
+ 9 5
+
+
+<Invariant Vars>:
+Inv 6: _7 (eliminable)
+Inv 1: _10 (eliminable)
+Inv 7: _11 (eliminable)
+Inv 3: _14 (eliminable)
+Inv 2: vector_27(D) (eliminable)
+Inv 4: t_28 (eliminable)
+Inv 5: _87 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 2: ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 3: (unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4
+inv_expr 4: (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 5: ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 6: (unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 22 4 1; NIL;
+ 2 22 2 1; NIL;
+ 4 2 0 NIL; NIL;
+ 5 2 2 NIL; NIL;
+ 6 16 2 2; NIL;
+ 7 16 4 3; NIL;
+ 9 14 4 1; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 11 2 4; NIL;
+ 2 11 1 4; NIL;
+ 4 8 1 5; NIL;
+ 5 8 2 6; NIL;
+ 6 1 0 NIL; NIL;
+ 7 1 1 NIL; NIL;
+ 9 7 2 4; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 5
+ 1 0 0 NIL; 5
+ 2 4 0 NIL; 5
+ 3 0 0 NIL; 5
+ 8 4 0 NIL; 5
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 47 (complexity 3)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 37 (complexity 3)
+ candidates: 2
+ group:0 --> iv_cand:2, cost=(22,2)
+ group:1 --> iv_cand:2, cost=(11,1)
+ group:2 --> iv_cand:2, cost=(4,0)
+ invariant variables: 5
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 31 (complexity 1)
+ reg_cost: 6
+ cand_cost: 11
+ cand_group_cost: 14 (complexity 1)
+ candidates: 2, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:2, cost=(4,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 1)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 7
+ cand_cost: 16
+ cand_group_cost: 3 (complexity 0)
+ candidates: 3, 4, 6
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:6, cost=(1,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions:
+
+Initial set of candidates:
+ cost: 37 (complexity 6)
+ reg_cost: 7
+ cand_cost: 9
+ cand_group_cost: 21 (complexity 6)
+ candidates: 3, 9
+ group:0 --> iv_cand:9, cost=(14,4)
+ group:1 --> iv_cand:9, cost=(7,2)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 26 (complexity 1)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 1)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,1)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 7
+ cand_cost: 16
+ cand_group_cost: 3 (complexity 0)
+ candidates: 3, 4, 6
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:6, cost=(1,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions:
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 3 IVs:
+Candidate 3:
+ Var befor: i_56
+ Var after: i_40
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23_85
+ Var after: ivtmp.23_84
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25_78
+ Var after: ivtmp.25_77
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+;;
+;; Loop 2
+;; header 7, latch 12
+;; depth 2, outer 1, finite_p
+;; niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;; nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+ single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+ _75 = (sizetype) _11;
+ _74 = _75 + 1;
+ _73 = _74 * 4;
+ _72 = vector_27(D) + _73;
+ ivtmp.25_76 = (unsigned long) _72;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _71 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_71];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _69 = (void *) ivtmp.25_78;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_69];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _70 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_70] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ ivtmp.25_77 = ivtmp.25_78 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 2
+ exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+ number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: int
+ Base: (i_50 + 1) * m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + l_26(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: long unsigned int
+ Base: (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+ Step: (long unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _4
+ Type: long unsigned int
+ Base: ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _11
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + i_50
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _12
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _13
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _14
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_30
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: j_51
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _72
+ Type: float *
+ Base: vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _73
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _74
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _75
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: ivtmp.25_76
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: t_28 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (n_23(D) > j_30)
+ At pos: j_30
+ IV struct:
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: ivtmp.25_76 = (unsigned long) _72;
+ At pos:
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: _14 = _13 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.29
+ Var after: ivtmp.29
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.30
+ Var after: ivtmp.30
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.31
+ Var after: ivtmp.31
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: ivtmp.32
+ Var after: ivtmp.32
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 1)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.33
+ Var after: ivtmp.33
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+ Step: (unsigned long) ((long unsigned int) m_25(D) * 4)
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.34
+ Var after: ivtmp.34
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.35
+ Var after: ivtmp.35
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) i_50
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 8:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.36
+ Var after: ivtmp.36
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37
+ Var after: ivtmp.37
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4) + (unsigned long) vector_27(D)
+ Step: (sizetype) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38
+ Var after: ivtmp.38
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 11:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.39
+ Var after: ivtmp.39
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 12:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.40
+ Var after: ivtmp.40
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 12
+ Group 1: 0, 1, 2, 3, 4, 6, 7
+ Group 2: 0, 1, 2, 3, 4, 8, 9, 10, 11, 12
+ Group 3: 0, 1, 2, 3, 4, 10, 11, 12
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 6
+ 3 6
+ 4 4
+ 5 9
+ 6 5
+ 7 5
+ 8 10
+ 9 9
+ 10 10
+ 11 9
+ 12 5
+
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24 (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50 (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1: (long unsigned int) m_25(D) * 4
+inv_expr 2: ((unsigned long) l_26(D) - (unsigned long) i_50) * 4
+inv_expr 3: (unsigned long) i_50 * 18446744073709551612 + (unsigned long) l_26(D) * 4
+inv_expr 4: ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 5: ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 6: ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 7: (signed int) i_50 + 1
+inv_expr 8: (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 9: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 10: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 11: (((signed long) i_50 - (signed long) l_26(D)) + 1) * 4
+inv_expr 12: (signed long) vector_27(D) + 4
+inv_expr 13: (((signed long) ((i_50 + 1) * m_25(D)) * 4 + (signed long) vector_27(D)) + (signed long) i_50 * 4) + 4
+inv_expr 14: (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 15: 4 - (signed long) vector_27(D)
+inv_expr 16: (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 5 1 0 NIL; NIL;
+ 8 8 2 2; NIL;
+ 9 8 1 3; NIL;
+ 10 8 2 4; NIL;
+ 11 8 1 4; NIL;
+ 12 10 1 5; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 6; NIL;
+ 1 2 0 8; NIL;
+ 2 3 0 9; NIL;
+ 3 0 0 NIL; 7
+ 4 0 0 NIL; 7
+ 6 0 0 NIL; 7
+ 7 0 0 NIL; 7
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 5 6 0 11; NIL;
+ 8 0 0 NIL; NIL;
+ 9 4 0 NIL; NIL;
+ 10 4 0 NIL; NIL;
+ 11 4 0 12; NIL;
+ 12 9 0 13; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 5 7 0 14; NIL;
+ 8 8 0 NIL; NIL;
+ 9 4 0 15; NIL;
+ 10 0 0 NIL; NIL;
+ 11 4 0 NIL; NIL;
+ 12 9 0 16; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 35 (complexity 0)
+ reg_cost: 8
+ cand_cost: 13
+ cand_group_cost: 14 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:5, cost=(1,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:5, cost=(6,0)
+ group:3 --> iv_cand:5, cost=(7,0)
+ invariant variables: 7
+ invariant expressions: 1, 11, 14
+
+Improved to:
+ cost: 33 (complexity 2)
+ reg_cost: 7
+ cand_cost: 14
+ cand_group_cost: 12 (complexity 2)
+ candidates: 4, 10
+ group:0 --> iv_cand:10, cost=(8,2)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:10, cost=(4,0)
+ group:3 --> iv_cand:10, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 4
+
+Initial set of candidates:
+ cost: 33 (complexity 2)
+ reg_cost: 7
+ cand_cost: 14
+ cand_group_cost: 12 (complexity 2)
+ candidates: 4, 10
+ group:0 --> iv_cand:10, cost=(8,2)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:10, cost=(4,0)
+ group:3 --> iv_cand:10, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 4
+
+Original cost 33 (complexity 2)
+
+Final cost 33 (complexity 2)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: j_51
+ Var after: j_30
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38_68
+ Var after: ivtmp.38_67
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;; header 4, latch 11
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_23(D) + 4294967294
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;; nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+ single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+ _66 = (sizetype) m_25(D);
+ _65 = _66 * 4;
+ _63 = i_50 + 1;
+ _62 = m_25(D) * _63;
+ _61 = (sizetype) _62;
+ _60 = (sizetype) i_50;
+ _59 = _60 + _61;
+ _58 = _59 + 1;
+ ivtmp.38_64 = _58 * 4;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ # ivtmp.38_68 = PHI <ivtmp.38_67(12), ivtmp.38_64(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ _49 = (sizetype) i_50;
+ _48 = _49 * 18446744073709551612;
+ _47 = (sizetype) l_26(D);
+ _46 = _47 * 4;
+ _44 = _46 + _48;
+ _43 = vector_27(D) + _44;
+ _41 = _43 + 18446744073709551612;
+ _31 = _43 + ivtmp.38_68;
+ # VUSE <.MEM_52>
+ t_28 = MEM[(float *)_31 + -4B];
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = ivtmp.38_68;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+ _75 = (sizetype) _11;
+ _74 = _75 + 1;
+ _73 = _74 * 4;
+ _72 = vector_27(D) + _73;
+ _20 = (unsigned long) vector_27(D);
+ _19 = _20 + ivtmp.38_68;
+ ivtmp.25_76 = _19;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ ivtmp.38_67 = ivtmp.38_68 + _65;
+ if (j_30 != n_23(D))
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _71 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_71];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _69 = (void *) ivtmp.25_78;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_69];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _70 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_70] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ ivtmp.25_77 = ivtmp.25_78 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+ number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _6
+ Type: int
+ Base: 0
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _7
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_24
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _41
+ Type: float *
+ Base: vector_27(D) + ((sizetype) l_26(D) * 4 + 18446744073709551612)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _43
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _44
+ Type: sizetype
+ Base: (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _48
+ Type: sizetype
+ Base: 0
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _49
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_50
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _60
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _62
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _63
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _87
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: COMPARE
+ Use 0.0:
+ At stmt: if (n_23(D) > j_24)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (j_24 < _45)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: _87
+ IV struct:
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: j_24 = i_50 + 1;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 4:
+ Type: GENERIC
+ Use 4.0:
+ At stmt: _43 = vector_27(D) + _44;
+ At pos:
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Group 5:
+ Type: GENERIC
+ Use 5.0:
+ At stmt: i_50 = PHI <j_24(11), 0(10)>
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 6:
+ Type: GENERIC
+ Use 6.0:
+ At stmt: _7 = _6 + i_50;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 7:
+ Type: GENERIC
+ Use 7.0:
+ At stmt: _62 = m_25(D) * _63;
+ At pos:
+ IV struct:
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 8:
+ Type: GENERIC
+ Use 8.0:
+ At stmt: _60 = (sizetype) i_50;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.41
+ Var after: ivtmp.41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.42
+ Var after: ivtmp.42
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.43
+ Var after: ivtmp.43
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.44
+ Var after: ivtmp.44
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.45
+ Var after: ivtmp.45
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) n_23(D)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.46
+ Var after: ivtmp.46
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.47
+ Var after: ivtmp.47
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.48
+ Var after: ivtmp.48
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3
+ Group 1: 0, 1, 2, 3
+ Group 2: 0, 1, 2, 3, 4, 5
+ Group 3: 0, 1, 2, 3
+ Group 4: 0, 1, 2, 3, 6
+ Group 5: 0, 1, 2, 3
+ Group 6: 0, 1, 2, 3, 7
+ Group 7: 0, 1, 2, 3, 8
+ Group 8: 0, 1, 2, 3
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 6
+ 7 5
+ 8 5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned int) m_25(D) + 1
+inv_expr 2: (signed int) n_23(D) + 1
+inv_expr 3: (signed int) n_23(D) + -1
+inv_expr 4: (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 NIL; NIL;
+ 5 4 0 2; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; 2
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 0 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 3 0 NIL; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 80 0 3; NIL;
+ 1 80 0 3; NIL;
+ 2 80 0 NIL; NIL;
+ 3 80 0 NIL; NIL;
+ 4 0 0 NIL; NIL;
+ 5 80 0 NIL; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 NIL; NIL;
+ 5 4 0 2; NIL;
+
+Group 4:
+ cand cost compl. inv.expr. inv.vars
+ 1 17 0 4; NIL;
+ 6 0 0 NIL; NIL;
+
+Group 5:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; NIL;
+ 2 4 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 4 4 0 3; NIL;
+ 5 4 0 NIL; NIL;
+
+Group 6:
+ cand cost compl. inv.expr. inv.vars
+ 7 0 0 NIL; NIL;
+
+Group 7:
+ cand cost compl. inv.expr. inv.vars
+ 8 0 0 NIL; NIL;
+
+Group 8:
+ cand cost compl. inv.expr. inv.vars
+ 1 0 0 NIL; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 126 (complexity 0)
+ reg_cost: 10
+ cand_cost: 19
+ cand_group_cost: 97 (complexity 0)
+ candidates: 1, 3, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:3, cost=(80,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 53 (complexity 0)
+ reg_cost: 12
+ cand_cost: 24
+ cand_group_cost: 17 (complexity 0)
+ candidates: 1, 3, 4, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 6, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 55 (complexity 0)
+ reg_cost: 10
+ cand_cost: 20
+ cand_group_cost: 25 (complexity 0)
+ candidates: 1, 4, 7, 8
+ group:0 --> iv_cand:4, cost=(4,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:4, cost=(4,0)
+ group:4 --> iv_cand:1, cost=(17,0)
+ group:5 --> iv_cand:1, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 45 (complexity 0)
+ reg_cost: 11
+ cand_cost: 26
+ cand_group_cost: 8 (complexity 0)
+ candidates: 1, 4, 6, 7, 8
+ group:0 --> iv_cand:4, cost=(4,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:4, cost=(4,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 6, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(0,0)
+ group:4 --> iv_cand:6, cost=(0,0)
+ group:5 --> iv_cand:3, cost=(0,0)
+ group:6 --> iv_cand:7, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+ Var befor: ivtmp.42_18
+ Var after: ivtmp.42_17
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: i_50
+ Var after: j_24
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.44_16
+ Var after: ivtmp.44_15
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.46_92
+ Var after: ivtmp.46_93
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.47_98
+ Var after: ivtmp.47_99
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.48_102
+ Var after: ivtmp.48_103
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/before.s b/gcc/testsuite/before.s
new file mode 100644
index 00000000000..e13834bdf59
--- /dev/null
+++ b/gcc/testsuite/before.s
@@ -0,0 +1,152 @@
+ .file 1 "fp_foo.c"
+ .section .mdebug.abi64
+ .previous
+ .nan 2008
+ .module fp=64
+ .module oddspreg
+ .module arch=mips64r6
+ .abicalls
+ .text
+ .align 2
+ .align 3
+ .globl daxpy
+ .set nomips16
+ .set nomicromips
+ .ent daxpy
+ .type daxpy, @function
+daxpy:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ blezc $6,.L7
+ dlsa $6,$6,$4,2
+ .align 3
+.L3:
+ lwc1 $f1,0($5)
+ daddiu $4,$4,4
+ lwc1 $f0,-4($4)
+ daddiu $5,$5,4
+ maddf.s $f0,$f1,$f15
+ bne $4,$6,.L3
+ swc1 $f0,-4($4)
+
+.L7:
+ jrc $31
+ .set macro
+ .set reorder
+ .end daxpy
+ .size daxpy, .-daxpy
+ .align 2
+ .align 3
+ .globl dgefa
+ .set nomips16
+ .set nomicromips
+ .ent dgefa
+ .type dgefa, @function
+dgefa:
+ .frame $sp,48,$31 # vars= 0, regs= 6/0, args= 0, gp= 0
+ .mask 0x101f0000,-8
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ li $2,1 # 0x1
+ bgec $2,$6,.L23
+ daddiu $sp,$sp,-48
+ addiu $14,$6,-1
+ move $11,$6
+ sd $20,32($sp)
+ sd $19,24($sp)
+ addiu $20,$5,1
+ sd $18,16($sp)
+ move $18,$4
+ sd $17,8($sp)
+ dlsa $10,$7,$4,2
+ sd $16,0($sp)
+ move $17,$5
+ dsll $12,$5,2
+ move $25,$5
+ move $13,$0
+ move $24,$0
+ move $15,$0
+ move $19,$14
+ .align 3
+.L11:
+ addiu $8,$15,1
+ addiu $16,$15,1
+ move $15,$8
+ bgec $8,$11,.L15
+ daddu $5,$25,$24
+ daddiu $9,$13,1
+ dsubu $6,$0,$13
+ dsll $5,$5,2
+ dlsa $9,$9,$18,2
+ dsll $6,$6,2
+ move $7,$14
+ .align 3
+.L14:
+ daddu $3,$10,$5
+ move $2,$9
+ lwc1 $f2,0($3)
+ move $4,$0
+ .align 3
+.L13:
+ daddu $3,$6,$2
+ lwc1 $f0,0($2)
+ daddu $3,$3,$5
+ daddiu $2,$2,4
+ lwc1 $f1,0($3)
+ addiu $4,$4,1
+ maddf.s $f0,$f2,$f1
+ swc1 $f0,-4($2)
+ bltc $4,$7,.L13
+ addiu $8,$8,1
+ bne $11,$8,.L14
+ daddu $5,$5,$12
+
+.L15:
+ daddiu $24,$24,1
+ addu $13,$20,$13
+ addiu $14,$14,-1
+ daddiu $10,$10,-4
+ bne $19,$16,.L11
+ addu $25,$17,$25
+
+ ld $20,32($sp)
+ ld $19,24($sp)
+ ld $18,16($sp)
+ ld $17,8($sp)
+ ld $16,0($sp)
+ jr $31
+ daddiu $sp,$sp,48
+
+.L23:
+ jrc $31
+ .set macro
+ .set reorder
+ .end dgefa
+ .size dgefa, .-dgefa
+ .section .text.startup,"ax",@progbits
+ .align 2
+ .align 3
+ .globl main
+ .set nomips16
+ .set nomicromips
+ .ent main
+ .type main, @function
+main:
+ .frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
+ .mask 0x00000000,0
+ .fmask 0x00000000,0
+ .set noreorder
+ .set nomacro
+ jr $31
+ move $2,$0
+
+ .set macro
+ .set reorder
+ .end main
+ .size main, .-main
+ .ident "GCC: (GNU) 14.0.1 20240214 (experimental)"
+ .section .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/before.txt b/gcc/testsuite/before.txt
new file mode 100644
index 00000000000..c87764b8ae9
--- /dev/null
+++ b/gcc/testsuite/before.txt
@@ -0,0 +1,2694 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;; header 3, latch 6
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_12(D) + 4294967295
+;; upper_bound 2147483646
+;; likely_upper_bound 2147483646
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;; nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+ single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+ {
+ <bb 2> [local count: 118111600]:
+ if (n_12(D) > 0)
+ goto <bb 5>; [89.00%]
+ else
+ goto <bb 4>; [11.00%]
+
+ }
+ bb_5 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 5> [local count: 105119324]:
+
+ }
+ bb_7 (preds = {bb_3 }, succs = {bb_4 })
+ {
+ <bb 7> [local count: 105119324]:
+ # .MEM_22 = PHI <.MEM_16(3)>
+
+ }
+ bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+ {
+ <bb 4> [local count: 118111600]:
+ # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+ # VUSE <.MEM_29>
+ return;
+
+ }
+ loop_1 (header = 3, latch = 6, finite_p
+ niter (unsigned int) n_12(D) + 4294967295
+ upper_bound 2147483646
+ likely_upper_bound 2147483646
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+ {
+ bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+ {
+ <bb 3> [local count: 955630224]:
+ # i_20 = PHI <i_17(6), 0(5)>
+ # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+ _1 = (long unsigned int) i_20;
+ _2 = _1 * 4;
+ _3 = vector1_13(D) + _2;
+ # VUSE <.MEM_21>
+ _4 = *_3;
+ _5 = vector2_14(D) + _2;
+ # VUSE <.MEM_21>
+ _6 = *_5;
+ _7 = _6 * fp_const_15(D);
+ _8 = _4 + _7;
+ # .MEM_16 = VDEF <.MEM_21>
+ *_3 = _8;
+ i_17 = i_20 + 1;
+ if (n_12(D) > i_17)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 7>; [11.00%]
+
+ }
+ bb_6 (preds = {bb_3 }, succs = {bb_3 })
+ {
+ <bb 6> [local count: 850510900]:
+ goto <bb 3>; [100.00%]
+
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_12(D)
+ bounds on difference of bases: 0 ... 2147483646
+ result:
+ # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_17
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_20
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _4 = *_3;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_3 = _8;
+ At pos: *_3
+ IV struct:
+ Type: float *
+ Base: vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _6 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (n_12(D) > i_17)
+ At pos: i_17
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.6
+ Var after: ivtmp.6
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.7
+ Var after: ivtmp.7
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.8
+ Var after: ivtmp.8
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.9
+ Var after: ivtmp.9
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10
+ Var after: ivtmp.10
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.11
+ Var after: ivtmp.11
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.12
+ Var after: ivtmp.12
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 7
+ Group 1: 0, 1, 2, 3, 5, 7
+ Group 2: 0, 1, 2, 3, 6
+
+<Candidate Costs>:
+ cand cost
+force_expr_to_var_cost size costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+force_expr_to_var_cost speed costs:
+ integer 0
+ symbol 5
+ address 5
+ other 24
+
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D) (eliminable)
+Inv 2: vector2_14(D) (eliminable)
+Inv 3: fp_const_15(D) (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned long) vector1_13(D) + 18446744073709551612
+inv_expr 2: (unsigned long) vector2_14(D) + 18446744073709551612
+inv_expr 3: (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 4: (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 18 0 NIL; 1
+ 2 20 0 1; NIL;
+ 4 2 0 NIL; NIL;
+ 7 10 0 NIL; 1
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 9 0 NIL; 2
+ 2 10 0 2; NIL;
+ 5 1 0 NIL; NIL;
+ 7 5 0 NIL; 2
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 4
+ 1 0 0 NIL; 4
+ 2 1 0 NIL; 4
+ 3 0 0 NIL; 4
+ 4 1 0 3; NIL;
+ 5 1 0 4; NIL;
+ 6 0 0 NIL; 4
+ 7 1 0 NIL; 4
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 37 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 27 (complexity 0)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(18,0)
+ group:1 --> iv_cand:1, cost=(9,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 0)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 0)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 3
+
+Initial set of candidates:
+ cost: 26 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 16 (complexity 0)
+ candidates: 7
+ group:0 --> iv_cand:7, cost=(10,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 1, 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 24 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 8 (complexity 0)
+ candidates: 4, 7
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:7, cost=(5,0)
+ group:2 --> iv_cand:7, cost=(1,0)
+ invariant variables: 2, 4
+ invariant expressions:
+
+Improved to:
+ cost: 19 (complexity 0)
+ reg_cost: 5
+ cand_cost: 10
+ cand_group_cost: 4 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:5, cost=(1,0)
+ group:2 --> iv_cand:4, cost=(1,0)
+ invariant variables:
+ invariant expressions: 3
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: ivtmp.9_28
+ Var after: ivtmp.9_27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector1_13(D)
+ Step: 4
+ Object: (void *) vector1_13(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.10_25
+ Var after: ivtmp.10_24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) vector2_14(D)
+ Step: 4
+ Object: (void *) vector2_14(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;; header 8, latch 13
+;; depth 3, outer 2, finite_p
+;; niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;; nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+ single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ # VUSE <.MEM_57>
+ _35 = *_34;
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ # VUSE <.MEM_57>
+ _37 = *_36;
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ # .MEM_42 = VDEF <.MEM_57>
+ *_34 = _39;
+ i_40 = i_56 + 1;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 3
+ exit condition [1, + , 1](no_overflow) < _87
+ bounds on difference of bases: -2147483649 ... 2147483646
+ result:
+ zero if _87 <= 0
+ # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+ number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _21
+ Type: sizetype
+ Base: ((sizetype) _7 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _29
+ Type: sizetype
+ Base: ((sizetype) _11 + 1) * 4
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _32
+ Type: long unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _33
+ Type: long unsigned int
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _34
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _36
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_40
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: i_56
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: _35 = *_34;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+ Use 0.1:
+ At stmt: *_34 = _39;
+ At pos: *_34
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _7 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: REFERENCE ADDRESS
+ Use 1.0:
+ At stmt: _37 = *_36;
+ At pos: *_36
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((sizetype) _11 + 1) * 4
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 2:
+ Type: COMPARE
+ Use 2.0:
+ At stmt: if (i_40 < _87)
+ At pos: i_40
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.20
+ Var after: ivtmp.20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.21
+ Var after: ivtmp.21
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.22
+ Var after: ivtmp.22
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23
+ Var after: ivtmp.23
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.24
+ Var after: ivtmp.24
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.25
+ Var after: ivtmp.25
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.26
+ Var after: ivtmp.26
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.27
+ Var after: ivtmp.27
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 9:
+ Var befor: ivtmp.28
+ Var after: ivtmp.28
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 4
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 9
+ Group 1: 0, 1, 2, 3, 6, 7, 9
+ Group 2: 0, 1, 2, 3, 8
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 6
+ 5 6
+ 6 6
+ 7 6
+ 8 5
+ 9 5
+
+
+<Invariant Vars>:
+Inv 6: _7 (eliminable)
+Inv 1: _10 (eliminable)
+Inv 7: _11 (eliminable)
+Inv 3: _14 (eliminable)
+Inv 2: vector_27(D) (eliminable)
+Inv 4: t_28 (eliminable)
+Inv 5: _87 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: ((unsigned long) _7 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 2: (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 3: ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 4: ((unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4) + 4
+inv_expr 5: ((unsigned long) _11 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 6: (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 7: ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 8: ((unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4) + 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 1 22 0 1; NIL;
+ 2 22 0 2; NIL;
+ 4 2 0 NIL; NIL;
+ 5 2 2 NIL; NIL;
+ 6 16 0 3; NIL;
+ 7 18 0 4; NIL;
+ 9 14 0 1; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 1 11 0 5; NIL;
+ 2 11 0 6; NIL;
+ 4 8 0 7; NIL;
+ 5 9 0 8; NIL;
+ 6 1 0 NIL; NIL;
+ 7 1 1 NIL; NIL;
+ 9 7 0 5; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; 5
+ 1 0 0 NIL; 5
+ 2 4 0 NIL; 5
+ 3 0 0 NIL; 5
+ 8 4 0 NIL; 5
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 0
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 38
+ 20 40
+ 21 42
+ 22 44
+ 23 115
+ 24 120
+ 25 125
+ 26 130
+ 27 179
+ 28 228
+ 29 277
+ 30 326
+ 31 375
+ 32 424
+ 33 473
+ 34 522
+ 35 571
+ 36 620
+ 37 669
+ 38 718
+ 39 767
+ 40 816
+ 41 865
+ 42 914
+ 43 963
+ 44 1012
+ 45 1061
+ 46 1110
+ 47 1159
+ 48 1208
+ 49 1257
+ 50 1306
+ 51 1355
+ 52 1404
+
+Initial set of candidates:
+ cost: 43 (complexity 0)
+ reg_cost: 5
+ cand_cost: 5
+ cand_group_cost: 33 (complexity 0)
+ candidates: 1
+ group:0 --> iv_cand:1, cost=(22,0)
+ group:1 --> iv_cand:1, cost=(11,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 5
+
+Improved to:
+ cost: 27 (complexity 0)
+ reg_cost: 6
+ cand_cost: 11
+ cand_group_cost: 10 (complexity 0)
+ candidates: 1, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:1, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 0)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Initial set of candidates:
+ cost: 37 (complexity 0)
+ reg_cost: 7
+ cand_cost: 9
+ cand_group_cost: 21 (complexity 0)
+ candidates: 3, 9
+ group:0 --> iv_cand:9, cost=(14,0)
+ group:1 --> iv_cand:9, cost=(7,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 1, 5
+
+Improved to:
+ cost: 26 (complexity 0)
+ reg_cost: 6
+ cand_cost: 10
+ cand_group_cost: 10 (complexity 0)
+ candidates: 3, 4
+ group:0 --> iv_cand:4, cost=(2,0)
+ group:1 --> iv_cand:4, cost=(8,0)
+ group:2 --> iv_cand:3, cost=(0,0)
+ invariant variables: 5
+ invariant expressions: 7
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 3:
+ Var befor: i_56
+ Var after: i_40
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Var befor: ivtmp.23_85
+ Var after: ivtmp.23_84
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+ Step: 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+ allowed multipliers:
+
+;;
+;; Loop 2
+;; header 7, latch 12
+;; depth 2, outer 1, finite_p
+;; niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;; nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+ single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ # VUSE <.MEM_52>
+ t_28 = *_5;
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = _13 * 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ if (n_23(D) > j_30)
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _78 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_78];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _76 = (sizetype) _7;
+ _75 = _76 * 18446744073709551612;
+ _74 = _75 + ivtmp.23_85;
+ _73 = (void *) _74;
+ _72 = (sizetype) _11;
+ _71 = _72 * 4;
+ _70 = _73 + _71;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_70];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _77 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_77] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 2
+ exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+ number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _1
+ Type: int
+ Base: (i_50 + 1) * m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _2
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + l_26(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _3
+ Type: long unsigned int
+ Base: (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+ Step: (long unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _4
+ Type: long unsigned int
+ Base: ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _5
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _11
+ Type: int
+ Base: (i_50 + 1) * m_25(D) + i_50
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _12
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _13
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _14
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_30
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: j_51
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _71
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _72
+ Type: sizetype
+ Base: (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+ Step: (sizetype) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: REFERENCE ADDRESS
+ Use 0.0:
+ At stmt: t_28 = *_5;
+ At pos: *_5
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+ Step: (long unsigned int) m_25(D) * 4
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (n_23(D) > j_30)
+ At pos: j_30
+ IV struct:
+ Type: int
+ Base: i_50 + 2
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: _14 = _13 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: GENERIC
+ Use 3.0:
+ At stmt: _71 = _72 * 4;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.29
+ Var after: ivtmp.29
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.30
+ Var after: ivtmp.30
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.31
+ Var after: ivtmp.31
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: ivtmp.32
+ Var after: ivtmp.32
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (sizetype) (i_50 + 1)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.33
+ Var after: ivtmp.33
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+ Step: (unsigned long) ((long unsigned int) m_25(D) * 4)
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.34
+ Var after: ivtmp.34
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (i_50 + 2)
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 7:
+ Var befor: ivtmp.35
+ Var after: ivtmp.35
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) i_50
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 8:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.36
+ Var after: ivtmp.36
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37
+ Var after: ivtmp.37
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 10:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.38
+ Var after: ivtmp.38
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: (long unsigned int) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3, 4, 5, 10
+ Group 1: 0, 1, 2, 3, 4, 6, 7
+ Group 2: 0, 1, 2, 3, 4, 8, 9, 10
+ Group 3: 0, 1, 2, 3, 4, 9, 10
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 6
+ 3 6
+ 4 4
+ 5 9
+ 6 5
+ 7 5
+ 8 10
+ 9 9
+ 10 5
+
+Scaling cost based on bb prob by 8.00: 6 (scratch: 2) -> 34
+Scaling cost based on bb prob by 8.00: 4 (scratch: 0) -> 32
+Scaling cost based on bb prob by 8.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 8.00: 8 (scratch: 4) -> 36
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24 (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50 (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1: (long unsigned int) m_25(D) * 4
+inv_expr 2: (((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4) + 18446744073709551612
+inv_expr 3: ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 4: ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 5: ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 6: (signed int) i_50 + 1
+inv_expr 7: (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 8: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 9: ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 10: (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 11: (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+inv_expr 12: ((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4
+inv_expr 13: ((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) * 4
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 5 1 0 NIL; NIL;
+ 8 9 0 2; NIL;
+ 9 8 0 3; NIL;
+ 10 10 0 4; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 5; NIL;
+ 1 2 0 7; NIL;
+ 2 3 0 8; NIL;
+ 3 0 0 NIL; 7
+ 4 0 0 NIL; 7
+ 6 0 0 NIL; 7
+ 7 0 0 NIL; 7
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 5 7 0 10; NIL;
+ 8 0 0 NIL; NIL;
+ 9 4 0 NIL; NIL;
+ 10 9 0 11; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 5 34 0 12; NIL;
+ 8 32 0 NIL; NIL;
+ 9 0 0 NIL; NIL;
+ 10 36 0 13; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 63 (complexity 0)
+ reg_cost: 8
+ cand_cost: 13
+ cand_group_cost: 42 (complexity 0)
+ candidates: 4, 5
+ group:0 --> iv_cand:5, cost=(1,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:5, cost=(7,0)
+ group:3 --> iv_cand:5, cost=(34,0)
+ invariant variables: 7
+ invariant expressions: 1, 10, 12
+
+Improved to:
+ cost: 32 (complexity 0)
+ reg_cost: 7
+ cand_cost: 13
+ cand_group_cost: 12 (complexity 0)
+ candidates: 4, 9
+ group:0 --> iv_cand:9, cost=(8,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:9, cost=(4,0)
+ group:3 --> iv_cand:9, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 3
+
+Initial set of candidates:
+ cost: 32 (complexity 0)
+ reg_cost: 7
+ cand_cost: 13
+ cand_group_cost: 12 (complexity 0)
+ candidates: 4, 9
+ group:0 --> iv_cand:9, cost=(8,0)
+ group:1 --> iv_cand:4, cost=(0,0)
+ group:2 --> iv_cand:9, cost=(4,0)
+ group:3 --> iv_cand:9, cost=(0,0)
+ invariant variables: 7
+ invariant expressions: 1, 3
+
+Original cost 32 (complexity 0)
+
+Final cost 32 (complexity 0)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+ Var befor: j_51
+ Var after: j_30
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: i_50 + 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 9:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.37_69
+ Var after: ivtmp.37_68
+ Incr POS: before exit test
+ IV struct:
+ Type: sizetype
+ Base: ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+ Step: (sizetype) m_25(D) * 4
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;; header 4, latch 11
+;; depth 1, outer 0, finite_p
+;; niter (unsigned int) n_23(D) + 4294967294
+;; upper_bound 2147483645
+;; likely_upper_bound 2147483645
+;; iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;; nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+ single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+ bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+ {
+ <bb 2> [local count: 1804255]:
+ _45 = n_23(D) + -1;
+ if (n_23(D) > 1)
+ goto <bb 10>; [89.00%]
+ else
+ goto <bb 14>; [11.00%]
+
+ }
+ bb_14 (preds = {bb_2 }, succs = {bb_3 })
+ {
+ <bb 14> [local count: 198468]:
+
+ }
+ bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+ {
+ <bb 3> [local count: 1804255]:
+ # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+ # VUSE <.MEM_88>
+ return;
+
+ }
+ bb_10 (preds = {bb_2 }, succs = {bb_4 })
+ {
+ <bb 10> [local count: 1605787]:
+
+ }
+ bb_16 (preds = {bb_5 }, succs = {bb_3 })
+ {
+ <bb 16> [local count: 1605787]:
+ # .MEM_53 = PHI <.MEM_89(5)>
+ goto <bb 3>; [100.00%]
+
+ }
+ loop_1 (header = 4, latch = 11, finite_p
+ niter (unsigned int) n_23(D) + 4294967294
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+ {
+ bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+ {
+ <bb 4> [local count: 14598063]:
+ # i_50 = PHI <j_24(11), 0(10)>
+ # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+ j_24 = i_50 + 1;
+ if (n_23(D) > j_24)
+ goto <bb 6>; [89.00%]
+ else
+ goto <bb 15>; [11.00%]
+
+ }
+ bb_15 (preds = {bb_4 }, succs = {bb_5 })
+ {
+ <bb 15> [local count: 1605787]:
+
+ }
+ bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+ {
+ <bb 5> [local count: 14598063]:
+ # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+ if (j_24 < _45)
+ goto <bb 11>; [89.00%]
+ else
+ goto <bb 16>; [11.00%]
+
+ }
+ bb_11 (preds = {bb_5 }, succs = {bb_4 })
+ {
+ <bb 11> [local count: 12992276]:
+ goto <bb 4>; [100.00%]
+
+ }
+ bb_6 (preds = {bb_4 }, succs = {bb_7 })
+ {
+ <bb 6> [local count: 12992276]:
+ _6 = m_25(D) * i_50;
+ _7 = _6 + i_50;
+ _8 = (sizetype) _7;
+ _9 = _8 + 1;
+ _10 = _9 * 4;
+ _87 = n_23(D) - j_24;
+ _67 = (sizetype) m_25(D);
+ _66 = _67 * 4;
+ _64 = i_50 + 1;
+ _63 = m_25(D) * _64;
+ _62 = (sizetype) _63;
+ _61 = (sizetype) i_50;
+ _60 = _61 + _62;
+ ivtmp.37_65 = _60 * 4;
+
+ }
+ bb_17 (preds = {bb_9 }, succs = {bb_5 })
+ {
+ <bb 17> [local count: 12992276]:
+ # .MEM_86 = PHI <.MEM_55(9)>
+ goto <bb 5>; [100.00%]
+
+ }
+ loop_2 (header = 7, latch = 12, finite_p
+ niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+ {
+ bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+ {
+ <bb 7> [local count: 118111600]:
+ # j_51 = PHI <j_30(12), j_24(6)>
+ # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+ # ivtmp.37_69 = PHI <ivtmp.37_68(12), ivtmp.37_65(6)>
+ _1 = m_25(D) * j_51;
+ _2 = _1 + l_26(D);
+ _3 = (long unsigned int) _2;
+ _4 = _3 * 4;
+ _5 = vector_27(D) + _4;
+ _59 = (sizetype) i_50;
+ _58 = _59 * 18446744073709551612;
+ _49 = (sizetype) l_26(D);
+ _48 = _49 * 4;
+ _47 = _48 + _58;
+ _46 = vector_27(D) + _47;
+ _44 = _46 + ivtmp.37_69;
+ # VUSE <.MEM_52>
+ t_28 = MEM[(float *)_44];
+ _11 = _1 + i_50;
+ _12 = (sizetype) _11;
+ _13 = _12 + 1;
+ _14 = ivtmp.37_69 + 4;
+ _82 = (sizetype) _7;
+ _81 = _82 + 1;
+ _80 = _81 * 4;
+ _79 = vector_27(D) + _80;
+ ivtmp.23_83 = (unsigned long) _79;
+
+ }
+ bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+ {
+ <bb 9> [local count: 118111600]:
+ # .MEM_55 = PHI <.MEM_42(8)>
+ j_30 = j_51 + 1;
+ ivtmp.37_68 = ivtmp.37_69 + _66;
+ if (j_30 != n_23(D))
+ goto <bb 12>; [89.00%]
+ else
+ goto <bb 17>; [11.00%]
+
+ }
+ bb_12 (preds = {bb_9 }, succs = {bb_7 })
+ {
+ <bb 12> [local count: 105119324]:
+ goto <bb 7>; [100.00%]
+
+ }
+ loop_3 (header = 8, latch = 13, finite_p
+ niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+ upper_bound 2147483645
+ likely_upper_bound 2147483645
+ iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+ {
+ bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+ {
+ <bb 8> [local count: 955630225]:
+ # i_56 = PHI <i_40(13), 0(7)>
+ # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+ # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+ _32 = (long unsigned int) i_56;
+ _33 = _32 * 4;
+ _21 = _10 + _33;
+ _34 = vector_27(D) + _21;
+ _78 = (void *) ivtmp.23_85;
+ # VUSE <.MEM_57>
+ _35 = MEM[(float *)_78];
+ _29 = _14 + _33;
+ _36 = vector_27(D) + _29;
+ _76 = (sizetype) _7;
+ _75 = _76 * 18446744073709551612;
+ _74 = _75 + ivtmp.23_85;
+ _73 = (void *) _74;
+ _72 = (sizetype) _11;
+ _71 = ivtmp.37_69;
+ _70 = _73 + _71;
+ # VUSE <.MEM_57>
+ _37 = MEM[(float *)_70];
+ _38 = t_28 * _37;
+ _39 = _35 + _38;
+ _77 = (void *) ivtmp.23_85;
+ # .MEM_42 = VDEF <.MEM_57>
+ MEM[(float *)_77] = _39;
+ i_40 = i_56 + 1;
+ ivtmp.23_84 = ivtmp.23_85 + 4;
+ if (i_40 < _87)
+ goto <bb 13>; [89.00%]
+ else
+ goto <bb 9>; [11.00%]
+
+ }
+ bb_13 (preds = {bb_8 }, succs = {bb_8 })
+ {
+ <bb 13> [local count: 850510901]:
+ goto <bb 8>; [100.00%]
+
+ }
+ }
+ }
+ }
+}
+Analyzing # of iterations of loop 1
+ exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+ bounds on difference of bases: 0 ... 2147483645
+ result:
+ # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+ number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+ SSA_NAME: _6
+ Type: int
+ Base: 0
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _7
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: j_24
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _46
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _47
+ Type: sizetype
+ Base: (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: i_50
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _58
+ Type: sizetype
+ Base: 0
+ Step: 18446744073709551612
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _59
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _61
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _63
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+IV struct:
+ SSA_NAME: _64
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+IV struct:
+ SSA_NAME: _87
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<IV Groups>:
+Group 0:
+ Type: COMPARE
+ Use 0.0:
+ At stmt: if (n_23(D) > j_24)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 1:
+ Type: COMPARE
+ Use 1.0:
+ At stmt: if (j_24 < _45)
+ At pos: j_24
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 2:
+ Type: GENERIC
+ Use 2.0:
+ At stmt: _7 = _6 + i_50;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: (int) ((unsigned int) m_25(D) + 1)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 3:
+ Type: COMPARE
+ Use 3.0:
+ At stmt: if (i_40 < _87)
+ At pos: _87
+ IV struct:
+ Type: int
+ Base: n_23(D) + -1
+ Step: -1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 4:
+ Type: GENERIC
+ Use 4.0:
+ At stmt: j_24 = i_50 + 1;
+ At pos:
+ IV struct:
+ Type: int
+ Base: 1
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 5:
+ Type: GENERIC
+ Use 5.0:
+ At stmt: _46 = vector_27(D) + _47;
+ At pos:
+ IV struct:
+ Type: float *
+ Base: vector_27(D) + (sizetype) l_26(D) * 4
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Group 6:
+ Type: GENERIC
+ Use 6.0:
+ At stmt: i_50 = PHI <j_24(11), 0(10)>
+ At pos:
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: Y
+ Overflowness wrto loop niter: No-overflow
+Group 7:
+ Type: GENERIC
+ Use 7.0:
+ At stmt: _63 = m_25(D) * _64;
+ At pos:
+ IV struct:
+ Type: int
+ Base: m_25(D)
+ Step: m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Group 8:
+ Type: GENERIC
+ Use 8.0:
+ At stmt: _61 = (sizetype) i_50;
+ At pos:
+ IV struct:
+ Type: sizetype
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+ Var befor: ivtmp.39
+ Var after: ivtmp.39
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 1:
+ Var befor: ivtmp.40
+ Var after: ivtmp.40
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 2:
+ Var befor: ivtmp.41
+ Var after: ivtmp.41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 1
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.42
+ Var after: ivtmp.42
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.43
+ Var after: ivtmp.43
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 6:
+ Var befor: ivtmp.44
+ Var after: ivtmp.44
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) n_23(D)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.45
+ Var after: ivtmp.45
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.46
+ Var after: ivtmp.46
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+<Important Candidates>: 0, 1, 2, 3,
+
+<Group, Cand> Related:
+ Group 0: 0, 1, 2, 3
+ Group 1: 0, 1, 2, 3
+ Group 2: 0, 1, 2, 3, 4
+ Group 3: 0, 1, 2, 3, 5, 6
+ Group 4: 0, 1, 2, 3
+ Group 5: 0, 1, 2, 3, 7
+ Group 6: 0, 1, 2, 3
+ Group 7: 0, 1, 2, 3, 8
+ Group 8: 0, 1, 2, 3
+
+<Candidate Costs>:
+ cand cost
+ 0 5
+ 1 5
+ 2 5
+ 3 4
+ 4 5
+ 5 5
+ 6 5
+ 7 6
+ 8 5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45 (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: (unsigned int) m_25(D) + 1
+inv_expr 2: (signed int) n_23(D) + 1
+inv_expr 3: (signed int) n_23(D) + -1
+inv_expr 4: (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 NIL; NIL;
+ 6 4 0 2; NIL;
+
+Group 1:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; 2
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 0 0 NIL; NIL;
+ 7 3 0 NIL; NIL;
+
+Group 2:
+ cand cost compl. inv.expr. inv.vars
+ 4 0 0 NIL; NIL;
+
+Group 3:
+ cand cost compl. inv.expr. inv.vars
+ 0 80 0 3; NIL;
+ 1 80 0 3; NIL;
+ 2 80 0 NIL; NIL;
+ 3 80 0 NIL; NIL;
+ 5 0 0 NIL; NIL;
+ 6 80 0 NIL; NIL;
+
+Group 4:
+ cand cost compl. inv.expr. inv.vars
+ 0 4 0 NIL; NIL;
+ 1 4 0 NIL; NIL;
+ 2 0 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 NIL; NIL;
+ 6 4 0 2; NIL;
+
+Group 5:
+ cand cost compl. inv.expr. inv.vars
+ 1 17 0 4; NIL;
+ 7 0 0 NIL; NIL;
+
+Group 6:
+ cand cost compl. inv.expr. inv.vars
+ 0 0 0 NIL; NIL;
+ 1 0 0 NIL; NIL;
+ 2 4 0 NIL; NIL;
+ 3 0 0 NIL; NIL;
+ 5 4 0 3; NIL;
+ 6 4 0 NIL; NIL;
+
+Group 7:
+ cand cost compl. inv.expr. inv.vars
+ 8 0 0 NIL; NIL;
+
+Group 8:
+ cand cost compl. inv.expr. inv.vars
+ 1 0 0 NIL; NIL;
+
+
+<Global Costs>:
+ target_avail_regs 26
+ target_clobbered_regs 16
+ target_reg_cost 4
+ target_spill_cost 24
+ regs_used 4
+ cost for size:
+ ivs cost
+ 0 0
+ 1 2
+ 2 4
+ 3 6
+ 4 8
+ 5 10
+ 6 12
+ 7 14
+ 8 16
+ 9 18
+ 10 20
+ 11 22
+ 12 24
+ 13 26
+ 14 28
+ 15 30
+ 16 32
+ 17 34
+ 18 36
+ 19 111
+ 20 116
+ 21 121
+ 22 126
+ 23 151
+ 24 176
+ 25 201
+ 26 226
+ 27 275
+ 28 324
+ 29 373
+ 30 422
+ 31 471
+ 32 520
+ 33 569
+ 34 618
+ 35 667
+ 36 716
+ 37 765
+ 38 814
+ 39 863
+ 40 912
+ 41 961
+ 42 1010
+ 43 1059
+ 44 1108
+ 45 1157
+ 46 1206
+ 47 1255
+ 48 1304
+ 49 1353
+ 50 1402
+ 51 1451
+ 52 1500
+
+Initial set of candidates:
+ cost: 126 (complexity 0)
+ reg_cost: 10
+ cand_cost: 19
+ cand_group_cost: 97 (complexity 0)
+ candidates: 1, 3, 4, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:3, cost=(80,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 53 (complexity 0)
+ reg_cost: 12
+ cand_cost: 24
+ cand_group_cost: 17 (complexity 0)
+ candidates: 1, 3, 4, 5, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 5, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Initial set of candidates:
+ cost: 55 (complexity 0)
+ reg_cost: 10
+ cand_cost: 20
+ cand_group_cost: 25 (complexity 0)
+ candidates: 1, 4, 5, 8
+ group:0 --> iv_cand:5, cost=(4,0)
+ group:1 --> iv_cand:5, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:5, cost=(4,0)
+ group:5 --> iv_cand:1, cost=(17,0)
+ group:6 --> iv_cand:1, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1, 4
+
+Improved to:
+ cost: 45 (complexity 0)
+ reg_cost: 11
+ cand_cost: 26
+ cand_group_cost: 8 (complexity 0)
+ candidates: 1, 4, 5, 7, 8
+ group:0 --> iv_cand:5, cost=(4,0)
+ group:1 --> iv_cand:5, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:5, cost=(4,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:1, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Improved to:
+ cost: 43 (complexity 0)
+ reg_cost: 13
+ cand_cost: 30
+ cand_group_cost: 0 (complexity 0)
+ candidates: 1, 3, 4, 5, 7, 8
+ group:0 --> iv_cand:3, cost=(0,0)
+ group:1 --> iv_cand:3, cost=(0,0)
+ group:2 --> iv_cand:4, cost=(0,0)
+ group:3 --> iv_cand:5, cost=(0,0)
+ group:4 --> iv_cand:3, cost=(0,0)
+ group:5 --> iv_cand:7, cost=(0,0)
+ group:6 --> iv_cand:3, cost=(0,0)
+ group:7 --> iv_cand:8, cost=(0,0)
+ group:8 --> iv_cand:1, cost=(0,0)
+ invariant variables:
+ invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+ Var befor: ivtmp.40_43
+ Var after: ivtmp.40_41
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 3:
+ Var befor: i_50
+ Var after: j_24
+ Incr POS: orig biv
+ IV struct:
+ Type: int
+ Base: 0
+ Step: 1
+ Biv: N
+ Overflowness wrto loop niter: No-overflow
+Candidate 4:
+ Depend on inv.exprs: 1
+ Var befor: ivtmp.42_31
+ Var after: ivtmp.42_20
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: 0
+ Step: (unsigned int) m_25(D) + 1
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 5:
+ Var befor: ivtmp.43_17
+ Var after: ivtmp.43_16
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) (n_23(D) + -1)
+ Step: 4294967295
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 7:
+ Var befor: ivtmp.45_91
+ Var after: ivtmp.45_92
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned long
+ Base: (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+ Step: 18446744073709551612
+ Object: (void *) vector_27(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+Candidate 8:
+ Var befor: ivtmp.46_97
+ Var after: ivtmp.46_98
+ Incr POS: before exit test
+ IV struct:
+ Type: unsigned int
+ Base: (unsigned int) m_25(D)
+ Step: (unsigned int) m_25(D)
+ Biv: N
+ Overflowness wrto loop niter: Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/fp_foo.c b/gcc/testsuite/fp_foo.c
new file mode 100644
index 00000000000..f65f43d6435
--- /dev/null
+++ b/gcc/testsuite/fp_foo.c
@@ -0,0 +1,19 @@
+
+void daxpy(float *vector1, float *vector2, int n, float fp_const){
+    for (int i = 0; i < n; ++i)
+        vector1[i] += fp_const * vector2[i];
+}
+
+void dgefa(float *vector, int m, int n, int l){
+    for (int i = 0; i < n - 1; ++i){
+        for (int j = i + 1; j < n; ++j){
+            float t = vector[m * j + l];
+            daxpy(&vector[m * i + i + 1],
+                  &vector[m * j + i + 1], n - (i + 1), t);
+        }
+    }
+}
+
+int main(){
+    return 0;
+}
diff --git a/gcc/testsuite/test_script.sh b/gcc/testsuite/test_script.sh
new file mode 100644
index 00000000000..4f19d248efe
--- /dev/null
+++ b/gcc/testsuite/test_script.sh
@@ -0,0 +1,10 @@
+export PREFIX="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install"
+export SOURCE_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/source"
+export BUILD_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/build"
+export SYSROOT="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install/sys_root"
+export PATH=$PREFIX/bin:$PATH
+export TARGET=mips64-r6-linux-gnu
+
+
+$PREFIX/bin/mips64-r6-linux-gnu-gcc fp_foo.c -O2 >out.txt -S -o fp_foo.s -march=mips64r6 -mabi=64
+
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7cae5bdefea..2dec5001dca 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4724,7 +4724,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
rtx addr;
bool simple_inv = true;
tree comp_inv = NULL_TREE, type = aff_var->type;
- comp_cost var_cost = no_cost, cost = no_cost;
+ comp_cost var_cost = no_cost, cost = no_cost, autoinc_cost = no_cost;
+ comp_cost acost = no_cost;
struct mem_address parts = {NULL_TREE, integer_one_node,
NULL_TREE, NULL_TREE, NULL_TREE};
machine_mode addr_mode = TYPE_MODE (type);
@@ -4755,38 +4756,36 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (!ok_with_ratio_p)
parts.step = NULL_TREE;
}
- if (ok_with_ratio_p || ok_without_ratio_p)
+ if (!(ok_with_ratio_p || ok_without_ratio_p))
+ parts.index = NULL_TREE;
+
+ if (maybe_ne (aff_inv->offset, 0))
{
- if (maybe_ne (aff_inv->offset, 0))
- {
- parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
- /* Addressing mode "base + index [<< scale] + offset". */
- if (!valid_mem_ref_p (mem_mode, as, &parts, code))
- parts.offset = NULL_TREE;
- else
- aff_inv->offset = 0;
- }
+ parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
+ /* Addressing mode "base + index[<< scale] + offset". */
+ if (!valid_mem_ref_p (mem_mode, as, &parts, code))
+ parts.offset = NULL_TREE;
+ else
+ aff_inv->offset = 0;
+ }
- move_fixed_address_to_symbol (&parts, aff_inv);
- /* Base is fixed address and is moved to symbol part. */
- if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
- parts.base = NULL_TREE;
+ move_fixed_address_to_symbol (&parts, aff_inv);
+ /* Base is fixed address and is moved to symbol part. */
+ if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
+ parts.base = NULL_TREE;
- /* Addressing mode "symbol + base + index [<< scale] [+ offset]". */
- if (parts.symbol != NULL_TREE
- && !valid_mem_ref_p (mem_mode, as, &parts, code))
- {
- aff_combination_add_elt (aff_inv, parts.symbol, 1);
- parts.symbol = NULL_TREE;
- /* Reset SIMPLE_INV since symbol address needs to be computed
- outside of address expression in this case. */
- simple_inv = false;
- /* Symbol part is moved back to base part, it can't be NULL. */
- parts.base = integer_one_node;
- }
+ /* Addressing mode "symbol + base + index[<< scale] [+ offset]". */
+ if (parts.symbol != NULL_TREE
+ && !valid_mem_ref_p (mem_mode, as, &parts, code))
+ {
+ aff_combination_add_elt (aff_inv, parts.symbol, 1);
+ parts.symbol = NULL_TREE;
+ /* Reset SIMPLE_INV since symbol address needs to be computed
+ outside of address expression in this case. */
+ simple_inv = false;
+ /* Symbol part is moved back to base part, it can't be NULL. */
+ parts.base = integer_one_node;
}
- else
- parts.index = NULL_TREE;
}
else
{
@@ -4799,14 +4798,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (stmt_after_increment (data->current_loop, cand, use->stmt))
ainc_offset += ainc_step;
- cost = get_address_cost_ainc (ainc_step, ainc_offset,
+ autoinc_cost = get_address_cost_ainc (ainc_step, ainc_offset,
addr_mode, mem_mode, as, speed);
- if (!cost.infinite_cost_p ())
- {
- *can_autoinc = true;
- return cost;
- }
- cost = no_cost;
+ if (!autoinc_cost.infinite_cost_p ())
+ *can_autoinc = true;
+ else
+ autoinc_cost = no_cost;
}
if (!aff_combination_zero_p (aff_inv))
{
@@ -4852,10 +4849,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
cost += var_cost;
addr = addr_for_mem_ref (&parts, as, false);
gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
- cost += address_cost (addr, mem_mode, as, speed);
+ acost += address_cost (addr, mem_mode, as, speed);
if (parts.symbol != NULL_TREE)
cost.complexity += 1;
+ /* var_present. */
+ else if (!aff_combination_const_p (aff_inv))
+ cost.complexity += 1;
/* Don't increase the complexity of adding a scaled index if it's
the only kind of index that the target allows. */
if (parts.step != NULL_TREE && ok_without_ratio_p)
@@ -4865,6 +4865,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
cost.complexity += 1;
+ cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
return cost;
}
--
2.34.1
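As a reading aid for the last two hunks of the tree-ssa-loop-ivopts.cc diff, the complexity accounting can be modeled outside GCC. The sketch below is a simplification, not the patched function: the struct `addr_model` and both function names are hypothetical, each flag stands in for the tree test named in its comment, and any conditions that fall between the quoted hunks are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the state the patched code inspects.
   None of these names exist in GCC.  */
struct addr_model {
    bool has_symbol;        /* parts.symbol != NULL_TREE */
    bool inv_is_const;      /* aff_combination_const_p (aff_inv) */
    bool has_step;          /* parts.step != NULL_TREE */
    bool ok_without_ratio;  /* ok_without_ratio_p */
    bool offset_nonzero;    /* parts.offset != NULL_TREE
                               && !integer_zerop (parts.offset) */
};

/* Counts only the complexity terms visible in the quoted hunks.  */
static int model_complexity(const struct addr_model *p)
{
    int complexity = 0;

    if (p->has_symbol)
        complexity += 1;
    else if (!p->inv_is_const)   /* the "var_present" branch the patch adds */
        complexity += 1;

    if (p->has_step && p->ok_without_ratio)
        complexity += 1;

    if (p->offset_nonzero)
        complexity += 1;

    return complexity;
}

/* Mirrors the final selection line of the patch: the auto-increment cost
   is used only when the caller passed a non-null flag and it was set;
   otherwise the plain address cost is added.  */
static int model_total_cost(int cost, const bool *can_autoinc,
                            int autoinc_cost, int acost)
{
    return cost + ((can_autoinc && *can_autoinc) ? autoinc_cost : acost);
}
```

Under this model, a use with a non-constant invariant part and a nonzero offset scores 2, while a use with a constant invariant part and no offset scores 0, which is the kind of separation between candidates that the cover letter asks for.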
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
@ 2024-03-18 11:28 Aleksandar Rakic
0 siblings, 0 replies; 24+ messages in thread
From: Aleksandar Rakic @ 2024-03-18 11:28 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 160 bytes --]
Here is a patch for GCC bug 109429 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429):
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429
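
For anyone reading the ivopts dump attached earlier in this thread: the `cost for size` table under `<Global Costs>` for loop 1 can be reproduced from the parameters printed just above it. The sketch below is a reconstruction that fits the dumped numbers, not a copy of GCC's code. The struct and function names are hypothetical, the reserve of 3 registers (`res_regs`) is an assumption that is not printed in the dump, and the table is assumed to have been generated with zero invariants. `target_clobbered_regs` is printed in the dump but is not needed to match this table, so it is not modeled.

```c
/* Hypothetical parameter bundle; comments give the matching dump label.  */
struct reg_params {
    unsigned avail_regs;  /* target_avail_regs */
    unsigned res_regs;    /* assumed reserve, not printed in the dump */
    unsigned reg_cost;    /* target_reg_cost */
    unsigned spill_cost;  /* target_spill_cost */
    unsigned regs_used;   /* regs_used */
};

/* Estimated register-pressure cost for n_invs invariants and n_cands
   induction variable candidates, in four regimes:
   plenty of registers, close to the limit, over the limit because of
   invariants, and over the limit because of the candidates themselves.  */
static unsigned size_cost(const struct reg_params *p,
                          unsigned n_invs, unsigned n_cands)
{
    unsigned n_new = n_invs + n_cands;
    unsigned needed = n_new + p->regs_used;
    unsigned cost;

    if (needed + p->res_regs < p->avail_regs)
        cost = n_new;
    else if (needed <= p->avail_regs)
        cost = p->reg_cost * needed;
    else if (n_cands <= p->avail_regs)
        cost = p->reg_cost * p->avail_regs
               + p->spill_cost * (needed - p->avail_regs);
    else
        cost = p->reg_cost * p->avail_regs
               + p->spill_cost * (n_cands - p->avail_regs) * 2
               + p->spill_cost * (needed - n_cands);

    /* Adding n_cands biases the search toward fewer induction variables.  */
    return cost + n_cands;
}
```

With `{26, 3, 4, 24, 4}` for the five fields and `n_invs = 0`, this matches the dumped rows at sizes 18, 19, 22, 23, 26, 27 and 52 (36, 111, 126, 151, 226, 275 and 1500). The `n_invs` argument is where a change such as the one proposed in patch 2/2 would be felt: invariants raise `needed` and can push a candidate set into a more expensive regime.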
end of thread, other threads:[~2024-04-15 13:30 UTC | newest]
Thread overview: 24+ messages
2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
2022-10-25 11:08 ` Richard Biener
2022-10-25 13:00 ` Dimitrije Milosevic
2022-10-27 23:02 ` Jeff Law
2022-10-28 6:43 ` Dimitrije Milosevic
2022-10-28 7:00 ` Richard Biener
2022-10-28 13:39 ` Dimitrije Milosevic
2022-11-01 18:46 ` Jeff Law
2022-11-02 8:40 ` Dimitrije Milosevic
2022-11-07 13:35 ` Richard Biener
2022-12-15 15:26 ` Dimitrije Milosevic
2022-12-16 9:58 ` Richard Biener
2022-12-16 11:37 ` Dimitrije Milosevic
2022-12-16 11:58 ` Richard Biener
2022-10-21 13:52 ` [PATCH 2/2] ivopts: Consider number of invariants when calculating register pressure Dimitrije Milosevic
2022-10-25 11:07 ` Richard Biener
2022-10-25 13:00 ` Dimitrije Milosevic
2022-10-28 7:38 ` Richard Biener
2022-10-28 13:39 ` Dimitrije Milosevic
2022-11-07 12:56 ` Richard Biener
2024-03-18 11:28 [PATCH 1/2] ivopts: Revert computation of address cost complexity Aleksandar Rakic
2024-03-18 20:27 Aleksandar Rakic
2024-04-15 13:30 ` Aleksandar Rakic