public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
@ 2024-03-18 11:28 Aleksandar Rakic
  0 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2024-03-18 11:28 UTC (permalink / raw)
  To: gcc-patches

A patch for GCC bug 109429
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429) is available at
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2024-03-18 20:27 Aleksandar Rakic
@ 2024-04-15 13:30 ` Aleksandar Rakic
  0 siblings, 0 replies; 17+ messages in thread
From: Aleksandar Rakic @ 2024-04-15 13:30 UTC (permalink / raw)
  To: gcc-patches
  Cc: richard.guenther, jeffreyalaw, Djordje Todorovic, Jovan Dmitrovic

PING: This is a reminder that the patch fixing the complexity computation for unsupported addressing modes has been sent and is awaiting review.

________________________________________
From: Aleksandar Rakic
Sent: Monday, March 18, 2024 9:27 PM
To: gcc-patches@gcc.gnu.org
Cc: Jovan Dmitrovic; richard.guenther@gmail.com; Djordje Todorovic; jeffreyalaw@gmail.com; Uros Beric
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.

From dbf49f2872efcc14d2ea41eb7d616498dca9789f Mon Sep 17 00:00:00 2001
From: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
Date: Tue, 5 Mar 2024 11:55:01 +0100
Subject: [PATCH] ivopts: Fixed bug 109429

This patch changes the order in which the complexity is calculated. With
the complexities corrected, the candidate selection is corrected as well,
which leads to smaller code size.

This patch also fixes the complexity when a variable is present in
the address expression, similarly to how the variable 'var_present' is
used in commit c2b64ce.

It also separates adding the autoinc_cost from adding the address
cost (acost) to the cost, similarly to commit c2b64ce.

It also contains a C test and a script that generates the assembly
file and the compiler dump output. The assembly code obtained after
modifying tree-ssa-loop-ivopts.cc is smaller than the assembly code
obtained before the modification. The compiler output shows the
difference in complexities for group 1 of loop 3 in the function
dgefa.
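
To illustrate why the complexity values matter for candidate selection,
here is a minimal sketch, assuming that a (cost, complexity) pair is
compared by cost first and that complexity only breaks ties between
equal costs. The names cost_pair, cost_pair_less and
cheaper_candidate_loop3_group1 are hypothetical and are not taken from
tree-ssa-loop-ivopts.cc. The numbers are the group 1 entries for
candidates 6 and 7 of loop 3 in dgefa, as printed in after.txt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pair that the ivopts dump prints as
   cost=(C,K); this is not the actual GCC type.  */
struct cost_pair
{
  long cost;
  unsigned complexity;
};

/* Return true if A is strictly cheaper than B.  A lower cost wins;
   the complexity is consulted only when the costs are equal
   (assumed tie-break rule).  */
bool
cost_pair_less (struct cost_pair a, struct cost_pair b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}

/* Return the number of the cheaper candidate for group 1 of loop 3 in
   dgefa, using the values listed in after.txt.  */
int
cheaper_candidate_loop3_group1 (void)
{
  struct cost_pair cand6 = { 1, 0 };	/* cand 6: cost 1, compl. 0 */
  struct cost_pair cand7 = { 1, 1 };	/* cand 7: cost 1, compl. 1 */

  return cost_pair_less (cand6, cand7) ? 6 : 7;
}
```

Under this assumption the two candidates have the same cost, so the
complexity alone decides between them, which is why a wrong complexity
can change the selected IV set.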

This patch is available in a GCC fork at the following address:
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429

Bug 109429 is described at the following address:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429

gcc/ChangeLog:

    * tree-ssa-loop-ivopts.cc (get_address_cost): Fix the
    complexity calculation.

gcc/testsuite/ChangeLog:

    * after.s: The assembly file obtained by compiling the fp_foo.c
    file after modification of the tree-ssa-loop-ivopts.cc file.
    * after.txt: The compiler-generated output obtained by compiling
    the fp_foo.c file after modification of the
    tree-ssa-loop-ivopts.cc file.
    * before.s: The assembly file obtained by compiling the fp_foo.c
    file before modification of the tree-ssa-loop-ivopts.cc file.
    * before.txt: The compiler-generated output obtained by compiling
    the fp_foo.c file before modification of the
    tree-ssa-loop-ivopts.cc file.
    * fp_foo.c: The C test.
    * test_script.sh: The script used for compiling the fp_foo.c file.

Signed-off-by: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
---
 gcc/testsuite/after.s        |  148 ++
 gcc/testsuite/after.txt      | 2792 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/before.s       |  152 ++
 gcc/testsuite/before.txt     | 2694 ++++++++++++++++++++++++++++++++
 gcc/testsuite/fp_foo.c       |   19 +
 gcc/testsuite/test_script.sh |   10 +
 gcc/tree-ssa-loop-ivopts.cc  |   75 +-
 7 files changed, 5853 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/after.s
 create mode 100644 gcc/testsuite/after.txt
 create mode 100644 gcc/testsuite/before.s
 create mode 100644 gcc/testsuite/before.txt
 create mode 100644 gcc/testsuite/fp_foo.c
 create mode 100644 gcc/testsuite/test_script.sh

diff --git a/gcc/testsuite/after.s b/gcc/testsuite/after.s
new file mode 100644
index 00000000000..a32bb8b3614
--- /dev/null
+++ b/gcc/testsuite/after.s
@@ -0,0 +1,148 @@
+       .file   1 "fp_foo.c"
+       .section .mdebug.abi64
+       .previous
+       .nan    2008
+       .module fp=64
+       .module oddspreg
+       .module arch=mips64r6
+       .abicalls
+       .text
+       .align  2
+       .align  3
+       .globl  daxpy
+       .set    nomips16
+       .set    nomicromips
+       .ent    daxpy
+       .type   daxpy, @function
+daxpy:
+       .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
+       .mask   0x00000000,0
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       blezc   $6,.L7
+       dlsa    $6,$6,$4,2
+       .align  3
+.L3:
+       lwc1    $f1,0($5)
+       daddiu  $4,$4,4
+       lwc1    $f0,-4($4)
+       daddiu  $5,$5,4
+       maddf.s $f0,$f1,$f15
+       bne     $4,$6,.L3
+       swc1    $f0,-4($4)
+
+.L7:
+       jrc     $31
+       .set    macro
+       .set    reorder
+       .end    daxpy
+       .size   daxpy, .-daxpy
+       .align  2
+       .align  3
+       .globl  dgefa
+       .set    nomips16
+       .set    nomicromips
+       .ent    dgefa
+       .type   dgefa, @function
+dgefa:
+       .frame  $sp,48,$31              # vars= 0, regs= 5/0, args= 0, gp= 0
+       .mask   0x100f0000,-8
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       li      $2,1                    # 0x1
+       bgec    $2,$6,.L23
+       daddiu  $sp,$sp,-48
+       addiu   $14,$6,-1
+       move    $10,$6
+       sd      $19,32($sp)
+       sd      $18,24($sp)
+       move    $11,$4
+       sd      $17,16($sp)
+       move    $17,$5
+       sd      $16,8($sp)
+       dlsa    $9,$7,$4,2
+       addiu   $19,$5,1
+       dsll    $12,$5,2
+       move    $25,$5
+       move    $24,$0
+       move    $13,$0
+       move    $15,$0
+       move    $18,$14
+       .align  3
+.L11:
+       addiu   $7,$15,1
+       addiu   $16,$15,1
+       daddiu  $13,$13,1
+       move    $15,$7
+       bgec    $7,$10,.L15
+       daddiu  $8,$24,1
+       daddu   $6,$13,$25
+       dlsa    $8,$8,$11,2
+       dsll    $6,$6,2
+       move    $5,$14
+       .align  3
+.L14:
+       daddu   $2,$9,$6
+       daddu   $4,$11,$6
+       lwc1    $f2,-4($2)
+       move    $3,$0
+       move    $2,$8
+       .align  3
+.L13:
+       lwc1    $f1,0($4)
+       daddiu  $2,$2,4
+       lwc1    $f0,-4($2)
+       addiu   $3,$3,1
+       daddiu  $4,$4,4
+       maddf.s $f0,$f2,$f1
+       swc1    $f0,-4($2)
+       bltc    $3,$5,.L13
+       addiu   $7,$7,1
+       bne     $10,$7,.L14
+       daddu   $6,$6,$12
+
+.L15:
+       addiu   $14,$14,-1
+       daddiu  $9,$9,-4
+       addu    $24,$24,$19
+       bne     $18,$16,.L11
+       addu    $25,$17,$25
+
+       ld      $19,32($sp)
+       ld      $18,24($sp)
+       ld      $17,16($sp)
+       ld      $16,8($sp)
+       jr      $31
+       daddiu  $sp,$sp,48
+
+.L23:
+       jrc     $31
+       .set    macro
+       .set    reorder
+       .end    dgefa
+       .size   dgefa, .-dgefa
+       .section        .text.startup,"ax",@progbits
+       .align  2
+       .align  3
+       .globl  main
+       .set    nomips16
+       .set    nomicromips
+       .ent    main
+       .type   main, @function
+main:
+       .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
+       .mask   0x00000000,0
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       jr      $31
+       move    $2,$0
+
+       .set    macro
+       .set    reorder
+       .end    main
+       .size   main, .-main
+       .ident  "GCC: (GNU) 14.0.1 20240214 (experimental)"
+       .section        .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/after.txt b/gcc/testsuite/after.txt
new file mode 100644
index 00000000000..772f92d2b20
--- /dev/null
+++ b/gcc/testsuite/after.txt
@@ -0,0 +1,2792 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;;  header 3, latch 6
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_12(D) + 4294967295
+;;  upper_bound 2147483646
+;;  likely_upper_bound 2147483646
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;;  nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+  single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+  {
+    <bb 2> [local count: 118111600]:
+    if (n_12(D) > 0)
+      goto <bb 5>; [89.00%]
+    else
+      goto <bb 4>; [11.00%]
+
+  }
+  bb_5 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 5> [local count: 105119324]:
+
+  }
+  bb_7 (preds = {bb_3 }, succs = {bb_4 })
+  {
+    <bb 7> [local count: 105119324]:
+    # .MEM_22 = PHI <.MEM_16(3)>
+
+  }
+  bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+  {
+    <bb 4> [local count: 118111600]:
+    # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+    # VUSE <.MEM_29>
+    return;
+
+  }
+  loop_1 (header = 3, latch = 6, finite_p
+  niter (unsigned int) n_12(D) + 4294967295
+  upper_bound 2147483646
+  likely_upper_bound 2147483646
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+  {
+    bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+    {
+      <bb 3> [local count: 955630224]:
+      # i_20 = PHI <i_17(6), 0(5)>
+      # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+      _1 = (long unsigned int) i_20;
+      _2 = _1 * 4;
+      _3 = vector1_13(D) + _2;
+      # VUSE <.MEM_21>
+      _4 = *_3;
+      _5 = vector2_14(D) + _2;
+      # VUSE <.MEM_21>
+      _6 = *_5;
+      _7 = _6 * fp_const_15(D);
+      _8 = _4 + _7;
+      # .MEM_16 = VDEF <.MEM_21>
+      *_3 = _8;
+      i_17 = i_20 + 1;
+      if (n_12(D) > i_17)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 7>; [11.00%]
+
+    }
+    bb_6 (preds = {bb_3 }, succs = {bb_3 })
+    {
+      <bb 6> [local count: 850510900]:
+      goto <bb 3>; [100.00%]
+
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_12(D)
+  bounds on difference of bases: 0 ... 2147483646
+  result:
+    # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _1
+  Type:        long unsigned int
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _2
+  Type:        long unsigned int
+  Base:        0
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _3
+  Type:        float *
+  Base:        vector1_13(D)
+  Step:        4
+  Object:      (void *) vector1_13(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _5
+  Type:        float *
+  Base:        vector2_14(D)
+  Step:        4
+  Object:      (void *) vector2_14(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_17
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_20
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   _4 = *_3;
+    At pos:    *_3
+    IV struct:
+      Type:    float *
+      Base:    vector1_13(D)
+      Step:    4
+      Object:  (void *) vector1_13(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+  Use 0.1:
+    At stmt:   *_3 = _8;
+    At pos:    *_3
+    IV struct:
+      Type:    float *
+      Base:    vector1_13(D)
+      Step:    4
+      Object:  (void *) vector1_13(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:   _6 = *_5;
+    At pos:    *_5
+    IV struct:
+      Type:    float *
+      Base:    vector2_14(D)
+      Step:    4
+      Object:  (void *) vector2_14(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 2:
+  Type:        COMPARE
+  Use 2.0:
+    At stmt:   if (n_12(D) > i_17)
+    At pos:    i_17
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.6
+  Var after: ivtmp.6
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.7
+  Var after: ivtmp.7
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.8
+  Var after: ivtmp.8
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.9
+  Var after: ivtmp.9
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector1_13(D)
+    Step:      4
+    Object:    (void *) vector1_13(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.10
+  Var after: ivtmp.10
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector2_14(D)
+    Step:      4
+    Object:    (void *) vector2_14(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.11
+  Var after: ivtmp.11
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 7:
+  Var befor: ivtmp.12
+  Var after: ivtmp.12
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      4
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 7
+  Group 1:     0, 1, 2, 3, 5, 7
+  Group 2:     0, 1, 2, 3, 6
+
+<Candidate Costs>:
+  cand cost
+force_expr_to_var_cost size costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+force_expr_to_var_cost speed costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+  0    5
+  1    5
+  2    5
+  3    4
+  4    5
+  5    5
+  6    5
+  7    5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D)   (eliminable)
+Inv 2: vector2_14(D)   (eliminable)
+Inv 3: fp_const_15(D)  (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 2:    (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    18      2       NIL;    1
+  2    18      4       NIL;    1
+  4    2       0       NIL;    NIL;
+  7    10      2       NIL;    1
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    9       1       NIL;    2
+  2    9       2       NIL;    2
+  5    1       0       NIL;    NIL;
+  7    5       1       NIL;    2
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    4
+  1    0       0       NIL;    4
+  2    1       0       NIL;    4
+  3    0       0       NIL;    4
+  4    1       0       1;      NIL;
+  5    1       0       2;      NIL;
+  6    0       0       NIL;    4
+  7    1       0       NIL;    4
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   38
+  20   40
+  21   42
+  22   44
+  23   115
+  24   120
+  25   125
+  26   130
+  27   179
+  28   228
+  29   277
+  30   326
+  31   375
+  32   424
+  33   473
+  34   522
+  35   571
+  36   620
+  37   669
+  38   718
+  39   767
+  40   816
+  41   865
+  42   914
+  43   963
+  44   1012
+  45   1061
+  46   1110
+  47   1159
+  48   1208
+  49   1257
+  50   1306
+  51   1355
+  52   1404
+
+Initial set of candidates:
+  cost: 37 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 27 (complexity 3)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(18,2)
+   group:1 --> iv_cand:1, cost=(9,1)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 26 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 3)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,2)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 24 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 1)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables:
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 26 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 3)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,2)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 24 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 1)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables:
+  invariant expressions: 1
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: ivtmp.9_28
+  Var after: ivtmp.9_27
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector1_13(D)
+    Step:      4
+    Object:    (void *) vector1_13(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.10_25
+  Var after: ivtmp.10_24
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector2_14(D)
+    Step:      4
+    Object:    (void *) vector2_14(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;;  header 8, latch 13
+;;  depth 3, outer 2, finite_p
+;;  niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;;  nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+  single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          # VUSE <.MEM_57>
+          _35 = *_34;
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          # VUSE <.MEM_57>
+          _37 = *_36;
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          # .MEM_42 = VDEF <.MEM_57>
+          *_34 = _39;
+          i_40 = i_56 + 1;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 3
+  exit condition [1, + , 1](no_overflow) < _87
+  bounds on difference of bases: -2147483649 ... 2147483646
+  result:
+    zero if _87 <= 0
+    # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _21
+  Type:        sizetype
+  Base:        ((sizetype) _7 + 1) * 4
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _29
+  Type:        sizetype
+  Base:        ((sizetype) _11 + 1) * 4
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _32
+  Type:        long unsigned int
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _33
+  Type:        long unsigned int
+  Base:        0
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _34
+  Type:        float *
+  Base:        vector_27(D) + ((sizetype) _7 + 1) * 4
+  Step:        4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _36
+  Type:        float *
+  Base:        vector_27(D) + ((sizetype) _11 + 1) * 4
+  Step:        4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_40
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_56
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   _35 = *_34;
+    At pos:    *_34
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+  Use 0.1:
+    At stmt:   *_34 = _39;
+    At pos:    *_34
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:   _37 = *_36;
+    At pos:    *_36
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _11 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 2:
+  Type:        COMPARE
+  Use 2.0:
+    At stmt:   if (i_40 < _87)
+    At pos:    i_40
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.20
+  Var after: ivtmp.20
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.21
+  Var after: ivtmp.21
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.22
+  Var after: ivtmp.22
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.23
+  Var after: ivtmp.23
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.24
+  Var after: ivtmp.24
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.25
+  Var after: ivtmp.25
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Var befor: ivtmp.26
+  Var after: ivtmp.26
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.27
+  Var after: ivtmp.27
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 9:
+  Var befor: ivtmp.28
+  Var after: ivtmp.28
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      4
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 5, 9
+  Group 1:     0, 1, 2, 3, 6, 7, 9
+  Group 2:     0, 1, 2, 3, 8
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    5
+  3    4
+  4    6
+  5    6
+  6    6
+  7    6
+  8    5
+  9    5
+
+
+<Invariant Vars>:
+Inv 6: _7      (eliminable)
+Inv 1: _10     (eliminable)
+Inv 7: _11     (eliminable)
+Inv 3: _14     (eliminable)
+Inv 2: vector_27(D)    (eliminable)
+Inv 4: t_28    (eliminable)
+Inv 5: _87     (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 2:    ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 3:    (unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4
+inv_expr 4:    (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 5:    ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 6:    (unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    22      4       1;      NIL;
+  2    22      2       1;      NIL;
+  4    2       0       NIL;    NIL;
+  5    2       2       NIL;    NIL;
+  6    16      2       2;      NIL;
+  7    16      4       3;      NIL;
+  9    14      4       1;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    11      2       4;      NIL;
+  2    11      1       4;      NIL;
+  4    8       1       5;      NIL;
+  5    8       2       6;      NIL;
+  6    1       0       NIL;    NIL;
+  7    1       1       NIL;    NIL;
+  9    7       2       4;      NIL;
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    5
+  1    0       0       NIL;    5
+  2    4       0       NIL;    5
+  3    0       0       NIL;    5
+  8    4       0       NIL;    5
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   38
+  20   40
+  21   42
+  22   44
+  23   115
+  24   120
+  25   125
+  26   130
+  27   179
+  28   228
+  29   277
+  30   326
+  31   375
+  32   424
+  33   473
+  34   522
+  35   571
+  36   620
+  37   669
+  38   718
+  39   767
+  40   816
+  41   865
+  42   914
+  43   963
+  44   1012
+  45   1061
+  46   1110
+  47   1159
+  48   1208
+  49   1257
+  50   1306
+  51   1355
+  52   1404
+
+Initial set of candidates:
+  cost: 47 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 37 (complexity 3)
+  candidates: 2
+   group:0 --> iv_cand:2, cost=(22,2)
+   group:1 --> iv_cand:2, cost=(11,1)
+   group:2 --> iv_cand:2, cost=(4,0)
+  invariant variables: 5
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 31 (complexity 1)
+  reg_cost: 6
+  cand_cost: 11
+  cand_group_cost: 14 (complexity 1)
+  candidates: 2, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:2, cost=(4,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 1)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 7
+  cand_cost: 16
+  cand_group_cost: 3 (complexity 0)
+  candidates: 3, 4, 6
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:6, cost=(1,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions:
+
+Initial set of candidates:
+  cost: 37 (complexity 6)
+  reg_cost: 7
+  cand_cost: 9
+  cand_group_cost: 21 (complexity 6)
+  candidates: 3, 9
+   group:0 --> iv_cand:9, cost=(14,4)
+   group:1 --> iv_cand:9, cost=(7,2)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 26 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 1)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 7
+  cand_cost: 16
+  cand_group_cost: 3 (complexity 0)
+  candidates: 3, 4, 6
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:6, cost=(1,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions:
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 3 IVs:
+Candidate 3:
+  Var befor: i_56
+  Var after: i_40
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.23_85
+  Var after: ivtmp.23_84
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.25_78
+  Var after: ivtmp.25_77
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+;;
+;; Loop 2
+;;  header 7, latch 12
+;;  depth 2, outer 1, finite_p
+;;  niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;;  nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+  single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+        _75 = (sizetype) _11;
+        _74 = _75 + 1;
+        _73 = _74 * 4;
+        _72 = vector_27(D) + _73;
+        ivtmp.25_76 = (unsigned long) _72;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _71 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_71];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _69 = (void *) ivtmp.25_78;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_69];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _70 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_70] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          ivtmp.25_77 = ivtmp.25_78 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 2
+  exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+  number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _1
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _2
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D) + l_26(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _3
+  Type:        long unsigned int
+  Base:        (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+  Step:        (long unsigned int) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _4
+  Type:        long unsigned int
+  Base:        ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:        (long unsigned int) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _5
+  Type:        float *
+  Base:        vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:        (long unsigned int) m_25(D) * 4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _11
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D) + i_50
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _12
+  Type:        sizetype
+  Base:        (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _13
+  Type:        sizetype
+  Base:        ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _14
+  Type:        sizetype
+  Base:        (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:        (sizetype) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    j_30
+  Type:        int
+  Base:        i_50 + 2
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    j_51
+  Type:        int
+  Base:        i_50 + 1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _72
+  Type:        float *
+  Base:        vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:        (sizetype) m_25(D) * 4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _73
+  Type:        sizetype
+  Base:        (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:        (sizetype) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _74
+  Type:        sizetype
+  Base:        ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _75
+  Type:        sizetype
+  Base:        (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    ivtmp.25_76
+  Type:        unsigned long
+  Base:        (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+  Step:        (sizetype) m_25(D) * 4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   t_28 = *_5;
+    At pos:    *_5
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+      Step:    (long unsigned int) m_25(D) * 4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        COMPARE
+  Use 1.0:
+    At stmt:   if (n_23(D) > j_30)
+    At pos:    j_30
+    IV struct:
+      Type:    int
+      Base:    i_50 + 2
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 2:
+  Type:        GENERIC
+  Use 2.0:
+    At stmt:   ivtmp.25_76 = (unsigned long) _72;
+    At pos:
+    IV struct:
+      Type:    unsigned long
+      Base:    (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+      Step:    (sizetype) m_25(D) * 4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 3:
+  Type:        GENERIC
+  Use 3.0:
+    At stmt:   _14 = _13 * 4;
+    At pos:
+    IV struct:
+      Type:    sizetype
+      Base:    (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+      Step:    (sizetype) m_25(D) * 4
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.29
+  Var after: ivtmp.29
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.30
+  Var after: ivtmp.30
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.31
+  Var after: ivtmp.31
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (sizetype) (i_50 + 2)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Var befor: ivtmp.32
+  Var after: ivtmp.32
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (sizetype) (i_50 + 1)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      i_50 + 1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.33
+  Var after: ivtmp.33
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+    Step:      (unsigned long) ((long unsigned int) m_25(D) * 4)
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.34
+  Var after: ivtmp.34
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (i_50 + 2)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 7:
+  Var befor: ivtmp.35
+  Var after: ivtmp.35
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) i_50
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 8:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.36
+  Var after: ivtmp.36
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+    Step:      (sizetype) m_25(D) * 4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37
+  Var after: ivtmp.37
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4) + (unsigned long) vector_27(D)
+    Step:      (sizetype) m_25(D) * 4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38
+  Var after: ivtmp.38
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 11:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.39
+  Var after: ivtmp.39
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 12:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.40
+  Var after: ivtmp.40
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      (long unsigned int) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+<Important Candidates>:         0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 5, 12
+  Group 1:     0, 1, 2, 3, 4, 6, 7
+  Group 2:     0, 1, 2, 3, 4, 8, 9, 10, 11, 12
+  Group 3:     0, 1, 2, 3, 4, 10, 11, 12
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    6
+  3    6
+  4    4
+  5    9
+  6    5
+  7    5
+  8    10
+  9    9
+  10   10
+  11   9
+  12   5
+
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24    (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50    (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1:    (long unsigned int) m_25(D) * 4
+inv_expr 2:    ((unsigned long) l_26(D) - (unsigned long) i_50) * 4
+inv_expr 3:    (unsigned long) i_50 * 18446744073709551612 + (unsigned long) l_26(D) * 4
+inv_expr 4:    ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 5:    ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 6:    ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 7:    (signed int) i_50 + 1
+inv_expr 8:    (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 9:    ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 10:   ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 11:   (((signed long) i_50 - (signed long) l_26(D)) + 1) * 4
+inv_expr 12:   (signed long) vector_27(D) + 4
+inv_expr 13:   (((signed long) ((i_50 + 1) * m_25(D)) * 4 + (signed long) vector_27(D)) + (signed long) i_50 * 4) + 4
+inv_expr 14:   (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 15:   4 - (signed long) vector_27(D)
+inv_expr 16:   (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    1       0       NIL;    NIL;
+  8    8       2       2;      NIL;
+  9    8       1       3;      NIL;
+  10   8       2       4;      NIL;
+  11   8       1       4;      NIL;
+  12   10      1       5;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       6;      NIL;
+  1    2       0       8;      NIL;
+  2    3       0       9;      NIL;
+  3    0       0       NIL;    7
+  4    0       0       NIL;    7
+  6    0       0       NIL;    7
+  7    0       0       NIL;    7
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    6       0       11;     NIL;
+  8    0       0       NIL;    NIL;
+  9    4       0       NIL;    NIL;
+  10   4       0       NIL;    NIL;
+  11   4       0       12;     NIL;
+  12   9       0       13;     NIL;
+
+Group 3:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    7       0       14;     NIL;
+  8    8       0       NIL;    NIL;
+  9    4       0       15;     NIL;
+  10   0       0       NIL;    NIL;
+  11   4       0       NIL;    NIL;
+  12   9       0       16;     NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   111
+  20   116
+  21   121
+  22   126
+  23   151
+  24   176
+  25   201
+  26   226
+  27   275
+  28   324
+  29   373
+  30   422
+  31   471
+  32   520
+  33   569
+  34   618
+  35   667
+  36   716
+  37   765
+  38   814
+  39   863
+  40   912
+  41   961
+  42   1010
+  43   1059
+  44   1108
+  45   1157
+  46   1206
+  47   1255
+  48   1304
+  49   1353
+  50   1402
+  51   1451
+  52   1500
+
+Initial set of candidates:
+  cost: 35 (complexity 0)
+  reg_cost: 8
+  cand_cost: 13
+  cand_group_cost: 14 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:5, cost=(1,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:5, cost=(6,0)
+   group:3 --> iv_cand:5, cost=(7,0)
+  invariant variables: 7
+  invariant expressions: 1, 11, 14
+
+Improved to:
+  cost: 33 (complexity 2)
+  reg_cost: 7
+  cand_cost: 14
+  cand_group_cost: 12 (complexity 2)
+  candidates: 4, 10
+   group:0 --> iv_cand:10, cost=(8,2)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:10, cost=(4,0)
+   group:3 --> iv_cand:10, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 4
+
+Initial set of candidates:
+  cost: 33 (complexity 2)
+  reg_cost: 7
+  cand_cost: 14
+  cand_group_cost: 12 (complexity 2)
+  candidates: 4, 10
+   group:0 --> iv_cand:10, cost=(8,2)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:10, cost=(4,0)
+   group:3 --> iv_cand:10, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 4
+
+Original cost 33 (complexity 2)
+
+Final cost 33 (complexity 2)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: j_51
+  Var after: j_30
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      i_50 + 1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38_68
+  Var after: ivtmp.38_67
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;;  header 4, latch 11
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_23(D) + 4294967294
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;;  nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+  single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+      _66 = (sizetype) m_25(D);
+      _65 = _66 * 4;
+      _63 = i_50 + 1;
+      _62 = m_25(D) * _63;
+      _61 = (sizetype) _62;
+      _60 = (sizetype) i_50;
+      _59 = _60 + _61;
+      _58 = _59 + 1;
+      ivtmp.38_64 = _58 * 4;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        # ivtmp.38_68 = PHI <ivtmp.38_67(12), ivtmp.38_64(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        _49 = (sizetype) i_50;
+        _48 = _49 * 18446744073709551612;
+        _47 = (sizetype) l_26(D);
+        _46 = _47 * 4;
+        _44 = _46 + _48;
+        _43 = vector_27(D) + _44;
+        _41 = _43 + 18446744073709551612;
+        _31 = _43 + ivtmp.38_68;
+        # VUSE <.MEM_52>
+        t_28 = MEM[(float *)_31 + -4B];
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = ivtmp.38_68;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+        _75 = (sizetype) _11;
+        _74 = _75 + 1;
+        _73 = _74 * 4;
+        _72 = vector_27(D) + _73;
+        _20 = (unsigned long) vector_27(D);
+        _19 = _20 + ivtmp.38_68;
+        ivtmp.25_76 = _19;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        ivtmp.38_67 = ivtmp.38_68 + _65;
+        if (j_30 != n_23(D))
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _71 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_71];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _69 = (void *) ivtmp.25_78;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_69];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _70 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_70] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          ivtmp.25_77 = ivtmp.25_78 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+  number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _6
+  Type:        int
+  Base:        0
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _7
+  Type:        int
+  Base:        0
+  Step:        (int) ((unsigned int) m_25(D) + 1)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    j_24
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _41
+  Type:        float *
+  Base:        vector_27(D) + ((sizetype) l_26(D) * 4 + 18446744073709551612)
+  Step:        18446744073709551612
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _43
+  Type:        float *
+  Base:        vector_27(D) + (sizetype) l_26(D) * 4
+  Step:        18446744073709551612
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _44
+  Type:        sizetype
+  Base:        (sizetype) l_26(D) * 4
+  Step:        18446744073709551612
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _48
+  Type:        sizetype
+  Base:        0
+  Step:        18446744073709551612
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _49
+  Type:        sizetype
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_50
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _60
+  Type:        sizetype
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _62
+  Type:        int
+  Base:        m_25(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _63
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _87
+  Type:        int
+  Base:        n_23(D) + -1
+  Step:        -1
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+
+<IV Groups>:
+Group 0:
+  Type:        COMPARE
+  Use 0.0:
+    At stmt:   if (n_23(D) > j_24)
+    At pos:    j_24
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 1:
+  Type:        COMPARE
+  Use 1.0:
+    At stmt:   if (j_24 < _45)
+    At pos:    j_24
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 2:
+  Type:        COMPARE
+  Use 2.0:
+    At stmt:   if (i_40 < _87)
+    At pos:    _87
+    IV struct:
+      Type:    int
+      Base:    n_23(D) + -1
+      Step:    -1
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 3:
+  Type:        GENERIC
+  Use 3.0:
+    At stmt:   j_24 = i_50 + 1;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 4:
+  Type:        GENERIC
+  Use 4.0:
+    At stmt:   _43 = vector_27(D) + _44;
+    At pos:
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + (sizetype) l_26(D) * 4
+      Step:    18446744073709551612
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    No-overflow
+Group 5:
+  Type:        GENERIC
+  Use 5.0:
+    At stmt:   i_50 = PHI <j_24(11), 0(10)>
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    0
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 6:
+  Type:        GENERIC
+  Use 6.0:
+    At stmt:   _7 = _6 + i_50;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    0
+      Step:    (int) ((unsigned int) m_25(D) + 1)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 7:
+  Type:        GENERIC
+  Use 7.0:
+    At stmt:   _62 = m_25(D) * _63;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    m_25(D)
+      Step:    m_25(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 8:
+  Type:        GENERIC
+  Use 8.0:
+    At stmt:   _60 = (sizetype) i_50;
+    At pos:
+    IV struct:
+      Type:    sizetype
+      Base:    0
+      Step:    1
+      Biv:     N
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.41
+  Var after: ivtmp.41
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.42
+  Var after: ivtmp.42
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.43
+  Var after: ivtmp.43
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.44
+  Var after: ivtmp.44
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (n_23(D) + -1)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.45
+  Var after: ivtmp.45
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) n_23(D)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.46
+  Var after: ivtmp.46
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:      18446744073709551612
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.47
+  Var after: ivtmp.47
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      (unsigned int) m_25(D) + 1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.48
+  Var after: ivtmp.48
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) m_25(D)
+    Step:      (unsigned int) m_25(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3
+  Group 1:     0, 1, 2, 3
+  Group 2:     0, 1, 2, 3, 4, 5
+  Group 3:     0, 1, 2, 3
+  Group 4:     0, 1, 2, 3, 6
+  Group 5:     0, 1, 2, 3
+  Group 6:     0, 1, 2, 3, 7
+  Group 7:     0, 1, 2, 3, 8
+  Group 8:     0, 1, 2, 3
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    5
+  3    4
+  4    5
+  5    5
+  6    6
+  7    5
+  8    5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45     (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    (unsigned int) m_25(D) + 1
+inv_expr 2:    (signed int) n_23(D) + 1
+inv_expr 3:    (signed int) n_23(D) + -1
+inv_expr 4:    (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    4       0       NIL;    NIL;
+  1    4       0       NIL;    NIL;
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  4    4       0       NIL;    NIL;
+  5    4       0       2;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    NIL;
+  1    0       0       NIL;    2
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  4    0       0       NIL;    NIL;
+  5    0       0       NIL;    NIL;
+  6    3       0       NIL;    NIL;
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    80      0       3;      NIL;
+  1    80      0       3;      NIL;
+  2    80      0       NIL;    NIL;
+  3    80      0       NIL;    NIL;
+  4    0       0       NIL;    NIL;
+  5    80      0       NIL;    NIL;
+
+Group 3:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    4       0       NIL;    NIL;
+  1    4       0       NIL;    NIL;
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  4    4       0       NIL;    NIL;
+  5    4       0       2;      NIL;
+
+Group 4:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    17      0       4;      NIL;
+  6    0       0       NIL;    NIL;
+
+Group 5:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    NIL;
+  1    0       0       NIL;    NIL;
+  2    4       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  4    4       0       3;      NIL;
+  5    4       0       NIL;    NIL;
+
+Group 6:
+  cand cost    compl.  inv.expr.       inv.vars
+  7    0       0       NIL;    NIL;
+
+Group 7:
+  cand cost    compl.  inv.expr.       inv.vars
+  8    0       0       NIL;    NIL;
+
+Group 8:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    0       0       NIL;    NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   111
+  20   116
+  21   121
+  22   126
+  23   151
+  24   176
+  25   201
+  26   226
+  27   275
+  28   324
+  29   373
+  30   422
+  31   471
+  32   520
+  33   569
+  34   618
+  35   667
+  36   716
+  37   765
+  38   814
+  39   863
+  40   912
+  41   961
+  42   1010
+  43   1059
+  44   1108
+  45   1157
+  46   1206
+  47   1255
+  48   1304
+  49   1353
+  50   1402
+  51   1451
+  52   1500
+
+Initial set of candidates:
+  cost: 126 (complexity 0)
+  reg_cost: 10
+  cand_cost: 19
+  cand_group_cost: 97 (complexity 0)
+  candidates: 1, 3, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:3, cost=(80,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 53 (complexity 0)
+  reg_cost: 12
+  cand_cost: 24
+  cand_group_cost: 17 (complexity 0)
+  candidates: 1, 3, 4, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 6, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 55 (complexity 0)
+  reg_cost: 10
+  cand_cost: 20
+  cand_group_cost: 25 (complexity 0)
+  candidates: 1, 4, 7, 8
+   group:0 --> iv_cand:4, cost=(4,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:4, cost=(4,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:1, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 45 (complexity 0)
+  reg_cost: 11
+  cand_cost: 26
+  cand_group_cost: 8 (complexity 0)
+  candidates: 1, 4, 6, 7, 8
+   group:0 --> iv_cand:4, cost=(4,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:4, cost=(4,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 6, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+  Var befor: ivtmp.42_18
+  Var after: ivtmp.42_17
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Var befor: i_50
+  Var after: j_24
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.44_16
+  Var after: ivtmp.44_15
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (n_23(D) + -1)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.46_92
+  Var after: ivtmp.46_93
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:      18446744073709551612
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.47_98
+  Var after: ivtmp.47_99
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      (unsigned int) m_25(D) + 1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.48_102
+  Var after: ivtmp.48_103
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) m_25(D)
+    Step:      (unsigned int) m_25(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/before.s b/gcc/testsuite/before.s
new file mode 100644
index 00000000000..e13834bdf59
--- /dev/null
+++ b/gcc/testsuite/before.s
@@ -0,0 +1,152 @@
+       .file   1 "fp_foo.c"
+       .section .mdebug.abi64
+       .previous
+       .nan    2008
+       .module fp=64
+       .module oddspreg
+       .module arch=mips64r6
+       .abicalls
+       .text
+       .align  2
+       .align  3
+       .globl  daxpy
+       .set    nomips16
+       .set    nomicromips
+       .ent    daxpy
+       .type   daxpy, @function
+daxpy:
+       .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
+       .mask   0x00000000,0
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       blezc   $6,.L7
+       dlsa    $6,$6,$4,2
+       .align  3
+.L3:
+       lwc1    $f1,0($5)
+       daddiu  $4,$4,4
+       lwc1    $f0,-4($4)
+       daddiu  $5,$5,4
+       maddf.s $f0,$f1,$f15
+       bne     $4,$6,.L3
+       swc1    $f0,-4($4)
+
+.L7:
+       jrc     $31
+       .set    macro
+       .set    reorder
+       .end    daxpy
+       .size   daxpy, .-daxpy
+       .align  2
+       .align  3
+       .globl  dgefa
+       .set    nomips16
+       .set    nomicromips
+       .ent    dgefa
+       .type   dgefa, @function
+dgefa:
+       .frame  $sp,48,$31              # vars= 0, regs= 6/0, args= 0, gp= 0
+       .mask   0x101f0000,-8
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       li      $2,1                    # 0x1
+       bgec    $2,$6,.L23
+       daddiu  $sp,$sp,-48
+       addiu   $14,$6,-1
+       move    $11,$6
+       sd      $20,32($sp)
+       sd      $19,24($sp)
+       addiu   $20,$5,1
+       sd      $18,16($sp)
+       move    $18,$4
+       sd      $17,8($sp)
+       dlsa    $10,$7,$4,2
+       sd      $16,0($sp)
+       move    $17,$5
+       dsll    $12,$5,2
+       move    $25,$5
+       move    $13,$0
+       move    $24,$0
+       move    $15,$0
+       move    $19,$14
+       .align  3
+.L11:
+       addiu   $8,$15,1
+       addiu   $16,$15,1
+       move    $15,$8
+       bgec    $8,$11,.L15
+       daddu   $5,$25,$24
+       daddiu  $9,$13,1
+       dsubu   $6,$0,$13
+       dsll    $5,$5,2
+       dlsa    $9,$9,$18,2
+       dsll    $6,$6,2
+       move    $7,$14
+       .align  3
+.L14:
+       daddu   $3,$10,$5
+       move    $2,$9
+       lwc1    $f2,0($3)
+       move    $4,$0
+       .align  3
+.L13:
+       daddu   $3,$6,$2
+       lwc1    $f0,0($2)
+       daddu   $3,$3,$5
+       daddiu  $2,$2,4
+       lwc1    $f1,0($3)
+       addiu   $4,$4,1
+       maddf.s $f0,$f2,$f1
+       swc1    $f0,-4($2)
+       bltc    $4,$7,.L13
+       addiu   $8,$8,1
+       bne     $11,$8,.L14
+       daddu   $5,$5,$12
+
+.L15:
+       daddiu  $24,$24,1
+       addu    $13,$20,$13
+       addiu   $14,$14,-1
+       daddiu  $10,$10,-4
+       bne     $19,$16,.L11
+       addu    $25,$17,$25
+
+       ld      $20,32($sp)
+       ld      $19,24($sp)
+       ld      $18,16($sp)
+       ld      $17,8($sp)
+       ld      $16,0($sp)
+       jr      $31
+       daddiu  $sp,$sp,48
+
+.L23:
+       jrc     $31
+       .set    macro
+       .set    reorder
+       .end    dgefa
+       .size   dgefa, .-dgefa
+       .section        .text.startup,"ax",@progbits
+       .align  2
+       .align  3
+       .globl  main
+       .set    nomips16
+       .set    nomicromips
+       .ent    main
+       .type   main, @function
+main:
+       .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
+       .mask   0x00000000,0
+       .fmask  0x00000000,0
+       .set    noreorder
+       .set    nomacro
+       jr      $31
+       move    $2,$0
+
+       .set    macro
+       .set    reorder
+       .end    main
+       .size   main, .-main
+       .ident  "GCC: (GNU) 14.0.1 20240214 (experimental)"
+       .section        .note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/before.txt b/gcc/testsuite/before.txt
new file mode 100644
index 00000000000..c87764b8ae9
--- /dev/null
+++ b/gcc/testsuite/before.txt
@@ -0,0 +1,2694 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;;  header 3, latch 6
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_12(D) + 4294967295
+;;  upper_bound 2147483646
+;;  likely_upper_bound 2147483646
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;;  nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+  single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+  {
+    <bb 2> [local count: 118111600]:
+    if (n_12(D) > 0)
+      goto <bb 5>; [89.00%]
+    else
+      goto <bb 4>; [11.00%]
+
+  }
+  bb_5 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 5> [local count: 105119324]:
+
+  }
+  bb_7 (preds = {bb_3 }, succs = {bb_4 })
+  {
+    <bb 7> [local count: 105119324]:
+    # .MEM_22 = PHI <.MEM_16(3)>
+
+  }
+  bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+  {
+    <bb 4> [local count: 118111600]:
+    # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+    # VUSE <.MEM_29>
+    return;
+
+  }
+  loop_1 (header = 3, latch = 6, finite_p
+  niter (unsigned int) n_12(D) + 4294967295
+  upper_bound 2147483646
+  likely_upper_bound 2147483646
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+  {
+    bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+    {
+      <bb 3> [local count: 955630224]:
+      # i_20 = PHI <i_17(6), 0(5)>
+      # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+      _1 = (long unsigned int) i_20;
+      _2 = _1 * 4;
+      _3 = vector1_13(D) + _2;
+      # VUSE <.MEM_21>
+      _4 = *_3;
+      _5 = vector2_14(D) + _2;
+      # VUSE <.MEM_21>
+      _6 = *_5;
+      _7 = _6 * fp_const_15(D);
+      _8 = _4 + _7;
+      # .MEM_16 = VDEF <.MEM_21>
+      *_3 = _8;
+      i_17 = i_20 + 1;
+      if (n_12(D) > i_17)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 7>; [11.00%]
+
+    }
+    bb_6 (preds = {bb_3 }, succs = {bb_3 })
+    {
+      <bb 6> [local count: 850510900]:
+      goto <bb 3>; [100.00%]
+
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_12(D)
+  bounds on difference of bases: 0 ... 2147483646
+  result:
+    # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _1
+  Type:        long unsigned int
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _2
+  Type:        long unsigned int
+  Base:        0
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _3
+  Type:        float *
+  Base:        vector1_13(D)
+  Step:        4
+  Object:      (void *) vector1_13(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _5
+  Type:        float *
+  Base:        vector2_14(D)
+  Step:        4
+  Object:      (void *) vector2_14(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_17
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_20
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   _4 = *_3;
+    At pos:    *_3
+    IV struct:
+      Type:    float *
+      Base:    vector1_13(D)
+      Step:    4
+      Object:  (void *) vector1_13(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+  Use 0.1:
+    At stmt:   *_3 = _8;
+    At pos:    *_3
+    IV struct:
+      Type:    float *
+      Base:    vector1_13(D)
+      Step:    4
+      Object:  (void *) vector1_13(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:   _6 = *_5;
+    At pos:    *_5
+    IV struct:
+      Type:    float *
+      Base:    vector2_14(D)
+      Step:    4
+      Object:  (void *) vector2_14(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 2:
+  Type:        COMPARE
+  Use 2.0:
+    At stmt:   if (n_12(D) > i_17)
+    At pos:    i_17
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.6
+  Var after: ivtmp.6
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.7
+  Var after: ivtmp.7
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.8
+  Var after: ivtmp.8
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.9
+  Var after: ivtmp.9
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector1_13(D)
+    Step:      4
+    Object:    (void *) vector1_13(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.10
+  Var after: ivtmp.10
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector2_14(D)
+    Step:      4
+    Object:    (void *) vector2_14(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.11
+  Var after: ivtmp.11
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 7:
+  Var befor: ivtmp.12
+  Var after: ivtmp.12
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      4
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 7
+  Group 1:     0, 1, 2, 3, 5, 7
+  Group 2:     0, 1, 2, 3, 6
+
+<Candidate Costs>:
+  cand cost
+force_expr_to_var_cost size costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+force_expr_to_var_cost speed costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+  0    5
+  1    5
+  2    5
+  3    4
+  4    5
+  5    5
+  6    5
+  7    5
+
+
+<Invariant Vars>:
+Inv 4: n_12(D) (eliminable)
+Inv 1: vector1_13(D)   (eliminable)
+Inv 2: vector2_14(D)   (eliminable)
+Inv 3: fp_const_15(D)  (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    (unsigned long) vector1_13(D) + 18446744073709551612
+inv_expr 2:    (unsigned long) vector2_14(D) + 18446744073709551612
+inv_expr 3:    (unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 4:    (unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    18      0       NIL;    1
+  2    20      0       1;      NIL;
+  4    2       0       NIL;    NIL;
+  7    10      0       NIL;    1
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    9       0       NIL;    2
+  2    10      0       2;      NIL;
+  5    1       0       NIL;    NIL;
+  7    5       0       NIL;    2
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    4
+  1    0       0       NIL;    4
+  2    1       0       NIL;    4
+  3    0       0       NIL;    4
+  4    1       0       3;      NIL;
+  5    1       0       4;      NIL;
+  6    0       0       NIL;    4
+  7    1       0       NIL;    4
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   38
+  20   40
+  21   42
+  22   44
+  23   115
+  24   120
+  25   125
+  26   130
+  27   179
+  28   228
+  29   277
+  30   326
+  31   375
+  32   424
+  33   473
+  34   522
+  35   571
+  36   620
+  37   669
+  38   718
+  39   767
+  40   816
+  41   865
+  42   914
+  43   963
+  44   1012
+  45   1061
+  46   1110
+  47   1159
+  48   1208
+  49   1257
+  50   1306
+  51   1355
+  52   1404
+
+Initial set of candidates:
+  cost: 37 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 27 (complexity 0)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(18,0)
+   group:1 --> iv_cand:1, cost=(9,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 0)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 24 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 0)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables:
+  invariant expressions: 3
+
+Initial set of candidates:
+  cost: 26 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 0)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 24 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 0)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions:
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables:
+  invariant expressions: 3
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: ivtmp.9_28
+  Var after: ivtmp.9_27
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector1_13(D)
+    Step:      4
+    Object:    (void *) vector1_13(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.10_25
+  Var after: ivtmp.10_24
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) vector2_14(D)
+    Step:      4
+    Object:    (void *) vector2_14(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;;  header 8, latch 13
+;;  depth 3, outer 2, finite_p
+;;  niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;;  nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+  single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          # VUSE <.MEM_57>
+          _35 = *_34;
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          # VUSE <.MEM_57>
+          _37 = *_36;
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          # .MEM_42 = VDEF <.MEM_57>
+          *_34 = _39;
+          i_40 = i_56 + 1;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 3
+  exit condition [1, + , 1](no_overflow) < _87
+  bounds on difference of bases: -2147483649 ... 2147483646
+  result:
+    zero if _87 <= 0
+    # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _21
+  Type:        sizetype
+  Base:        ((sizetype) _7 + 1) * 4
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _29
+  Type:        sizetype
+  Base:        ((sizetype) _11 + 1) * 4
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _32
+  Type:        long unsigned int
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _33
+  Type:        long unsigned int
+  Base:        0
+  Step:        4
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _34
+  Type:        float *
+  Base:        vector_27(D) + ((sizetype) _7 + 1) * 4
+  Step:        4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _36
+  Type:        float *
+  Base:        vector_27(D) + ((sizetype) _11 + 1) * 4
+  Step:        4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_40
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    i_56
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   _35 = *_34;
+    At pos:    *_34
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+  Use 0.1:
+    At stmt:   *_34 = _39;
+    At pos:    *_34
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:   _37 = *_36;
+    At pos:    *_36
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((sizetype) _11 + 1) * 4
+      Step:    4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 2:
+  Type:        COMPARE
+  Use 2.0:
+    At stmt:   if (i_40 < _87)
+    At pos:    i_40
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.20
+  Var after: ivtmp.20
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.21
+  Var after: ivtmp.21
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.22
+  Var after: ivtmp.22
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.23
+  Var after: ivtmp.23
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.24
+  Var after: ivtmp.24
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.25
+  Var after: ivtmp.25
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Var befor: ivtmp.26
+  Var after: ivtmp.26
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.27
+  Var after: ivtmp.27
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 9:
+  Var befor: ivtmp.28
+  Var after: ivtmp.28
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      4
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 5, 9
+  Group 1:     0, 1, 2, 3, 6, 7, 9
+  Group 2:     0, 1, 2, 3, 8
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    5
+  3    4
+  4    6
+  5    6
+  6    6
+  7    6
+  8    5
+  9    5
+
+
+<Invariant Vars>:
+Inv 6: _7      (eliminable)
+Inv 1: _10     (eliminable)
+Inv 7: _11     (eliminable)
+Inv 3: _14     (eliminable)
+Inv 2: vector_27(D)    (eliminable)
+Inv 4: t_28    (eliminable)
+Inv 5: _87     (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    ((unsigned long) _7 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 2:    (unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 3:    ((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 4:    ((unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4) + 4
+inv_expr 5:    ((unsigned long) _11 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 6:    (unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 7:    ((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 8:    ((unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4) + 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    22      0       1;      NIL;
+  2    22      0       2;      NIL;
+  4    2       0       NIL;    NIL;
+  5    2       2       NIL;    NIL;
+  6    16      0       3;      NIL;
+  7    18      0       4;      NIL;
+  9    14      0       1;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    11      0       5;      NIL;
+  2    11      0       6;      NIL;
+  4    8       0       7;      NIL;
+  5    9       0       8;      NIL;
+  6    1       0       NIL;    NIL;
+  7    1       1       NIL;    NIL;
+  9    7       0       5;      NIL;
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    5
+  1    0       0       NIL;    5
+  2    4       0       NIL;    5
+  3    0       0       NIL;    5
+  8    4       0       NIL;    5
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   38
+  20   40
+  21   42
+  22   44
+  23   115
+  24   120
+  25   125
+  26   130
+  27   179
+  28   228
+  29   277
+  30   326
+  31   375
+  32   424
+  33   473
+  34   522
+  35   571
+  36   620
+  37   669
+  38   718
+  39   767
+  40   816
+  41   865
+  42   914
+  43   963
+  44   1012
+  45   1061
+  46   1110
+  47   1159
+  48   1208
+  49   1257
+  50   1306
+  51   1355
+  52   1404
+
+Initial set of candidates:
+  cost: 43 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 33 (complexity 0)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(22,0)
+   group:1 --> iv_cand:1, cost=(11,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 5
+
+Improved to:
+  cost: 27 (complexity 0)
+  reg_cost: 6
+  cand_cost: 11
+  cand_group_cost: 10 (complexity 0)
+  candidates: 1, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 0)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Initial set of candidates:
+  cost: 37 (complexity 0)
+  reg_cost: 7
+  cand_cost: 9
+  cand_group_cost: 21 (complexity 0)
+  candidates: 3, 9
+   group:0 --> iv_cand:9, cost=(14,0)
+   group:1 --> iv_cand:9, cost=(7,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 0)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 3:
+  Var befor: i_56
+  Var after: i_40
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Var befor: ivtmp.23_85
+  Var after: ivtmp.23_84
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:      4
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+  allowed multipliers:
+
+;;
+;; Loop 2
+;;  header 7, latch 12
+;;  depth 2, outer 1, finite_p
+;;  niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;;  nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+  single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _78 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_78];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _76 = (sizetype) _7;
+          _75 = _76 * 18446744073709551612;
+          _74 = _75 + ivtmp.23_85;
+          _73 = (void *) _74;
+          _72 = (sizetype) _11;
+          _71 = _72 * 4;
+          _70 = _73 + _71;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_70];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _77 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_77] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 2
+  exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+  number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _1
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _2
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D) + l_26(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _3
+  Type:        long unsigned int
+  Base:        (long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+  Step:        (long unsigned int) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _4
+  Type:        long unsigned int
+  Base:        ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:        (long unsigned int) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _5
+  Type:        float *
+  Base:        vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:        (long unsigned int) m_25(D) * 4
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _11
+  Type:        int
+  Base:        (i_50 + 1) * m_25(D) + i_50
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _12
+  Type:        sizetype
+  Base:        (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _13
+  Type:        sizetype
+  Base:        ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _14
+  Type:        sizetype
+  Base:        (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:        (sizetype) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    j_30
+  Type:        int
+  Base:        i_50 + 2
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    j_51
+  Type:        int
+  Base:        i_50 + 1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _71
+  Type:        sizetype
+  Base:        ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+  Step:        (sizetype) m_25(D) * 4
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _72
+  Type:        sizetype
+  Base:        (sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:        (sizetype) m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+
+<IV Groups>:
+Group 0:
+  Type:        REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:   t_28 = *_5;
+    At pos:    *_5
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+      Step:    (long unsigned int) m_25(D) * 4
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 1:
+  Type:        COMPARE
+  Use 1.0:
+    At stmt:   if (n_23(D) > j_30)
+    At pos:    j_30
+    IV struct:
+      Type:    int
+      Base:    i_50 + 2
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 2:
+  Type:        GENERIC
+  Use 2.0:
+    At stmt:   _14 = _13 * 4;
+    At pos:
+    IV struct:
+      Type:    sizetype
+      Base:    (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+      Step:    (sizetype) m_25(D) * 4
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 3:
+  Type:        GENERIC
+  Use 3.0:
+    At stmt:   _71 = _72 * 4;
+    At pos:
+    IV struct:
+      Type:    sizetype
+      Base:    ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+      Step:    (sizetype) m_25(D) * 4
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.29
+  Var after: ivtmp.29
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.30
+  Var after: ivtmp.30
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.31
+  Var after: ivtmp.31
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (sizetype) (i_50 + 2)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Var befor: ivtmp.32
+  Var after: ivtmp.32
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (sizetype) (i_50 + 1)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      i_50 + 1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.33
+  Var after: ivtmp.33
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+    Step:      (unsigned long) ((long unsigned int) m_25(D) * 4)
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.34
+  Var after: ivtmp.34
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (i_50 + 2)
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 7:
+  Var befor: ivtmp.35
+  Var after: ivtmp.35
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) i_50
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 8:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.36
+  Var after: ivtmp.36
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37
+  Var after: ivtmp.37
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38
+  Var after: ivtmp.38
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      0
+    Step:      (long unsigned int) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+<Important Candidates>:         0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3, 4, 5, 10
+  Group 1:     0, 1, 2, 3, 4, 6, 7
+  Group 2:     0, 1, 2, 3, 4, 8, 9, 10
+  Group 3:     0, 1, 2, 3, 4, 9, 10
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    6
+  3    6
+  4    4
+  5    9
+  6    5
+  7    5
+  8    10
+  9    9
+  10   5
+
+Scaling cost based on bb prob by 8.00: 6 (scratch: 2) -> 34
+Scaling cost based on bb prob by 8.00: 4 (scratch: 0) -> 32
+Scaling cost based on bb prob by 8.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 8.00: 8 (scratch: 4) -> 36
+
+<Invariant Vars>:
+Inv 6: _7
+Inv 8: _10
+Inv 7: n_23(D) (eliminable)
+Inv 1: j_24    (eliminable)
+Inv 2: m_25(D) (eliminable)
+Inv 3: l_26(D) (eliminable)
+Inv 4: vector_27(D)
+Inv 5: i_50    (eliminable)
+Inv 9: _87
+
+<Invariant Expressions>:
+inv_expr 1:    (long unsigned int) m_25(D) * 4
+inv_expr 2:    (((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4) + 18446744073709551612
+inv_expr 3:    ((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 4:    ((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 5:    ((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 6:    (signed int) i_50 + 1
+inv_expr 7:    (unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 8:    ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 9:    ((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 10:   (((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 11:   (((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+inv_expr 12:   ((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4
+inv_expr 13:   ((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    1       0       NIL;    NIL;
+  8    9       0       2;      NIL;
+  9    8       0       3;      NIL;
+  10   10      0       4;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       5;      NIL;
+  1    2       0       7;      NIL;
+  2    3       0       8;      NIL;
+  3    0       0       NIL;    7
+  4    0       0       NIL;    7
+  6    0       0       NIL;    7
+  7    0       0       NIL;    7
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    7       0       10;     NIL;
+  8    0       0       NIL;    NIL;
+  9    4       0       NIL;    NIL;
+  10   9       0       11;     NIL;
+
+Group 3:
+  cand cost    compl.  inv.expr.       inv.vars
+  5    34      0       12;     NIL;
+  8    32      0       NIL;    NIL;
+  9    0       0       NIL;    NIL;
+  10   36      0       13;     NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   111
+  20   116
+  21   121
+  22   126
+  23   151
+  24   176
+  25   201
+  26   226
+  27   275
+  28   324
+  29   373
+  30   422
+  31   471
+  32   520
+  33   569
+  34   618
+  35   667
+  36   716
+  37   765
+  38   814
+  39   863
+  40   912
+  41   961
+  42   1010
+  43   1059
+  44   1108
+  45   1157
+  46   1206
+  47   1255
+  48   1304
+  49   1353
+  50   1402
+  51   1451
+  52   1500
+
+Initial set of candidates:
+  cost: 63 (complexity 0)
+  reg_cost: 8
+  cand_cost: 13
+  cand_group_cost: 42 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:5, cost=(1,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:5, cost=(7,0)
+   group:3 --> iv_cand:5, cost=(34,0)
+  invariant variables: 7
+  invariant expressions: 1, 10, 12
+
+Improved to:
+  cost: 32 (complexity 0)
+  reg_cost: 7
+  cand_cost: 13
+  cand_group_cost: 12 (complexity 0)
+  candidates: 4, 9
+   group:0 --> iv_cand:9, cost=(8,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:9, cost=(4,0)
+   group:3 --> iv_cand:9, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 3
+
+Initial set of candidates:
+  cost: 32 (complexity 0)
+  reg_cost: 7
+  cand_cost: 13
+  cand_group_cost: 12 (complexity 0)
+  candidates: 4, 9
+   group:0 --> iv_cand:9, cost=(8,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:9, cost=(4,0)
+   group:3 --> iv_cand:9, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 3
+
+Original cost 32 (complexity 0)
+
+Final cost 32 (complexity 0)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: j_51
+  Var after: j_30
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      i_50 + 1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37_69
+  Var after: ivtmp.37_68
+  Incr POS: before exit test
+  IV struct:
+    Type:      sizetype
+    Base:      ((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:      (sizetype) m_25(D) * 4
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;;  header 4, latch 11
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_23(D) + 4294967294
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;;  nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+  single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+      _67 = (sizetype) m_25(D);
+      _66 = _67 * 4;
+      _64 = i_50 + 1;
+      _63 = m_25(D) * _64;
+      _62 = (sizetype) _63;
+      _61 = (sizetype) i_50;
+      _60 = _61 + _62;
+      ivtmp.37_65 = _60 * 4;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        # ivtmp.37_69 = PHI <ivtmp.37_68(12), ivtmp.37_65(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        _59 = (sizetype) i_50;
+        _58 = _59 * 18446744073709551612;
+        _49 = (sizetype) l_26(D);
+        _48 = _49 * 4;
+        _47 = _48 + _58;
+        _46 = vector_27(D) + _47;
+        _44 = _46 + ivtmp.37_69;
+        # VUSE <.MEM_52>
+        t_28 = MEM[(float *)_44];
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = ivtmp.37_69 + 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        ivtmp.37_68 = ivtmp.37_69 + _66;
+        if (j_30 != n_23(D))
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _78 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_78];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _76 = (sizetype) _7;
+          _75 = _76 * 18446744073709551612;
+          _74 = _75 + ivtmp.23_85;
+          _73 = (void *) _74;
+          _72 = (sizetype) _11;
+          _71 = ivtmp.37_69;
+          _70 = _73 + _71;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_70];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _77 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_77] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+  number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:    _6
+  Type:        int
+  Base:        0
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _7
+  Type:        int
+  Base:        0
+  Step:        (int) ((unsigned int) m_25(D) + 1)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    j_24
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _46
+  Type:        float *
+  Base:        vector_27(D) + (sizetype) l_26(D) * 4
+  Step:        18446744073709551612
+  Object:      (void *) vector_27(D)
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _47
+  Type:        sizetype
+  Base:        (sizetype) l_26(D) * 4
+  Step:        18446744073709551612
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    i_50
+  Type:        int
+  Base:        0
+  Step:        1
+  Biv: Y
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _58
+  Type:        sizetype
+  Base:        0
+  Step:        18446744073709551612
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _59
+  Type:        sizetype
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _61
+  Type:        sizetype
+  Base:        0
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _63
+  Type:        int
+  Base:        m_25(D)
+  Step:        m_25(D)
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+IV struct:
+  SSA_NAME:    _64
+  Type:        int
+  Base:        1
+  Step:        1
+  Biv: N
+  Overflowness wrto loop niter:        No-overflow
+IV struct:
+  SSA_NAME:    _87
+  Type:        int
+  Base:        n_23(D) + -1
+  Step:        -1
+  Biv: N
+  Overflowness wrto loop niter:        Overflow
+
+<IV Groups>:
+Group 0:
+  Type:        COMPARE
+  Use 0.0:
+    At stmt:   if (n_23(D) > j_24)
+    At pos:    j_24
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 1:
+  Type:        COMPARE
+  Use 1.0:
+    At stmt:   if (j_24 < _45)
+    At pos:    j_24
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 2:
+  Type:        GENERIC
+  Use 2.0:
+    At stmt:   _7 = _6 + i_50;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    0
+      Step:    (int) ((unsigned int) m_25(D) + 1)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 3:
+  Type:        COMPARE
+  Use 3.0:
+    At stmt:   if (i_40 < _87)
+    At pos:    _87
+    IV struct:
+      Type:    int
+      Base:    n_23(D) + -1
+      Step:    -1
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 4:
+  Type:        GENERIC
+  Use 4.0:
+    At stmt:   j_24 = i_50 + 1;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    1
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 5:
+  Type:        GENERIC
+  Use 5.0:
+    At stmt:   _46 = vector_27(D) + _47;
+    At pos:
+    IV struct:
+      Type:    float *
+      Base:    vector_27(D) + (sizetype) l_26(D) * 4
+      Step:    18446744073709551612
+      Object:  (void *) vector_27(D)
+      Biv:     N
+      Overflowness wrto loop niter:    No-overflow
+Group 6:
+  Type:        GENERIC
+  Use 6.0:
+    At stmt:   i_50 = PHI <j_24(11), 0(10)>
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    0
+      Step:    1
+      Biv:     Y
+      Overflowness wrto loop niter:    No-overflow
+Group 7:
+  Type:        GENERIC
+  Use 7.0:
+    At stmt:   _63 = m_25(D) * _64;
+    At pos:
+    IV struct:
+      Type:    int
+      Base:    m_25(D)
+      Step:    m_25(D)
+      Biv:     N
+      Overflowness wrto loop niter:    Overflow
+Group 8:
+  Type:        GENERIC
+  Use 8.0:
+    At stmt:   _61 = (sizetype) i_50;
+    At pos:
+    IV struct:
+      Type:    sizetype
+      Base:    0
+      Step:    1
+      Biv:     N
+      Overflowness wrto loop niter:    No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.39
+  Var after: ivtmp.39
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 1:
+  Var befor: ivtmp.40
+  Var after: ivtmp.40
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 2:
+  Var befor: ivtmp.41
+  Var after: ivtmp.41
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      1
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.42
+  Var after: ivtmp.42
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      (unsigned int) m_25(D) + 1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.43
+  Var after: ivtmp.43
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (n_23(D) + -1)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 6:
+  Var befor: ivtmp.44
+  Var after: ivtmp.44
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) n_23(D)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Var befor: ivtmp.45
+  Var after: ivtmp.45
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:      18446744073709551612
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.46
+  Var after: ivtmp.46
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) m_25(D)
+    Step:      (unsigned int) m_25(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+<Important Candidates>:         0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:     0, 1, 2, 3
+  Group 1:     0, 1, 2, 3
+  Group 2:     0, 1, 2, 3, 4
+  Group 3:     0, 1, 2, 3, 5, 6
+  Group 4:     0, 1, 2, 3
+  Group 5:     0, 1, 2, 3, 7
+  Group 6:     0, 1, 2, 3
+  Group 7:     0, 1, 2, 3, 8
+  Group 8:     0, 1, 2, 3
+
+<Candidate Costs>:
+  cand cost
+  0    5
+  1    5
+  2    5
+  3    4
+  4    5
+  5    5
+  6    5
+  7    6
+  8    5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1: n_23(D)
+Inv 4: m_25(D)
+Inv 5: l_26(D)
+Inv 3: vector_27(D)
+Inv 2: _45     (eliminable)
+
+<Invariant Expressions>:
+inv_expr 1:    (unsigned int) m_25(D) + 1
+inv_expr 2:    (signed int) n_23(D) + 1
+inv_expr 3:    (signed int) n_23(D) + -1
+inv_expr 4:    (signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    4       0       NIL;    NIL;
+  1    4       0       NIL;    NIL;
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  5    4       0       NIL;    NIL;
+  6    4       0       2;      NIL;
+
+Group 1:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    NIL;
+  1    0       0       NIL;    2
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  5    0       0       NIL;    NIL;
+  6    0       0       NIL;    NIL;
+  7    3       0       NIL;    NIL;
+
+Group 2:
+  cand cost    compl.  inv.expr.       inv.vars
+  4    0       0       NIL;    NIL;
+
+Group 3:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    80      0       3;      NIL;
+  1    80      0       3;      NIL;
+  2    80      0       NIL;    NIL;
+  3    80      0       NIL;    NIL;
+  5    0       0       NIL;    NIL;
+  6    80      0       NIL;    NIL;
+
+Group 4:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    4       0       NIL;    NIL;
+  1    4       0       NIL;    NIL;
+  2    0       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  5    4       0       NIL;    NIL;
+  6    4       0       2;      NIL;
+
+Group 5:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    17      0       4;      NIL;
+  7    0       0       NIL;    NIL;
+
+Group 6:
+  cand cost    compl.  inv.expr.       inv.vars
+  0    0       0       NIL;    NIL;
+  1    0       0       NIL;    NIL;
+  2    4       0       NIL;    NIL;
+  3    0       0       NIL;    NIL;
+  5    4       0       3;      NIL;
+  6    4       0       NIL;    NIL;
+
+Group 7:
+  cand cost    compl.  inv.expr.       inv.vars
+  8    0       0       NIL;    NIL;
+
+Group 8:
+  cand cost    compl.  inv.expr.       inv.vars
+  1    0       0       NIL;    NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs  cost
+  0    0
+  1    2
+  2    4
+  3    6
+  4    8
+  5    10
+  6    12
+  7    14
+  8    16
+  9    18
+  10   20
+  11   22
+  12   24
+  13   26
+  14   28
+  15   30
+  16   32
+  17   34
+  18   36
+  19   111
+  20   116
+  21   121
+  22   126
+  23   151
+  24   176
+  25   201
+  26   226
+  27   275
+  28   324
+  29   373
+  30   422
+  31   471
+  32   520
+  33   569
+  34   618
+  35   667
+  36   716
+  37   765
+  38   814
+  39   863
+  40   912
+  41   961
+  42   1010
+  43   1059
+  44   1108
+  45   1157
+  46   1206
+  47   1255
+  48   1304
+  49   1353
+  50   1402
+  51   1451
+  52   1500
+
+Initial set of candidates:
+  cost: 126 (complexity 0)
+  reg_cost: 10
+  cand_cost: 19
+  cand_group_cost: 97 (complexity 0)
+  candidates: 1, 3, 4, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(80,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 53 (complexity 0)
+  reg_cost: 12
+  cand_cost: 24
+  cand_group_cost: 17 (complexity 0)
+  candidates: 1, 3, 4, 5, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 5, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 55 (complexity 0)
+  reg_cost: 10
+  cand_cost: 20
+  cand_group_cost: 25 (complexity 0)
+  candidates: 1, 4, 5, 8
+   group:0 --> iv_cand:5, cost=(4,0)
+   group:1 --> iv_cand:5, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:5, cost=(4,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:1, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 45 (complexity 0)
+  reg_cost: 11
+  cand_cost: 26
+  cand_group_cost: 8 (complexity 0)
+  candidates: 1, 4, 5, 7, 8
+   group:0 --> iv_cand:5, cost=(4,0)
+   group:1 --> iv_cand:5, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:5, cost=(4,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:1, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 5, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables:
+  invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+  Var befor: ivtmp.40_43
+  Var after: ivtmp.40_41
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 3:
+  Var befor: i_50
+  Var after: j_24
+  Incr POS: orig biv
+  IV struct:
+    Type:      int
+    Base:      0
+    Step:      1
+    Biv:       N
+    Overflowness wrto loop niter:      No-overflow
+Candidate 4:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.42_31
+  Var after: ivtmp.42_20
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      0
+    Step:      (unsigned int) m_25(D) + 1
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 5:
+  Var befor: ivtmp.43_17
+  Var after: ivtmp.43_16
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) (n_23(D) + -1)
+    Step:      4294967295
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 7:
+  Var befor: ivtmp.45_91
+  Var after: ivtmp.45_92
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned long
+    Base:      (unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:      18446744073709551612
+    Object:    (void *) vector_27(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+Candidate 8:
+  Var befor: ivtmp.46_97
+  Var after: ivtmp.46_98
+  Incr POS: before exit test
+  IV struct:
+    Type:      unsigned int
+    Base:      (unsigned int) m_25(D)
+    Step:      (unsigned int) m_25(D)
+    Biv:       N
+    Overflowness wrto loop niter:      Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/fp_foo.c b/gcc/testsuite/fp_foo.c
new file mode 100644
index 00000000000..f65f43d6435
--- /dev/null
+++ b/gcc/testsuite/fp_foo.c
@@ -0,0 +1,19 @@
+
+void daxpy(float *vector1, float *vector2, int n, float fp_const){
+       for (int i = 0; i < n; ++i)
+               vector1[i] += fp_const * vector2[i];
+}
+
+void dgefa(float *vector, int m, int n, int l){
+       for (int i = 0; i < n - 1; ++i){
+               for (int j = i + 1; j < n; ++j){
+                       float t = vector[m * j + l];
+                       daxpy(&vector[m * i + i + 1],
+                              &vector[m * j + i + 1], n - (i + 1), t);
+               }
+       }
+}
+
+int main(){
+  return 0;
+}
diff --git a/gcc/testsuite/test_script.sh b/gcc/testsuite/test_script.sh
new file mode 100644
index 00000000000..4f19d248efe
--- /dev/null
+++ b/gcc/testsuite/test_script.sh
@@ -0,0 +1,10 @@
+export PREFIX="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install"
+export SOURCE_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/source"
+export BUILD_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/build"
+export SYSROOT="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install/sys_root"
+export PATH=$PREFIX/bin:$PATH
+export TARGET=mips64-r6-linux-gnu
+
+
+$PREFIX/bin/mips64-r6-linux-gnu-gcc fp_foo.c -O2 >out.txt -S -o fp_foo.s -march=mips64r6 -mabi=64
+
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7cae5bdefea..2dec5001dca 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4724,7 +4724,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   rtx addr;
   bool simple_inv = true;
   tree comp_inv = NULL_TREE, type = aff_var->type;
-  comp_cost var_cost = no_cost, cost = no_cost;
+  comp_cost var_cost = no_cost, cost = no_cost, autoinc_cost = no_cost;
+  comp_cost acost = no_cost;
   struct mem_address parts = {NULL_TREE, integer_one_node,
                              NULL_TREE, NULL_TREE, NULL_TREE};
   machine_mode addr_mode = TYPE_MODE (type);
@@ -4755,38 +4756,36 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
          if (!ok_with_ratio_p)
            parts.step = NULL_TREE;
        }
-      if (ok_with_ratio_p || ok_without_ratio_p)
+      if (!(ok_with_ratio_p || ok_without_ratio_p))
+        parts.index = NULL_TREE;
+
+      if (maybe_ne (aff_inv->offset, 0))
        {
-         if (maybe_ne (aff_inv->offset, 0))
-           {
-             parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
-             /* Addressing mode "base + index [<< scale] + offset".  */
-             if (!valid_mem_ref_p (mem_mode, as, &parts, code))
-               parts.offset = NULL_TREE;
-             else
-               aff_inv->offset = 0;
-           }
+         parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
+         /* Addressing mode "base + index[<< scale] + offset".  */
+         if (!valid_mem_ref_p (mem_mode, as, &parts, code))
+           parts.offset = NULL_TREE;
+         else
+           aff_inv->offset = 0;
+       }

-         move_fixed_address_to_symbol (&parts, aff_inv);
-         /* Base is fixed address and is moved to symbol part.  */
-         if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
-           parts.base = NULL_TREE;
+      move_fixed_address_to_symbol (&parts, aff_inv);
+      /* Base is fixed address and is moved to symbol part.  */
+      if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
+        parts.base = NULL_TREE;

-         /* Addressing mode "symbol + base + index [<< scale] [+ offset]".  */
-         if (parts.symbol != NULL_TREE
-             && !valid_mem_ref_p (mem_mode, as, &parts, code))
-           {
-             aff_combination_add_elt (aff_inv, parts.symbol, 1);
-             parts.symbol = NULL_TREE;
-             /* Reset SIMPLE_INV since symbol address needs to be computed
-                outside of address expression in this case.  */
-             simple_inv = false;
-             /* Symbol part is moved back to base part, it can't be NULL.  */
-             parts.base = integer_one_node;
-           }
+      /* Addressing mode "symbol + base + index[<< scale] [+ offset]".  */
+      if (parts.symbol != NULL_TREE
+          && !valid_mem_ref_p (mem_mode, as, &parts, code))
+       {
+         aff_combination_add_elt (aff_inv, parts.symbol, 1);
+         parts.symbol = NULL_TREE;
+         /* Reset SIMPLE_INV since symbol address needs to be computed
+            outside of address expression in this case.  */
+         simple_inv = false;
+         /* Symbol part is moved back to base part, it can't be NULL.  */
+         parts.base = integer_one_node;
        }
-      else
-       parts.index = NULL_TREE;
     }
   else
     {
@@ -4799,14 +4798,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,

          if (stmt_after_increment (data->current_loop, cand, use->stmt))
            ainc_offset += ainc_step;
-         cost = get_address_cost_ainc (ainc_step, ainc_offset,
+         autoinc_cost = get_address_cost_ainc (ainc_step, ainc_offset,
                                        addr_mode, mem_mode, as, speed);
-         if (!cost.infinite_cost_p ())
-           {
-             *can_autoinc = true;
-             return cost;
-           }
-         cost = no_cost;
+         if (!autoinc_cost.infinite_cost_p ())
+           *can_autoinc = true;
+         else
+           autoinc_cost = no_cost;
        }
       if (!aff_combination_zero_p (aff_inv))
        {
@@ -4852,10 +4849,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   cost += var_cost;
   addr = addr_for_mem_ref (&parts, as, false);
   gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
-  cost += address_cost (addr, mem_mode, as, speed);
+  acost += address_cost (addr, mem_mode, as, speed);

   if (parts.symbol != NULL_TREE)
     cost.complexity += 1;
+  /* var_present.  */
+  else if (!aff_combination_const_p (aff_inv))
+    cost.complexity += 1;
   /* Don't increase the complexity of adding a scaled index if it's
      the only kind of index that the target allows.  */
   if (parts.step != NULL_TREE && ok_without_ratio_p)
@@ -4865,6 +4865,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
     cost.complexity += 1;

+  cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
   return cost;
 }

--
2.34.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
@ 2024-03-18 20:27 Aleksandar Rakic
  2024-04-15 13:30 ` Aleksandar Rakic
  0 siblings, 1 reply; 17+ messages in thread
From: Aleksandar Rakic @ 2024-03-18 20:27 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jovan Dmitrovic, richard.guenther, Djordje Todorovic,
	jeffreyalaw, Uros Beric

From dbf49f2872efcc14d2ea41eb7d616498dca9789f Mon Sep 17 00:00:00 2001
From: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
Date: Tue, 5 Mar 2024 11:55:01 +0100
Subject: [PATCH] ivopts: Fixed bug 109429

This patch changes the order in which the address complexity is computed.
With the complexities corrected, candidate selection changes as well, which
leads to smaller code size.

This patch also increases the complexity when a variable is present in
the address expression, similarly to the 'var_present' variable in
commit c2b64ce.
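
The symbol/variable rule can be sketched as follows. This is only an
illustration of the checks visible in the diff below: AddressShape and
complexity_bump are hypothetical names, not GCC types or functions, and
the scaled-index check of the real function is deliberately left out.

```cpp
// Hypothetical stand-ins for the conditions get_address_cost tests;
// none of these names exist in GCC.
struct AddressShape {
  bool has_symbol;          // parts.symbol != NULL_TREE
  bool inv_is_constant;     // aff_combination_const_p (aff_inv)
  bool has_nonzero_offset;  // parts.offset is present and not zero
};

// Complexity contributed by the symbol/variable and offset checks only.
inline unsigned complexity_bump(const AddressShape &a) {
  unsigned complexity = 0;
  if (a.has_symbol)
    complexity += 1;
  else if (!a.inv_is_constant)  // the "var_present" case added by the patch
    complexity += 1;
  if (a.has_nonzero_offset)
    complexity += 1;
  return complexity;
}
```

Under these assumptions an address with no symbol but a non-constant
invariant part gets complexity 1, the same as an address with a symbol.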

It also keeps the autoinc_cost separate from the address cost (acost),
so that only one of the two is added to the cost, similarly to
commit c2b64ce.
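
A minimal sketch of that selection, assuming plain scalar costs.
SimpleCost and add_address_cost are hypothetical names used only for
illustration; the real code operates on GCC's comp_cost, which carries
more state than this.

```cpp
// Hypothetical simplified cost type, not GCC's comp_cost.
struct SimpleCost {
  long cost;
  unsigned complexity;
};

// Add exactly one of the two costs, mirroring the final statement
// of the patch:
//   cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
inline SimpleCost add_address_cost(SimpleCost cost, const bool *can_autoinc,
                                   long autoinc_cost, long acost) {
  cost.cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
  return cost;
}
```

With a null can_autoinc pointer, or one pointing at false, the address
cost is used; the complexity already accumulated is left untouched.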

The patch also contains the C test and the script that generates the
assembly file and the compiler dump. The assembly obtained after the
change to tree-ssa-loop-ivopts.cc is smaller than the assembly obtained
before it. The compiler dump shows the difference in complexities for
group 1 of loop 3 in the function dgefa.
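
When reading the attached dump, the totals printed for a candidate set
appear to decompose as reg_cost + cand_cost + cand_group_cost, with
cand_group_cost being the sum of the per-group costs. The sketch below
merely re-checks that arithmetic; the type and function names are
hypothetical, and the relation is an observation about the numbers in
after.txt, not a statement about how ivopts computes them internally.

```cpp
#include <vector>

// One "group:N --> iv_cand:M, cost=(c,k)" entry from the dump.
struct GroupCost {
  int cost;
  int complexity;
};

// Sum the per-group (cost, complexity) pairs of one candidate set.
inline GroupCost sum_groups(const std::vector<GroupCost> &groups) {
  GroupCost total{0, 0};
  for (const GroupCost &g : groups) {
    total.cost += g.cost;
    total.complexity += g.complexity;
  }
  return total;
}

// Total as it appears on the "cost:" line of a candidate set.
inline int total_cost(int reg_cost, int cand_cost, int cand_group_cost) {
  return reg_cost + cand_cost + cand_group_cost;
}
```

For the initial candidate set of loop 3 in after.txt, the group entries
(22,2), (11,1) and (4,0) sum to (37,3), and 5 + 5 + 37 gives the
printed cost of 47.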

This patch is available in a GCC fork at:
https://github.com/rakicaleksandar1999/gcc/tree/bug_109429.

Bug 109429 is described at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109429.

gcc/ChangeLog:

    * tree-ssa-loop-ivopts.cc (get_address_cost): Fix the
    complexity calculation.

gcc/testsuite/ChangeLog:

    * after.s: The assembly file obtained by compiling the fp_foo.c
    file after modification of the tree-ssa-loop-ivopts.cc file.
    * after.txt: The compiler-generated output obtained by compiling
    the fp_foo.c file after modification of the
    tree-ssa-loop-ivopts.cc file.
    * before.s: The assembly file obtained by compiling the fp_foo.c
    file before modification of the tree-ssa-loop-ivopts.cc file.
    * before.txt: The compiler-generated output obtained by compiling
    the fp_foo.c file before modification of the
    tree-ssa-loop-ivopts.cc file.
    * fp_foo.c: The C test.
    * test_script.sh: The script used for compiling the fp_foo.c file.

Signed-off-by: Aleksandar Rakić <Aleksandar.Rakic@Syrmia.com>
---
 gcc/testsuite/after.s        |  148 ++
 gcc/testsuite/after.txt      | 2792 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/before.s       |  152 ++
 gcc/testsuite/before.txt     | 2694 ++++++++++++++++++++++++++++++++
 gcc/testsuite/fp_foo.c       |   19 +
 gcc/testsuite/test_script.sh |   10 +
 gcc/tree-ssa-loop-ivopts.cc  |   75 +-
 7 files changed, 5853 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/after.s
 create mode 100644 gcc/testsuite/after.txt
 create mode 100644 gcc/testsuite/before.s
 create mode 100644 gcc/testsuite/before.txt
 create mode 100644 gcc/testsuite/fp_foo.c
 create mode 100644 gcc/testsuite/test_script.sh

diff --git a/gcc/testsuite/after.s b/gcc/testsuite/after.s
new file mode 100644
index 00000000000..a32bb8b3614
--- /dev/null
+++ b/gcc/testsuite/after.s
@@ -0,0 +1,148 @@
+	.file	1 "fp_foo.c"
+	.section .mdebug.abi64
+	.previous
+	.nan	2008
+	.module	fp=64
+	.module	oddspreg
+	.module	arch=mips64r6
+	.abicalls
+	.text
+	.align	2
+	.align	3
+	.globl	daxpy
+	.set	nomips16
+	.set	nomicromips
+	.ent	daxpy
+	.type	daxpy, @function
+daxpy:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	blezc	$6,.L7
+	dlsa	$6,$6,$4,2
+	.align	3
+.L3:
+	lwc1	$f1,0($5)
+	daddiu	$4,$4,4
+	lwc1	$f0,-4($4)
+	daddiu	$5,$5,4
+	maddf.s	$f0,$f1,$f15
+	bne	$4,$6,.L3
+	swc1	$f0,-4($4)
+
+.L7:
+	jrc	$31
+	.set	macro
+	.set	reorder
+	.end	daxpy
+	.size	daxpy, .-daxpy
+	.align	2
+	.align	3
+	.globl	dgefa
+	.set	nomips16
+	.set	nomicromips
+	.ent	dgefa
+	.type	dgefa, @function
+dgefa:
+	.frame	$sp,48,$31		# vars= 0, regs= 5/0, args= 0, gp= 0
+	.mask	0x100f0000,-8
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	li	$2,1			# 0x1
+	bgec	$2,$6,.L23
+	daddiu	$sp,$sp,-48
+	addiu	$14,$6,-1
+	move	$10,$6
+	sd	$19,32($sp)
+	sd	$18,24($sp)
+	move	$11,$4
+	sd	$17,16($sp)
+	move	$17,$5
+	sd	$16,8($sp)
+	dlsa	$9,$7,$4,2
+	addiu	$19,$5,1
+	dsll	$12,$5,2
+	move	$25,$5
+	move	$24,$0
+	move	$13,$0
+	move	$15,$0
+	move	$18,$14
+	.align	3
+.L11:
+	addiu	$7,$15,1
+	addiu	$16,$15,1
+	daddiu	$13,$13,1
+	move	$15,$7
+	bgec	$7,$10,.L15
+	daddiu	$8,$24,1
+	daddu	$6,$13,$25
+	dlsa	$8,$8,$11,2
+	dsll	$6,$6,2
+	move	$5,$14
+	.align	3
+.L14:
+	daddu	$2,$9,$6
+	daddu	$4,$11,$6
+	lwc1	$f2,-4($2)
+	move	$3,$0
+	move	$2,$8
+	.align	3
+.L13:
+	lwc1	$f1,0($4)
+	daddiu	$2,$2,4
+	lwc1	$f0,-4($2)
+	addiu	$3,$3,1
+	daddiu	$4,$4,4
+	maddf.s	$f0,$f2,$f1
+	swc1	$f0,-4($2)
+	bltc	$3,$5,.L13
+	addiu	$7,$7,1
+	bne	$10,$7,.L14
+	daddu	$6,$6,$12
+
+.L15:
+	addiu	$14,$14,-1
+	daddiu	$9,$9,-4
+	addu	$24,$24,$19
+	bne	$18,$16,.L11
+	addu	$25,$17,$25
+
+	ld	$19,32($sp)
+	ld	$18,24($sp)
+	ld	$17,16($sp)
+	ld	$16,8($sp)
+	jr	$31
+	daddiu	$sp,$sp,48
+
+.L23:
+	jrc	$31
+	.set	macro
+	.set	reorder
+	.end	dgefa
+	.size	dgefa, .-dgefa
+	.section	.text.startup,"ax",@progbits
+	.align	2
+	.align	3
+	.globl	main
+	.set	nomips16
+	.set	nomicromips
+	.ent	main
+	.type	main, @function
+main:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	jr	$31
+	move	$2,$0
+
+	.set	macro
+	.set	reorder
+	.end	main
+	.size	main, .-main
+	.ident	"GCC: (GNU) 14.0.1 20240214 (experimental)"
+	.section	.note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/after.txt b/gcc/testsuite/after.txt
new file mode 100644
index 00000000000..772f92d2b20
--- /dev/null
+++ b/gcc/testsuite/after.txt
@@ -0,0 +1,2792 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;;  header 3, latch 6
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_12(D) + 4294967295
+;;  upper_bound 2147483646
+;;  likely_upper_bound 2147483646
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;;  nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+  single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+  {
+    <bb 2> [local count: 118111600]:
+    if (n_12(D) > 0)
+      goto <bb 5>; [89.00%]
+    else
+      goto <bb 4>; [11.00%]
+
+  }
+  bb_5 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 5> [local count: 105119324]:
+
+  }
+  bb_7 (preds = {bb_3 }, succs = {bb_4 })
+  {
+    <bb 7> [local count: 105119324]:
+    # .MEM_22 = PHI <.MEM_16(3)>
+
+  }
+  bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+  {
+    <bb 4> [local count: 118111600]:
+    # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+    # VUSE <.MEM_29>
+    return;
+
+  }
+  loop_1 (header = 3, latch = 6, finite_p
+  niter (unsigned int) n_12(D) + 4294967295
+  upper_bound 2147483646
+  likely_upper_bound 2147483646
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+  {
+    bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+    {
+      <bb 3> [local count: 955630224]:
+      # i_20 = PHI <i_17(6), 0(5)>
+      # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+      _1 = (long unsigned int) i_20;
+      _2 = _1 * 4;
+      _3 = vector1_13(D) + _2;
+      # VUSE <.MEM_21>
+      _4 = *_3;
+      _5 = vector2_14(D) + _2;
+      # VUSE <.MEM_21>
+      _6 = *_5;
+      _7 = _6 * fp_const_15(D);
+      _8 = _4 + _7;
+      # .MEM_16 = VDEF <.MEM_21>
+      *_3 = _8;
+      i_17 = i_20 + 1;
+      if (n_12(D) > i_17)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 7>; [11.00%]
+
+    }
+    bb_6 (preds = {bb_3 }, succs = {bb_3 })
+    {
+      <bb 6> [local count: 850510900]:
+      goto <bb 3>; [100.00%]
+
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_12(D)
+  bounds on difference of bases: 0 ... 2147483646
+  result:
+    # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_1
+  Type:	long unsigned int
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_2
+  Type:	long unsigned int
+  Base:	0
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_3
+  Type:	float *
+  Base:	vector1_13(D)
+  Step:	4
+  Object:	(void *) vector1_13(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_5
+  Type:	float *
+  Base:	vector2_14(D)
+  Step:	4
+  Object:	(void *) vector2_14(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_17
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_20
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	_4 = *_3;
+    At pos:	*_3
+    IV struct:
+      Type:	float *
+      Base:	vector1_13(D)
+      Step:	4
+      Object:	(void *) vector1_13(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+  Use 0.1:
+    At stmt:	*_3 = _8;
+    At pos:	*_3
+    IV struct:
+      Type:	float *
+      Base:	vector1_13(D)
+      Step:	4
+      Object:	(void *) vector1_13(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:	_6 = *_5;
+    At pos:	*_5
+    IV struct:
+      Type:	float *
+      Base:	vector2_14(D)
+      Step:	4
+      Object:	(void *) vector2_14(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 2:
+  Type:	COMPARE
+  Use 2.0:
+    At stmt:	if (n_12(D) > i_17)
+    At pos:	i_17
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.6
+  Var after: ivtmp.6
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.7
+  Var after: ivtmp.7
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.8
+  Var after: ivtmp.8
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.9
+  Var after: ivtmp.9
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector1_13(D)
+    Step:	4
+    Object:	(void *) vector1_13(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.10
+  Var after: ivtmp.10
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector2_14(D)
+    Step:	4
+    Object:	(void *) vector2_14(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.11
+  Var after: ivtmp.11
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 7:
+  Var befor: ivtmp.12
+  Var after: ivtmp.12
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	4
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 7
+  Group 1:	0, 1, 2, 3, 5, 7
+  Group 2:	0, 1, 2, 3, 6
+
+<Candidate Costs>:
+  cand	cost
+force_expr_to_var_cost size costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+force_expr_to_var_cost speed costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+  0	5
+  1	5
+  2	5
+  3	4
+  4	5
+  5	5
+  6	5
+  7	5
+
+
+<Invariant Vars>:
+Inv 4:	n_12(D)	(eliminable)
+Inv 1:	vector1_13(D)	(eliminable)
+Inv 2:	vector2_14(D)	(eliminable)
+Inv 3:	fp_const_15(D)	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	(unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 2: 	(unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	18	2	NIL;	1
+  2	18	4	NIL;	1
+  4	2	0	NIL;	NIL;
+  7	10	2	NIL;	1
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	9	1	NIL;	2
+  2	9	2	NIL;	2
+  5	1	0	NIL;	NIL;
+  7	5	1	NIL;	2
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	4
+  1	0	0	NIL;	4
+  2	1	0	NIL;	4
+  3	0	0	NIL;	4
+  4	1	0	1;	NIL;
+  5	1	0	2;	NIL;
+  6	0	0	NIL;	4
+  7	1	0	NIL;	4
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	38
+  20	40
+  21	42
+  22	44
+  23	115
+  24	120
+  25	125
+  26	130
+  27	179
+  28	228
+  29	277
+  30	326
+  31	375
+  32	424
+  33	473
+  34	522
+  35	571
+  36	620
+  37	669
+  38	718
+  39	767
+  40	816
+  41	865
+  42	914
+  43	963
+  44	1012
+  45	1061
+  46	1110
+  47	1159
+  48	1208
+  49	1257
+  50	1306
+  51	1355
+  52	1404
+
+Initial set of candidates:
+  cost: 37 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 27 (complexity 3)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(18,2)
+   group:1 --> iv_cand:1, cost=(9,1)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 26 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 3)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,2)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 24 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 1)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 26 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 3)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,2)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 24 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 1)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,1)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: ivtmp.9_28
+  Var after: ivtmp.9_27
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector1_13(D)
+    Step:	4
+    Object:	(void *) vector1_13(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.10_25
+  Var after: ivtmp.10_24
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector2_14(D)
+    Step:	4
+    Object:	(void *) vector2_14(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;;  header 8, latch 13
+;;  depth 3, outer 2, finite_p
+;;  niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;;  nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+  single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          # VUSE <.MEM_57>
+          _35 = *_34;
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          # VUSE <.MEM_57>
+          _37 = *_36;
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          # .MEM_42 = VDEF <.MEM_57>
+          *_34 = _39;
+          i_40 = i_56 + 1;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 3
+  exit condition [1, + , 1](no_overflow) < _87
+  bounds on difference of bases: -2147483649 ... 2147483646
+  result:
+    zero if _87 <= 0
+    # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_21
+  Type:	sizetype
+  Base:	((sizetype) _7 + 1) * 4
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_29
+  Type:	sizetype
+  Base:	((sizetype) _11 + 1) * 4
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_32
+  Type:	long unsigned int
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_33
+  Type:	long unsigned int
+  Base:	0
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_34
+  Type:	float *
+  Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+  Step:	4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_36
+  Type:	float *
+  Base:	vector_27(D) + ((sizetype) _11 + 1) * 4
+  Step:	4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_40
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_56
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	_35 = *_34;
+    At pos:	*_34
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+  Use 0.1:
+    At stmt:	*_34 = _39;
+    At pos:	*_34
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:	_37 = *_36;
+    At pos:	*_36
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _11 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 2:
+  Type:	COMPARE
+  Use 2.0:
+    At stmt:	if (i_40 < _87)
+    At pos:	i_40
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.20
+  Var after: ivtmp.20
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.21
+  Var after: ivtmp.21
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.22
+  Var after: ivtmp.22
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.23
+  Var after: ivtmp.23
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.24
+  Var after: ivtmp.24
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.25
+  Var after: ivtmp.25
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Var befor: ivtmp.26
+  Var after: ivtmp.26
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.27
+  Var after: ivtmp.27
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 9:
+  Var befor: ivtmp.28
+  Var after: ivtmp.28
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	4
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 5, 9
+  Group 1:	0, 1, 2, 3, 6, 7, 9
+  Group 2:	0, 1, 2, 3, 8
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	5
+  3	4
+  4	6
+  5	6
+  6	6
+  7	6
+  8	5
+  9	5
+
+
+<Invariant Vars>:
+Inv 6:	_7	(eliminable)
+Inv 1:	_10	(eliminable)
+Inv 7:	_11	(eliminable)
+Inv 3:	_14	(eliminable)
+Inv 2:	vector_27(D)	(eliminable)
+Inv 4:	t_28	(eliminable)
+Inv 5:	_87	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	(unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 2: 	((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 3: 	(unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4
+inv_expr 4: 	(unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 5: 	((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 6: 	(unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	22	4	1;	NIL;
+  2	22	2	1;	NIL;
+  4	2	0	NIL;	NIL;
+  5	2	2	NIL;	NIL;
+  6	16	2	2;	NIL;
+  7	16	4	3;	NIL;
+  9	14	4	1;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	11	2	4;	NIL;
+  2	11	1	4;	NIL;
+  4	8	1	5;	NIL;
+  5	8	2	6;	NIL;
+  6	1	0	NIL;	NIL;
+  7	1	1	NIL;	NIL;
+  9	7	2	4;	NIL;
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	5
+  1	0	0	NIL;	5
+  2	4	0	NIL;	5
+  3	0	0	NIL;	5
+  8	4	0	NIL;	5
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	38
+  20	40
+  21	42
+  22	44
+  23	115
+  24	120
+  25	125
+  26	130
+  27	179
+  28	228
+  29	277
+  30	326
+  31	375
+  32	424
+  33	473
+  34	522
+  35	571
+  36	620
+  37	669
+  38	718
+  39	767
+  40	816
+  41	865
+  42	914
+  43	963
+  44	1012
+  45	1061
+  46	1110
+  47	1159
+  48	1208
+  49	1257
+  50	1306
+  51	1355
+  52	1404
+
+Initial set of candidates:
+  cost: 47 (complexity 3)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 37 (complexity 3)
+  candidates: 2
+   group:0 --> iv_cand:2, cost=(22,2)
+   group:1 --> iv_cand:2, cost=(11,1)
+   group:2 --> iv_cand:2, cost=(4,0)
+  invariant variables: 5
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 31 (complexity 1)
+  reg_cost: 6
+  cand_cost: 11
+  cand_group_cost: 14 (complexity 1)
+  candidates: 2, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:2, cost=(4,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 1)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 7
+  cand_cost: 16
+  cand_group_cost: 3 (complexity 0)
+  candidates: 3, 4, 6
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:6, cost=(1,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 
+
+Initial set of candidates:
+  cost: 37 (complexity 6)
+  reg_cost: 7
+  cand_cost: 9
+  cand_group_cost: 21 (complexity 6)
+  candidates: 3, 9
+   group:0 --> iv_cand:9, cost=(14,4)
+   group:1 --> iv_cand:9, cost=(7,2)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 26 (complexity 1)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 1)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,1)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 7
+  cand_cost: 16
+  cand_group_cost: 3 (complexity 0)
+  candidates: 3, 4, 6
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:6, cost=(1,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 3 IVs:
+Candidate 3:
+  Var befor: i_56
+  Var after: i_40
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.23_85
+  Var after: ivtmp.23_84
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.25_78
+  Var after: ivtmp.25_77
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+;;
+;; Loop 2
+;;  header 7, latch 12
+;;  depth 2, outer 1, finite_p
+;;  niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;;  nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+  single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+        _75 = (sizetype) _11;
+        _74 = _75 + 1;
+        _73 = _74 * 4;
+        _72 = vector_27(D) + _73;
+        ivtmp.25_76 = (unsigned long) _72;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _71 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_71];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _69 = (void *) ivtmp.25_78;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_69];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _70 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_70] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          ivtmp.25_77 = ivtmp.25_78 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 2
+  exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+  number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_1
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_2
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D) + l_26(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_3
+  Type:	long unsigned int
+  Base:	(long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+  Step:	(long unsigned int) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_4
+  Type:	long unsigned int
+  Base:	((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:	(long unsigned int) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_5
+  Type:	float *
+  Base:	vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:	(long unsigned int) m_25(D) * 4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_11
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D) + i_50
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_12
+  Type:	sizetype
+  Base:	(sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_13
+  Type:	sizetype
+  Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_14
+  Type:	sizetype
+  Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:	(sizetype) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	j_30
+  Type:	int
+  Base:	i_50 + 2
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	j_51
+  Type:	int
+  Base:	i_50 + 1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_72
+  Type:	float *
+  Base:	vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:	(sizetype) m_25(D) * 4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_73
+  Type:	sizetype
+  Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:	(sizetype) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_74
+  Type:	sizetype
+  Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_75
+  Type:	sizetype
+  Base:	(sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	ivtmp.25_76
+  Type:	unsigned long
+  Base:	(unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+  Step:	(sizetype) m_25(D) * 4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	t_28 = *_5;
+    At pos:	*_5
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+      Step:	(long unsigned int) m_25(D) * 4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	COMPARE
+  Use 1.0:
+    At stmt:	if (n_23(D) > j_30)
+    At pos:	j_30
+    IV struct:
+      Type:	int
+      Base:	i_50 + 2
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 2:
+  Type:	GENERIC
+  Use 2.0:
+    At stmt:	ivtmp.25_76 = (unsigned long) _72;
+    At pos:	
+    IV struct:
+      Type:	unsigned long
+      Base:	(unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+      Step:	(sizetype) m_25(D) * 4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 3:
+  Type:	GENERIC
+  Use 3.0:
+    At stmt:	_14 = _13 * 4;
+    At pos:	
+    IV struct:
+      Type:	sizetype
+      Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+      Step:	(sizetype) m_25(D) * 4
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.29
+  Var after: ivtmp.29
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.30
+  Var after: ivtmp.30
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.31
+  Var after: ivtmp.31
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(sizetype) (i_50 + 2)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Var befor: ivtmp.32
+  Var after: ivtmp.32
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(sizetype) (i_50 + 1)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	i_50 + 1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.33
+  Var after: ivtmp.33
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+    Step:	(unsigned long) ((long unsigned int) m_25(D) * 4)
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.34
+  Var after: ivtmp.34
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (i_50 + 2)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 7:
+  Var befor: ivtmp.35
+  Var after: ivtmp.35
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) i_50
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 8:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.36
+  Var after: ivtmp.36
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4)
+    Step:	(sizetype) m_25(D) * 4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37
+  Var after: ivtmp.37
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4) + (unsigned long) vector_27(D)
+    Step:	(sizetype) m_25(D) * 4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38
+  Var after: ivtmp.38
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 11:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.39
+  Var after: ivtmp.39
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 12:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.40
+  Var after: ivtmp.40
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	(long unsigned int) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+<Important Candidates>:	 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 5, 12
+  Group 1:	0, 1, 2, 3, 4, 6, 7
+  Group 2:	0, 1, 2, 3, 4, 8, 9, 10, 11, 12
+  Group 3:	0, 1, 2, 3, 4, 10, 11, 12
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	6
+  3	6
+  4	4
+  5	9
+  6	5
+  7	5
+  8	10
+  9	9
+  10	10
+  11	9
+  12	5
+
+
+<Invariant Vars>:
+Inv 6:	_7
+Inv 8:	_10
+Inv 7:	n_23(D)	(eliminable)
+Inv 1:	j_24	(eliminable)
+Inv 2:	m_25(D)	(eliminable)
+Inv 3:	l_26(D)	(eliminable)
+Inv 4:	vector_27(D)
+Inv 5:	i_50	(eliminable)
+Inv 9:	_87
+
+<Invariant Expressions>:
+inv_expr 1: 	(long unsigned int) m_25(D) * 4
+inv_expr 2: 	((unsigned long) l_26(D) - (unsigned long) i_50) * 4
+inv_expr 3: 	(unsigned long) i_50 * 18446744073709551612 + (unsigned long) l_26(D) * 4
+inv_expr 4: 	((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 5: 	((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 6: 	((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 7: 	(signed int) i_50 + 1
+inv_expr 8: 	(unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 9: 	((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 10: 	((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 11: 	(((signed long) i_50 - (signed long) l_26(D)) + 1) * 4
+inv_expr 12: 	(signed long) vector_27(D) + 4
+inv_expr 13: 	(((signed long) ((i_50 + 1) * m_25(D)) * 4 + (signed long) vector_27(D)) + (signed long) i_50 * 4) + 4
+inv_expr 14: 	(((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 15: 	4 - (signed long) vector_27(D)
+inv_expr 16: 	(((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	1	0	NIL;	NIL;
+  8	8	2	2;	NIL;
+  9	8	1	3;	NIL;
+  10	8	2	4;	NIL;
+  11	8	1	4;	NIL;
+  12	10	1	5;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	6;	NIL;
+  1	2	0	8;	NIL;
+  2	3	0	9;	NIL;
+  3	0	0	NIL;	7
+  4	0	0	NIL;	7
+  6	0	0	NIL;	7
+  7	0	0	NIL;	7
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	6	0	11;	NIL;
+  8	0	0	NIL;	NIL;
+  9	4	0	NIL;	NIL;
+  10	4	0	NIL;	NIL;
+  11	4	0	12;	NIL;
+  12	9	0	13;	NIL;
+
+Group 3:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	7	0	14;	NIL;
+  8	8	0	NIL;	NIL;
+  9	4	0	15;	NIL;
+  10	0	0	NIL;	NIL;
+  11	4	0	NIL;	NIL;
+  12	9	0	16;	NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	111
+  20	116
+  21	121
+  22	126
+  23	151
+  24	176
+  25	201
+  26	226
+  27	275
+  28	324
+  29	373
+  30	422
+  31	471
+  32	520
+  33	569
+  34	618
+  35	667
+  36	716
+  37	765
+  38	814
+  39	863
+  40	912
+  41	961
+  42	1010
+  43	1059
+  44	1108
+  45	1157
+  46	1206
+  47	1255
+  48	1304
+  49	1353
+  50	1402
+  51	1451
+  52	1500
+
+Initial set of candidates:
+  cost: 35 (complexity 0)
+  reg_cost: 8
+  cand_cost: 13
+  cand_group_cost: 14 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:5, cost=(1,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:5, cost=(6,0)
+   group:3 --> iv_cand:5, cost=(7,0)
+  invariant variables: 7
+  invariant expressions: 1, 11, 14
+
+Improved to:
+  cost: 33 (complexity 2)
+  reg_cost: 7
+  cand_cost: 14
+  cand_group_cost: 12 (complexity 2)
+  candidates: 4, 10
+   group:0 --> iv_cand:10, cost=(8,2)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:10, cost=(4,0)
+   group:3 --> iv_cand:10, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 4
+
+Initial set of candidates:
+  cost: 33 (complexity 2)
+  reg_cost: 7
+  cand_cost: 14
+  cand_group_cost: 12 (complexity 2)
+  candidates: 4, 10
+   group:0 --> iv_cand:10, cost=(8,2)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:10, cost=(4,0)
+   group:3 --> iv_cand:10, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 4
+
+Original cost 33 (complexity 2)
+
+Final cost 33 (complexity 2)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: j_51
+  Var after: j_30
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	i_50 + 1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38_68
+  Var after: ivtmp.38_67
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;;  header 4, latch 11
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_23(D) + 4294967294
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;;  nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+  single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+      _66 = (sizetype) m_25(D);
+      _65 = _66 * 4;
+      _63 = i_50 + 1;
+      _62 = m_25(D) * _63;
+      _61 = (sizetype) _62;
+      _60 = (sizetype) i_50;
+      _59 = _60 + _61;
+      _58 = _59 + 1;
+      ivtmp.38_64 = _58 * 4;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        # ivtmp.38_68 = PHI <ivtmp.38_67(12), ivtmp.38_64(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        _49 = (sizetype) i_50;
+        _48 = _49 * 18446744073709551612;
+        _47 = (sizetype) l_26(D);
+        _46 = _47 * 4;
+        _44 = _46 + _48;
+        _43 = vector_27(D) + _44;
+        _41 = _43 + 18446744073709551612;
+        _31 = _43 + ivtmp.38_68;
+        # VUSE <.MEM_52>
+        t_28 = MEM[(float *)_31 + -4B];
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = ivtmp.38_68;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+        _75 = (sizetype) _11;
+        _74 = _75 + 1;
+        _73 = _74 * 4;
+        _72 = vector_27(D) + _73;
+        _20 = (unsigned long) vector_27(D);
+        _19 = _20 + ivtmp.38_68;
+        ivtmp.25_76 = _19;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        ivtmp.38_67 = ivtmp.38_68 + _65;
+        if (j_30 != n_23(D))
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          # ivtmp.25_78 = PHI <ivtmp.25_77(13), ivtmp.25_76(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _71 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_71];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _69 = (void *) ivtmp.25_78;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_69];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _70 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_70] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          ivtmp.25_77 = ivtmp.25_78 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+  number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_6
+  Type:	int
+  Base:	0
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_7
+  Type:	int
+  Base:	0
+  Step:	(int) ((unsigned int) m_25(D) + 1)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	j_24
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_41
+  Type:	float *
+  Base:	vector_27(D) + ((sizetype) l_26(D) * 4 + 18446744073709551612)
+  Step:	18446744073709551612
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_43
+  Type:	float *
+  Base:	vector_27(D) + (sizetype) l_26(D) * 4
+  Step:	18446744073709551612
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_44
+  Type:	sizetype
+  Base:	(sizetype) l_26(D) * 4
+  Step:	18446744073709551612
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_48
+  Type:	sizetype
+  Base:	0
+  Step:	18446744073709551612
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_49
+  Type:	sizetype
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_50
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_60
+  Type:	sizetype
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_62
+  Type:	int
+  Base:	m_25(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_63
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_87
+  Type:	int
+  Base:	n_23(D) + -1
+  Step:	-1
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+
+<IV Groups>:
+Group 0:
+  Type:	COMPARE
+  Use 0.0:
+    At stmt:	if (n_23(D) > j_24)
+    At pos:	j_24
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 1:
+  Type:	COMPARE
+  Use 1.0:
+    At stmt:	if (j_24 < _45)
+    At pos:	j_24
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 2:
+  Type:	COMPARE
+  Use 2.0:
+    At stmt:	if (i_40 < _87)
+    At pos:	_87
+    IV struct:
+      Type:	int
+      Base:	n_23(D) + -1
+      Step:	-1
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 3:
+  Type:	GENERIC
+  Use 3.0:
+    At stmt:	j_24 = i_50 + 1;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 4:
+  Type:	GENERIC
+  Use 4.0:
+    At stmt:	_43 = vector_27(D) + _44;
+    At pos:	
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + (sizetype) l_26(D) * 4
+      Step:	18446744073709551612
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	No-overflow
+Group 5:
+  Type:	GENERIC
+  Use 5.0:
+    At stmt:	i_50 = PHI <j_24(11), 0(10)>
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	0
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 6:
+  Type:	GENERIC
+  Use 6.0:
+    At stmt:	_7 = _6 + i_50;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	0
+      Step:	(int) ((unsigned int) m_25(D) + 1)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 7:
+  Type:	GENERIC
+  Use 7.0:
+    At stmt:	_62 = m_25(D) * _63;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	m_25(D)
+      Step:	m_25(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 8:
+  Type:	GENERIC
+  Use 8.0:
+    At stmt:	_60 = (sizetype) i_50;
+    At pos:	
+    IV struct:
+      Type:	sizetype
+      Base:	0
+      Step:	1
+      Biv:	N
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.41
+  Var after: ivtmp.41
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.42
+  Var after: ivtmp.42
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.43
+  Var after: ivtmp.43
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.44
+  Var after: ivtmp.44
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (n_23(D) + -1)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.45
+  Var after: ivtmp.45
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) n_23(D)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.46
+  Var after: ivtmp.46
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:	18446744073709551612
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.47
+  Var after: ivtmp.47
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	(unsigned int) m_25(D) + 1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.48
+  Var after: ivtmp.48
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) m_25(D)
+    Step:	(unsigned int) m_25(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3
+  Group 1:	0, 1, 2, 3
+  Group 2:	0, 1, 2, 3, 4, 5
+  Group 3:	0, 1, 2, 3
+  Group 4:	0, 1, 2, 3, 6
+  Group 5:	0, 1, 2, 3
+  Group 6:	0, 1, 2, 3, 7
+  Group 7:	0, 1, 2, 3, 8
+  Group 8:	0, 1, 2, 3
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	5
+  3	4
+  4	5
+  5	5
+  6	6
+  7	5
+  8	5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1:	n_23(D)
+Inv 4:	m_25(D)
+Inv 5:	l_26(D)
+Inv 3:	vector_27(D)
+Inv 2:	_45	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	(unsigned int) m_25(D) + 1
+inv_expr 2: 	(signed int) n_23(D) + 1
+inv_expr 3: 	(signed int) n_23(D) + -1
+inv_expr 4: 	(signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	4	0	NIL;	NIL;
+  1	4	0	NIL;	NIL;
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  4	4	0	NIL;	NIL;
+  5	4	0	2;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	NIL;
+  1	0	0	NIL;	2
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  4	0	0	NIL;	NIL;
+  5	0	0	NIL;	NIL;
+  6	3	0	NIL;	NIL;
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	80	0	3;	NIL;
+  1	80	0	3;	NIL;
+  2	80	0	NIL;	NIL;
+  3	80	0	NIL;	NIL;
+  4	0	0	NIL;	NIL;
+  5	80	0	NIL;	NIL;
+
+Group 3:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	4	0	NIL;	NIL;
+  1	4	0	NIL;	NIL;
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  4	4	0	NIL;	NIL;
+  5	4	0	2;	NIL;
+
+Group 4:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	17	0	4;	NIL;
+  6	0	0	NIL;	NIL;
+
+Group 5:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	NIL;
+  1	0	0	NIL;	NIL;
+  2	4	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  4	4	0	3;	NIL;
+  5	4	0	NIL;	NIL;
+
+Group 6:
+  cand	cost	compl.	inv.expr.	inv.vars
+  7	0	0	NIL;	NIL;
+
+Group 7:
+  cand	cost	compl.	inv.expr.	inv.vars
+  8	0	0	NIL;	NIL;
+
+Group 8:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	0	0	NIL;	NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	111
+  20	116
+  21	121
+  22	126
+  23	151
+  24	176
+  25	201
+  26	226
+  27	275
+  28	324
+  29	373
+  30	422
+  31	471
+  32	520
+  33	569
+  34	618
+  35	667
+  36	716
+  37	765
+  38	814
+  39	863
+  40	912
+  41	961
+  42	1010
+  43	1059
+  44	1108
+  45	1157
+  46	1206
+  47	1255
+  48	1304
+  49	1353
+  50	1402
+  51	1451
+  52	1500
+
+Initial set of candidates:
+  cost: 126 (complexity 0)
+  reg_cost: 10
+  cand_cost: 19
+  cand_group_cost: 97 (complexity 0)
+  candidates: 1, 3, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:3, cost=(80,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 53 (complexity 0)
+  reg_cost: 12
+  cand_cost: 24
+  cand_group_cost: 17 (complexity 0)
+  candidates: 1, 3, 4, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 6, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 55 (complexity 0)
+  reg_cost: 10
+  cand_cost: 20
+  cand_group_cost: 25 (complexity 0)
+  candidates: 1, 4, 7, 8
+   group:0 --> iv_cand:4, cost=(4,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:4, cost=(4,0)
+   group:4 --> iv_cand:1, cost=(17,0)
+   group:5 --> iv_cand:1, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 45 (complexity 0)
+  reg_cost: 11
+  cand_cost: 26
+  cand_group_cost: 8 (complexity 0)
+  candidates: 1, 4, 6, 7, 8
+   group:0 --> iv_cand:4, cost=(4,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:4, cost=(4,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 6, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(0,0)
+   group:4 --> iv_cand:6, cost=(0,0)
+   group:5 --> iv_cand:3, cost=(0,0)
+   group:6 --> iv_cand:7, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+  Var befor: ivtmp.42_18
+  Var after: ivtmp.42_17
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Var befor: i_50
+  Var after: j_24
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.44_16
+  Var after: ivtmp.44_15
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (n_23(D) + -1)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.46_92
+  Var after: ivtmp.46_93
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:	18446744073709551612
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.47_98
+  Var after: ivtmp.47_99
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	(unsigned int) m_25(D) + 1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.48_102
+  Var after: ivtmp.48_103
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) m_25(D)
+    Step:	(unsigned int) m_25(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/before.s b/gcc/testsuite/before.s
new file mode 100644
index 00000000000..e13834bdf59
--- /dev/null
+++ b/gcc/testsuite/before.s
@@ -0,0 +1,152 @@
+	.file	1 "fp_foo.c"
+	.section .mdebug.abi64
+	.previous
+	.nan	2008
+	.module	fp=64
+	.module	oddspreg
+	.module	arch=mips64r6
+	.abicalls
+	.text
+	.align	2
+	.align	3
+	.globl	daxpy
+	.set	nomips16
+	.set	nomicromips
+	.ent	daxpy
+	.type	daxpy, @function
+daxpy:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	blezc	$6,.L7
+	dlsa	$6,$6,$4,2
+	.align	3
+.L3:
+	lwc1	$f1,0($5)
+	daddiu	$4,$4,4
+	lwc1	$f0,-4($4)
+	daddiu	$5,$5,4
+	maddf.s	$f0,$f1,$f15
+	bne	$4,$6,.L3
+	swc1	$f0,-4($4)
+
+.L7:
+	jrc	$31
+	.set	macro
+	.set	reorder
+	.end	daxpy
+	.size	daxpy, .-daxpy
+	.align	2
+	.align	3
+	.globl	dgefa
+	.set	nomips16
+	.set	nomicromips
+	.ent	dgefa
+	.type	dgefa, @function
+dgefa:
+	.frame	$sp,48,$31		# vars= 0, regs= 6/0, args= 0, gp= 0
+	.mask	0x101f0000,-8
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	li	$2,1			# 0x1
+	bgec	$2,$6,.L23
+	daddiu	$sp,$sp,-48
+	addiu	$14,$6,-1
+	move	$11,$6
+	sd	$20,32($sp)
+	sd	$19,24($sp)
+	addiu	$20,$5,1
+	sd	$18,16($sp)
+	move	$18,$4
+	sd	$17,8($sp)
+	dlsa	$10,$7,$4,2
+	sd	$16,0($sp)
+	move	$17,$5
+	dsll	$12,$5,2
+	move	$25,$5
+	move	$13,$0
+	move	$24,$0
+	move	$15,$0
+	move	$19,$14
+	.align	3
+.L11:
+	addiu	$8,$15,1
+	addiu	$16,$15,1
+	move	$15,$8
+	bgec	$8,$11,.L15
+	daddu	$5,$25,$24
+	daddiu	$9,$13,1
+	dsubu	$6,$0,$13
+	dsll	$5,$5,2
+	dlsa	$9,$9,$18,2
+	dsll	$6,$6,2
+	move	$7,$14
+	.align	3
+.L14:
+	daddu	$3,$10,$5
+	move	$2,$9
+	lwc1	$f2,0($3)
+	move	$4,$0
+	.align	3
+.L13:
+	daddu	$3,$6,$2
+	lwc1	$f0,0($2)
+	daddu	$3,$3,$5
+	daddiu	$2,$2,4
+	lwc1	$f1,0($3)
+	addiu	$4,$4,1
+	maddf.s	$f0,$f2,$f1
+	swc1	$f0,-4($2)
+	bltc	$4,$7,.L13
+	addiu	$8,$8,1
+	bne	$11,$8,.L14
+	daddu	$5,$5,$12
+
+.L15:
+	daddiu	$24,$24,1
+	addu	$13,$20,$13
+	addiu	$14,$14,-1
+	daddiu	$10,$10,-4
+	bne	$19,$16,.L11
+	addu	$25,$17,$25
+
+	ld	$20,32($sp)
+	ld	$19,24($sp)
+	ld	$18,16($sp)
+	ld	$17,8($sp)
+	ld	$16,0($sp)
+	jr	$31
+	daddiu	$sp,$sp,48
+
+.L23:
+	jrc	$31
+	.set	macro
+	.set	reorder
+	.end	dgefa
+	.size	dgefa, .-dgefa
+	.section	.text.startup,"ax",@progbits
+	.align	2
+	.align	3
+	.globl	main
+	.set	nomips16
+	.set	nomicromips
+	.ent	main
+	.type	main, @function
+main:
+	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
+	.mask	0x00000000,0
+	.fmask	0x00000000,0
+	.set	noreorder
+	.set	nomacro
+	jr	$31
+	move	$2,$0
+
+	.set	macro
+	.set	reorder
+	.end	main
+	.size	main, .-main
+	.ident	"GCC: (GNU) 14.0.1 20240214 (experimental)"
+	.section	.note.GNU-stack,"",@progbits
diff --git a/gcc/testsuite/before.txt b/gcc/testsuite/before.txt
new file mode 100644
index 00000000000..c87764b8ae9
--- /dev/null
+++ b/gcc/testsuite/before.txt
@@ -0,0 +1,2694 @@
+tree_ssa_iv_optimize
+;;
+;; Loop 1
+;;  header 3, latch 6
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_12(D) + 4294967295
+;;  upper_bound 2147483646
+;;  likely_upper_bound 2147483646
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900)
+;;  nodes: 3 6
+Processing loop 1 at fp_foo.c:3
+  single exit 3 -> 7, exit condition if (n_12(D) > i_17)
+
+
+
+Loops in function: daxpy
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_5 bb_4 })
+  {
+    <bb 2> [local count: 118111600]:
+    if (n_12(D) > 0)
+      goto <bb 5>; [89.00%]
+    else
+      goto <bb 4>; [11.00%]
+
+  }
+  bb_5 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 5> [local count: 105119324]:
+
+  }
+  bb_7 (preds = {bb_3 }, succs = {bb_4 })
+  {
+    <bb 7> [local count: 105119324]:
+    # .MEM_22 = PHI <.MEM_16(3)>
+
+  }
+  bb_4 (preds = {bb_2 bb_7 }, succs = {bb_1 })
+  {
+    <bb 4> [local count: 118111600]:
+    # .MEM_29 = PHI <.MEM_11(D)(2), .MEM_22(7)>
+    # VUSE <.MEM_29>
+    return;
+
+  }
+  loop_1 (header = 3, latch = 6, finite_p
+  niter (unsigned int) n_12(D) + 4294967295
+  upper_bound 2147483646
+  likely_upper_bound 2147483646
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:105119324 (estimated locally, freq 0.8900))
+  {
+    bb_3 (preds = {bb_6 bb_5 }, succs = {bb_6 bb_7 })
+    {
+      <bb 3> [local count: 955630224]:
+      # i_20 = PHI <i_17(6), 0(5)>
+      # .MEM_21 = PHI <.MEM_16(6), .MEM_11(D)(5)>
+      _1 = (long unsigned int) i_20;
+      _2 = _1 * 4;
+      _3 = vector1_13(D) + _2;
+      # VUSE <.MEM_21>
+      _4 = *_3;
+      _5 = vector2_14(D) + _2;
+      # VUSE <.MEM_21>
+      _6 = *_5;
+      _7 = _6 * fp_const_15(D);
+      _8 = _4 + _7;
+      # .MEM_16 = VDEF <.MEM_21>
+      *_3 = _8;
+      i_17 = i_20 + 1;
+      if (n_12(D) > i_17)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 7>; [11.00%]
+
+    }
+    bb_6 (preds = {bb_3 }, succs = {bb_3 })
+    {
+      <bb 6> [local count: 850510900]:
+      goto <bb 3>; [100.00%]
+
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_12(D)
+  bounds on difference of bases: 0 ... 2147483646
+  result:
+    # of iterations (unsigned int) n_12(D) + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) n_12(D) + 4294967295
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_1
+  Type:	long unsigned int
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_2
+  Type:	long unsigned int
+  Base:	0
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_3
+  Type:	float *
+  Base:	vector1_13(D)
+  Step:	4
+  Object:	(void *) vector1_13(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_5
+  Type:	float *
+  Base:	vector2_14(D)
+  Step:	4
+  Object:	(void *) vector2_14(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_17
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_20
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	_4 = *_3;
+    At pos:	*_3
+    IV struct:
+      Type:	float *
+      Base:	vector1_13(D)
+      Step:	4
+      Object:	(void *) vector1_13(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+  Use 0.1:
+    At stmt:	*_3 = _8;
+    At pos:	*_3
+    IV struct:
+      Type:	float *
+      Base:	vector1_13(D)
+      Step:	4
+      Object:	(void *) vector1_13(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:	_6 = *_5;
+    At pos:	*_5
+    IV struct:
+      Type:	float *
+      Base:	vector2_14(D)
+      Step:	4
+      Object:	(void *) vector2_14(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 2:
+  Type:	COMPARE
+  Use 2.0:
+    At stmt:	if (n_12(D) > i_17)
+    At pos:	i_17
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.6
+  Var after: ivtmp.6
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.7
+  Var after: ivtmp.7
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.8
+  Var after: ivtmp.8
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.9
+  Var after: ivtmp.9
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector1_13(D)
+    Step:	4
+    Object:	(void *) vector1_13(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.10
+  Var after: ivtmp.10
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector2_14(D)
+    Step:	4
+    Object:	(void *) vector2_14(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.11
+  Var after: ivtmp.11
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 7:
+  Var befor: ivtmp.12
+  Var after: ivtmp.12
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	4
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 7
+  Group 1:	0, 1, 2, 3, 5, 7
+  Group 2:	0, 1, 2, 3, 6
+
+<Candidate Costs>:
+  cand	cost
+force_expr_to_var_cost size costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+force_expr_to_var_cost speed costs:
+  integer 0
+  symbol 5
+  address 5
+  other 24
+
+  0	5
+  1	5
+  2	5
+  3	4
+  4	5
+  5	5
+  6	5
+  7	5
+
+
+<Invariant Vars>:
+Inv 4:	n_12(D)	(eliminable)
+Inv 1:	vector1_13(D)	(eliminable)
+Inv 2:	vector2_14(D)	(eliminable)
+Inv 3:	fp_const_15(D)	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	(unsigned long) vector1_13(D) + 18446744073709551612
+inv_expr 2: 	(unsigned long) vector2_14(D) + 18446744073709551612
+inv_expr 3: 	(unsigned long) n_12(D) * 4 + (unsigned long) vector1_13(D)
+inv_expr 4: 	(unsigned long) n_12(D) * 4 + (unsigned long) vector2_14(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	18	0	NIL;	1
+  2	20	0	1;	NIL;
+  4	2	0	NIL;	NIL;
+  7	10	0	NIL;	1
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	9	0	NIL;	2
+  2	10	0	2;	NIL;
+  5	1	0	NIL;	NIL;
+  7	5	0	NIL;	2
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	4
+  1	0	0	NIL;	4
+  2	1	0	NIL;	4
+  3	0	0	NIL;	4
+  4	1	0	3;	NIL;
+  5	1	0	4;	NIL;
+  6	0	0	NIL;	4
+  7	1	0	NIL;	4
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	38
+  20	40
+  21	42
+  22	44
+  23	115
+  24	120
+  25	125
+  26	130
+  27	179
+  28	228
+  29	277
+  30	326
+  31	375
+  32	424
+  33	473
+  34	522
+  35	571
+  36	620
+  37	669
+  38	718
+  39	767
+  40	816
+  41	865
+  42	914
+  43	963
+  44	1012
+  45	1061
+  46	1110
+  47	1159
+  48	1208
+  49	1257
+  50	1306
+  51	1355
+  52	1404
+
+Initial set of candidates:
+  cost: 37 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 27 (complexity 0)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(18,0)
+   group:1 --> iv_cand:1, cost=(9,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 0)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 24 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 0)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables: 
+  invariant expressions: 3
+
+Initial set of candidates:
+  cost: 26 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 16 (complexity 0)
+  candidates: 7
+   group:0 --> iv_cand:7, cost=(10,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 1, 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 24 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 8 (complexity 0)
+  candidates: 4, 7
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:7, cost=(5,0)
+   group:2 --> iv_cand:7, cost=(1,0)
+  invariant variables: 2, 4
+  invariant expressions: 
+
+Improved to:
+  cost: 19 (complexity 0)
+  reg_cost: 5
+  cand_cost: 10
+  cand_group_cost: 4 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:5, cost=(1,0)
+   group:2 --> iv_cand:4, cost=(1,0)
+  invariant variables: 
+  invariant expressions: 3
+
+Original cost 19 (complexity 0)
+
+Final cost 19 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: ivtmp.9_28
+  Var after: ivtmp.9_27
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector1_13(D)
+    Step:	4
+    Object:	(void *) vector1_13(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.10_25
+  Var after: ivtmp.10_24
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) vector2_14(D)
+    Step:	4
+    Object:	(void *) vector2_14(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (n_12(D) > i_17)
+tree_ssa_iv_optimize
+;;
+;; Loop 3
+;;  header 8, latch 13
+;;  depth 3, outer 2, finite_p
+;;  niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628)
+;;  nodes: 8 13
+Processing loop 3 at fp_foo.c:3
+  single exit 8 -> 9, exit condition if (i_40 < _87)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          # VUSE <.MEM_57>
+          _35 = *_34;
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          # VUSE <.MEM_57>
+          _37 = *_36;
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          # .MEM_42 = VDEF <.MEM_57>
+          *_34 = _39;
+          i_40 = i_56 + 1;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 3
+  exit condition [1, + , 1](no_overflow) < _87
+  bounds on difference of bases: -2147483649 ... 2147483646
+  result:
+    zero if _87 <= 0
+    # of iterations (unsigned int) _87 + 4294967295, bounded by 2147483646
+  number of iterations (unsigned int) _87 + 4294967295; zero if _87 <= 0
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_21
+  Type:	sizetype
+  Base:	((sizetype) _7 + 1) * 4
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_29
+  Type:	sizetype
+  Base:	((sizetype) _11 + 1) * 4
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_32
+  Type:	long unsigned int
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_33
+  Type:	long unsigned int
+  Base:	0
+  Step:	4
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_34
+  Type:	float *
+  Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+  Step:	4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_36
+  Type:	float *
+  Base:	vector_27(D) + ((sizetype) _11 + 1) * 4
+  Step:	4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_40
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	i_56
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	_35 = *_34;
+    At pos:	*_34
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+  Use 0.1:
+    At stmt:	*_34 = _39;
+    At pos:	*_34
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _7 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	REFERENCE ADDRESS
+  Use 1.0:
+    At stmt:	_37 = *_36;
+    At pos:	*_36
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((sizetype) _11 + 1) * 4
+      Step:	4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 2:
+  Type:	COMPARE
+  Use 2.0:
+    At stmt:	if (i_40 < _87)
+    At pos:	i_40
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.20
+  Var after: ivtmp.20
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.21
+  Var after: ivtmp.21
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.22
+  Var after: ivtmp.22
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.23
+  Var after: ivtmp.23
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.24
+  Var after: ivtmp.24
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) ((sizetype) _7 * 4) + (unsigned long) vector_27(D)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.25
+  Var after: ivtmp.25
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _11 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Var befor: ivtmp.26
+  Var after: ivtmp.26
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) ((sizetype) _11 * 4) + (unsigned long) vector_27(D)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.27
+  Var after: ivtmp.27
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 9:
+  Var befor: ivtmp.28
+  Var after: ivtmp.28
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	4
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 5, 9
+  Group 1:	0, 1, 2, 3, 6, 7, 9
+  Group 2:	0, 1, 2, 3, 8
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	5
+  3	4
+  4	6
+  5	6
+  6	6
+  7	6
+  8	5
+  9	5
+
+
+<Invariant Vars>:
+Inv 6:	_7	(eliminable)
+Inv 1:	_10	(eliminable)
+Inv 7:	_11	(eliminable)
+Inv 3:	_14	(eliminable)
+Inv 2:	vector_27(D)	(eliminable)
+Inv 4:	t_28	(eliminable)
+Inv 5:	_87	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	((unsigned long) _7 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 2: 	(unsigned long) _7 * 4 + (unsigned long) vector_27(D)
+inv_expr 3: 	((unsigned long) _7 - (unsigned long) _11) * 4
+inv_expr 4: 	((unsigned long) _11 * 18446744073709551612 + (unsigned long) _7 * 4) + 4
+inv_expr 5: 	((unsigned long) _11 * 4 + (unsigned long) vector_27(D)) + 4
+inv_expr 6: 	(unsigned long) _11 * 4 + (unsigned long) vector_27(D)
+inv_expr 7: 	((unsigned long) _11 - (unsigned long) _7) * 4
+inv_expr 8: 	((unsigned long) _7 * 18446744073709551612 + (unsigned long) _11 * 4) + 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	22	0	1;	NIL;
+  2	22	0	2;	NIL;
+  4	2	0	NIL;	NIL;
+  5	2	2	NIL;	NIL;
+  6	16	0	3;	NIL;
+  7	18	0	4;	NIL;
+  9	14	0	1;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	11	0	5;	NIL;
+  2	11	0	6;	NIL;
+  4	8	0	7;	NIL;
+  5	9	0	8;	NIL;
+  6	1	0	NIL;	NIL;
+  7	1	1	NIL;	NIL;
+  9	7	0	5;	NIL;
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	5
+  1	0	0	NIL;	5
+  2	4	0	NIL;	5
+  3	0	0	NIL;	5
+  8	4	0	NIL;	5
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 0
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	38
+  20	40
+  21	42
+  22	44
+  23	115
+  24	120
+  25	125
+  26	130
+  27	179
+  28	228
+  29	277
+  30	326
+  31	375
+  32	424
+  33	473
+  34	522
+  35	571
+  36	620
+  37	669
+  38	718
+  39	767
+  40	816
+  41	865
+  42	914
+  43	963
+  44	1012
+  45	1061
+  46	1110
+  47	1159
+  48	1208
+  49	1257
+  50	1306
+  51	1355
+  52	1404
+
+Initial set of candidates:
+  cost: 43 (complexity 0)
+  reg_cost: 5
+  cand_cost: 5
+  cand_group_cost: 33 (complexity 0)
+  candidates: 1
+   group:0 --> iv_cand:1, cost=(22,0)
+   group:1 --> iv_cand:1, cost=(11,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 5
+
+Improved to:
+  cost: 27 (complexity 0)
+  reg_cost: 6
+  cand_cost: 11
+  cand_group_cost: 10 (complexity 0)
+  candidates: 1, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:1, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 0)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Initial set of candidates:
+  cost: 37 (complexity 0)
+  reg_cost: 7
+  cand_cost: 9
+  cand_group_cost: 21 (complexity 0)
+  candidates: 3, 9
+   group:0 --> iv_cand:9, cost=(14,0)
+   group:1 --> iv_cand:9, cost=(7,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 1, 5
+
+Improved to:
+  cost: 26 (complexity 0)
+  reg_cost: 6
+  cand_cost: 10
+  cand_group_cost: 10 (complexity 0)
+  candidates: 3, 4
+   group:0 --> iv_cand:4, cost=(2,0)
+   group:1 --> iv_cand:4, cost=(8,0)
+   group:2 --> iv_cand:3, cost=(0,0)
+  invariant variables: 5
+  invariant expressions: 7
+
+Original cost 26 (complexity 0)
+
+Final cost 26 (complexity 0)
+
+Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
+Candidate 3:
+  Var befor: i_56
+  Var after: i_40
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Var befor: ivtmp.23_85
+  Var after: ivtmp.23_84
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((sizetype) _7 + 1) * 4)
+    Step:	4
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+  allowed multipliers:
+
+;;
+;; Loop 2
+;;  header 7, latch 12
+;;  depth 2, outer 1, finite_p
+;;  niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009)
+;;  nodes: 7 12 9 8 13
+Processing loop 2 at fp_foo.c:9
+  single exit 9 -> 17, exit condition if (n_23(D) > j_30)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        # VUSE <.MEM_52>
+        t_28 = *_5;
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = _13 * 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        if (n_23(D) > j_30)
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _78 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_78];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _76 = (sizetype) _7;
+          _75 = _76 * 18446744073709551612;
+          _74 = _75 + ivtmp.23_85;
+          _73 = (void *) _74;
+          _72 = (sizetype) _11;
+          _71 = _72 * 4;
+          _70 = _73 + _71;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_70];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _77 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_77] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 2
+  exit condition [i_50 + 2, + , 1](no_overflow) < n_23(D)
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2, bounded by 2147483645
+  number of iterations ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_1
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_2
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D) + l_26(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_3
+  Type:	long unsigned int
+  Base:	(long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)
+  Step:	(long unsigned int) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_4
+  Type:	long unsigned int
+  Base:	((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:	(long unsigned int) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_5
+  Type:	float *
+  Base:	vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+  Step:	(long unsigned int) m_25(D) * 4
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_11
+  Type:	int
+  Base:	(i_50 + 1) * m_25(D) + i_50
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_12
+  Type:	sizetype
+  Base:	(sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_13
+  Type:	sizetype
+  Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_14
+  Type:	sizetype
+  Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+  Step:	(sizetype) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	j_30
+  Type:	int
+  Base:	i_50 + 2
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	j_51
+  Type:	int
+  Base:	i_50 + 1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_71
+  Type:	sizetype
+  Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+  Step:	(sizetype) m_25(D) * 4
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_72
+  Type:	sizetype
+  Base:	(sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50
+  Step:	(sizetype) m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+
+<IV Groups>:
+Group 0:
+  Type:	REFERENCE ADDRESS
+  Use 0.0:
+    At stmt:	t_28 = *_5;
+    At pos:	*_5
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4
+      Step:	(long unsigned int) m_25(D) * 4
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 1:
+  Type:	COMPARE
+  Use 1.0:
+    At stmt:	if (n_23(D) > j_30)
+    At pos:	j_30
+    IV struct:
+      Type:	int
+      Base:	i_50 + 2
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 2:
+  Type:	GENERIC
+  Use 2.0:
+    At stmt:	_14 = _13 * 4;
+    At pos:	
+    IV struct:
+      Type:	sizetype
+      Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+      Step:	(sizetype) m_25(D) * 4
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 3:
+  Type:	GENERIC
+  Use 3.0:
+    At stmt:	_71 = _72 * 4;
+    At pos:	
+    IV struct:
+      Type:	sizetype
+      Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+      Step:	(sizetype) m_25(D) * 4
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.29
+  Var after: ivtmp.29
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.30
+  Var after: ivtmp.30
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.31
+  Var after: ivtmp.31
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(sizetype) (i_50 + 2)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Var befor: ivtmp.32
+  Var after: ivtmp.32
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(sizetype) (i_50 + 1)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	i_50 + 1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.33
+  Var after: ivtmp.33
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + ((long unsigned int) ((i_50 + 1) * m_25(D)) + (long unsigned int) l_26(D)) * 4)
+    Step:	(unsigned long) ((long unsigned int) m_25(D) * 4)
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.34
+  Var after: ivtmp.34
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (i_50 + 2)
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 7:
+  Var befor: ivtmp.35
+  Var after: ivtmp.35
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) i_50
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 8:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.36
+  Var after: ivtmp.36
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	(((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) + 1) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37
+  Var after: ivtmp.37
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 10:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.38
+  Var after: ivtmp.38
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	0
+    Step:	(long unsigned int) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+<Important Candidates>:	 0, 1, 2, 3, 4,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3, 4, 5, 10
+  Group 1:	0, 1, 2, 3, 4, 6, 7
+  Group 2:	0, 1, 2, 3, 4, 8, 9, 10
+  Group 3:	0, 1, 2, 3, 4, 9, 10
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	6
+  3	6
+  4	4
+  5	9
+  6	5
+  7	5
+  8	10
+  9	9
+  10	5
+
+Scaling cost based on bb prob by 8.00: 6 (scratch: 2) -> 34
+Scaling cost based on bb prob by 8.00: 4 (scratch: 0) -> 32
+Scaling cost based on bb prob by 8.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 8.00: 8 (scratch: 4) -> 36
+
+<Invariant Vars>:
+Inv 6:	_7
+Inv 8:	_10
+Inv 7:	n_23(D)	(eliminable)
+Inv 1:	j_24	(eliminable)
+Inv 2:	m_25(D)	(eliminable)
+Inv 3:	l_26(D)	(eliminable)
+Inv 4:	vector_27(D)
+Inv 5:	i_50	(eliminable)
+Inv 9:	_87
+
+<Invariant Expressions>:
+inv_expr 1: 	(long unsigned int) m_25(D) * 4
+inv_expr 2: 	(((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4) + 18446744073709551612
+inv_expr 3: 	((unsigned long) l_26(D) * 4 + (unsigned long) vector_27(D)) - (unsigned long) i_50 * 4
+inv_expr 4: 	((unsigned long) ((i_50 + 1) * m_25(D)) + (unsigned long) l_26(D)) * 4 + (unsigned long) vector_27(D)
+inv_expr 5: 	((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967295
+inv_expr 6: 	(signed int) i_50 + 1
+inv_expr 7: 	(unsigned long) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294) + 1
+inv_expr 8: 	((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 3
+inv_expr 9: 	((sizetype) i_50 + (sizetype) (((unsigned int) n_23(D) - (unsigned int) i_50) + 4294967294)) + 2
+inv_expr 10: 	(((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4) + 4
+inv_expr 11: 	(((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) + 1) * 4
+inv_expr 12: 	((signed long) i_50 * 4 - (signed long) vector_27(D)) - (signed long) l_26(D) * 4
+inv_expr 13: 	((signed long) ((i_50 + 1) * m_25(D)) + (signed long) i_50) * 4
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	1	0	NIL;	NIL;
+  8	9	0	2;	NIL;
+  9	8	0	3;	NIL;
+  10	10	0	4;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	5;	NIL;
+  1	2	0	7;	NIL;
+  2	3	0	8;	NIL;
+  3	0	0	NIL;	7
+  4	0	0	NIL;	7
+  6	0	0	NIL;	7
+  7	0	0	NIL;	7
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	7	0	10;	NIL;
+  8	0	0	NIL;	NIL;
+  9	4	0	NIL;	NIL;
+  10	9	0	11;	NIL;
+
+Group 3:
+  cand	cost	compl.	inv.expr.	inv.vars
+  5	34	0	12;	NIL;
+  8	32	0	NIL;	NIL;
+  9	0	0	NIL;	NIL;
+  10	36	0	13;	NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	111
+  20	116
+  21	121
+  22	126
+  23	151
+  24	176
+  25	201
+  26	226
+  27	275
+  28	324
+  29	373
+  30	422
+  31	471
+  32	520
+  33	569
+  34	618
+  35	667
+  36	716
+  37	765
+  38	814
+  39	863
+  40	912
+  41	961
+  42	1010
+  43	1059
+  44	1108
+  45	1157
+  46	1206
+  47	1255
+  48	1304
+  49	1353
+  50	1402
+  51	1451
+  52	1500
+
+Initial set of candidates:
+  cost: 63 (complexity 0)
+  reg_cost: 8
+  cand_cost: 13
+  cand_group_cost: 42 (complexity 0)
+  candidates: 4, 5
+   group:0 --> iv_cand:5, cost=(1,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:5, cost=(7,0)
+   group:3 --> iv_cand:5, cost=(34,0)
+  invariant variables: 7
+  invariant expressions: 1, 10, 12
+
+Improved to:
+  cost: 32 (complexity 0)
+  reg_cost: 7
+  cand_cost: 13
+  cand_group_cost: 12 (complexity 0)
+  candidates: 4, 9
+   group:0 --> iv_cand:9, cost=(8,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:9, cost=(4,0)
+   group:3 --> iv_cand:9, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 3
+
+Initial set of candidates:
+  cost: 32 (complexity 0)
+  reg_cost: 7
+  cand_cost: 13
+  cand_group_cost: 12 (complexity 0)
+  candidates: 4, 9
+   group:0 --> iv_cand:9, cost=(8,0)
+   group:1 --> iv_cand:4, cost=(0,0)
+   group:2 --> iv_cand:9, cost=(4,0)
+   group:3 --> iv_cand:9, cost=(0,0)
+  invariant variables: 7
+  invariant expressions: 1, 3
+
+Original cost 32 (complexity 0)
+
+Final cost 32 (complexity 0)
+
+Selected IV set for loop 2 at fp_foo.c:9, 10 avg niters, 2 IVs:
+Candidate 4:
+  Var befor: j_51
+  Var after: j_30
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	i_50 + 1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 9:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.37_69
+  Var after: ivtmp.37_68
+  Incr POS: before exit test
+  IV struct:
+    Type:	sizetype
+    Base:	((sizetype) ((i_50 + 1) * m_25(D)) + (sizetype) i_50) * 4
+    Step:	(sizetype) m_25(D) * 4
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (n_23(D) > j_30)
+;;
+;; Loop 1
+;;  header 4, latch 11
+;;  depth 1, outer 0, finite_p
+;;  niter (unsigned int) n_23(D) + 4294967294
+;;  upper_bound 2147483645
+;;  likely_upper_bound 2147483645
+;;  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900)
+;;  nodes: 4 11 5 15 17 9 8 13 7 12 6
+Processing loop 1 at fp_foo.c:8
+  single exit 5 -> 16, exit condition if (j_24 < _45)
+
+
+
+Loops in function: dgefa
+loop_0 (header = 0, latch = 1)
+{
+  bb_2 (preds = {bb_0 }, succs = {bb_10 bb_14 })
+  {
+    <bb 2> [local count: 1804255]:
+    _45 = n_23(D) + -1;
+    if (n_23(D) > 1)
+      goto <bb 10>; [89.00%]
+    else
+      goto <bb 14>; [11.00%]
+
+  }
+  bb_14 (preds = {bb_2 }, succs = {bb_3 })
+  {
+    <bb 14> [local count: 198468]:
+
+  }
+  bb_3 (preds = {bb_14 bb_16 }, succs = {bb_1 })
+  {
+    <bb 3> [local count: 1804255]:
+    # .MEM_88 = PHI <.MEM_22(D)(14), .MEM_53(16)>
+    # VUSE <.MEM_88>
+    return;
+
+  }
+  bb_10 (preds = {bb_2 }, succs = {bb_4 })
+  {
+    <bb 10> [local count: 1605787]:
+
+  }
+  bb_16 (preds = {bb_5 }, succs = {bb_3 })
+  {
+    <bb 16> [local count: 1605787]:
+    # .MEM_53 = PHI <.MEM_89(5)>
+    goto <bb 3>; [100.00%]
+
+  }
+  loop_1 (header = 4, latch = 11, finite_p
+  niter (unsigned int) n_23(D) + 4294967294
+  upper_bound 2147483645
+  likely_upper_bound 2147483645
+  iterations by profile: 8.090909 (unreliable, maybe flat) entry count:1605787 (estimated locally, freq 0.8900))
+  {
+    bb_4 (preds = {bb_11 bb_10 }, succs = {bb_6 bb_15 })
+    {
+      <bb 4> [local count: 14598063]:
+      # i_50 = PHI <j_24(11), 0(10)>
+      # .MEM_54 = PHI <.MEM_89(11), .MEM_22(D)(10)>
+      j_24 = i_50 + 1;
+      if (n_23(D) > j_24)
+        goto <bb 6>; [89.00%]
+      else
+        goto <bb 15>; [11.00%]
+
+    }
+    bb_15 (preds = {bb_4 }, succs = {bb_5 })
+    {
+      <bb 15> [local count: 1605787]:
+
+    }
+    bb_5 (preds = {bb_15 bb_17 }, succs = {bb_11 bb_16 })
+    {
+      <bb 5> [local count: 14598063]:
+      # .MEM_89 = PHI <.MEM_54(15), .MEM_86(17)>
+      if (j_24 < _45)
+        goto <bb 11>; [89.00%]
+      else
+        goto <bb 16>; [11.00%]
+
+    }
+    bb_11 (preds = {bb_5 }, succs = {bb_4 })
+    {
+      <bb 11> [local count: 12992276]:
+      goto <bb 4>; [100.00%]
+
+    }
+    bb_6 (preds = {bb_4 }, succs = {bb_7 })
+    {
+      <bb 6> [local count: 12992276]:
+      _6 = m_25(D) * i_50;
+      _7 = _6 + i_50;
+      _8 = (sizetype) _7;
+      _9 = _8 + 1;
+      _10 = _9 * 4;
+      _87 = n_23(D) - j_24;
+      _67 = (sizetype) m_25(D);
+      _66 = _67 * 4;
+      _64 = i_50 + 1;
+      _63 = m_25(D) * _64;
+      _62 = (sizetype) _63;
+      _61 = (sizetype) i_50;
+      _60 = _61 + _62;
+      ivtmp.37_65 = _60 * 4;
+
+    }
+    bb_17 (preds = {bb_9 }, succs = {bb_5 })
+    {
+      <bb 17> [local count: 12992276]:
+      # .MEM_86 = PHI <.MEM_55(9)>
+      goto <bb 5>; [100.00%]
+
+    }
+    loop_2 (header = 7, latch = 12, finite_p
+    niter ((unsigned int) n_23(D) - (unsigned int) i_50) - 2
+    upper_bound 2147483645
+    likely_upper_bound 2147483645
+    iterations by profile: 8.090909 (unreliable, maybe flat) entry count:12992276 (estimated locally, freq 7.2009))
+    {
+      bb_7 (preds = {bb_12 bb_6 }, succs = {bb_8 })
+      {
+        <bb 7> [local count: 118111600]:
+        # j_51 = PHI <j_30(12), j_24(6)>
+        # .MEM_52 = PHI <.MEM_55(12), .MEM_54(6)>
+        # ivtmp.37_69 = PHI <ivtmp.37_68(12), ivtmp.37_65(6)>
+        _1 = m_25(D) * j_51;
+        _2 = _1 + l_26(D);
+        _3 = (long unsigned int) _2;
+        _4 = _3 * 4;
+        _5 = vector_27(D) + _4;
+        _59 = (sizetype) i_50;
+        _58 = _59 * 18446744073709551612;
+        _49 = (sizetype) l_26(D);
+        _48 = _49 * 4;
+        _47 = _48 + _58;
+        _46 = vector_27(D) + _47;
+        _44 = _46 + ivtmp.37_69;
+        # VUSE <.MEM_52>
+        t_28 = MEM[(float *)_44];
+        _11 = _1 + i_50;
+        _12 = (sizetype) _11;
+        _13 = _12 + 1;
+        _14 = ivtmp.37_69 + 4;
+        _82 = (sizetype) _7;
+        _81 = _82 + 1;
+        _80 = _81 * 4;
+        _79 = vector_27(D) + _80;
+        ivtmp.23_83 = (unsigned long) _79;
+
+      }
+      bb_9 (preds = {bb_8 }, succs = {bb_12 bb_17 })
+      {
+        <bb 9> [local count: 118111600]:
+        # .MEM_55 = PHI <.MEM_42(8)>
+        j_30 = j_51 + 1;
+        ivtmp.37_68 = ivtmp.37_69 + _66;
+        if (j_30 != n_23(D))
+          goto <bb 12>; [89.00%]
+        else
+          goto <bb 17>; [11.00%]
+
+      }
+      bb_12 (preds = {bb_9 }, succs = {bb_7 })
+      {
+        <bb 12> [local count: 105119324]:
+        goto <bb 7>; [100.00%]
+
+      }
+      loop_3 (header = 8, latch = 13, finite_p
+      niter _87 > 0 ? (unsigned int) _87 + 4294967295 : 0
+      upper_bound 2147483645
+      likely_upper_bound 2147483645
+      iterations by profile: 7.090909 (unreliable, maybe flat) entry count:118111600 (estimated locally, freq 65.4628))
+      {
+        bb_8 (preds = {bb_13 bb_7 }, succs = {bb_13 bb_9 })
+        {
+          <bb 8> [local count: 955630225]:
+          # i_56 = PHI <i_40(13), 0(7)>
+          # .MEM_57 = PHI <.MEM_42(13), .MEM_52(7)>
+          # ivtmp.23_85 = PHI <ivtmp.23_84(13), ivtmp.23_83(7)>
+          _32 = (long unsigned int) i_56;
+          _33 = _32 * 4;
+          _21 = _10 + _33;
+          _34 = vector_27(D) + _21;
+          _78 = (void *) ivtmp.23_85;
+          # VUSE <.MEM_57>
+          _35 = MEM[(float *)_78];
+          _29 = _14 + _33;
+          _36 = vector_27(D) + _29;
+          _76 = (sizetype) _7;
+          _75 = _76 * 18446744073709551612;
+          _74 = _75 + ivtmp.23_85;
+          _73 = (void *) _74;
+          _72 = (sizetype) _11;
+          _71 = ivtmp.37_69;
+          _70 = _73 + _71;
+          # VUSE <.MEM_57>
+          _37 = MEM[(float *)_70];
+          _38 = t_28 * _37;
+          _39 = _35 + _38;
+          _77 = (void *) ivtmp.23_85;
+          # .MEM_42 = VDEF <.MEM_57>
+          MEM[(float *)_77] = _39;
+          i_40 = i_56 + 1;
+          ivtmp.23_84 = ivtmp.23_85 + 4;
+          if (i_40 < _87)
+            goto <bb 13>; [89.00%]
+          else
+            goto <bb 9>; [11.00%]
+
+        }
+        bb_13 (preds = {bb_8 }, succs = {bb_8 })
+        {
+          <bb 13> [local count: 850510901]:
+          goto <bb 8>; [100.00%]
+
+        }
+      }
+    }
+  }
+}
+Analyzing # of iterations of loop 1
+  exit condition [1, + , 1](no_overflow) < n_23(D) + -1
+  bounds on difference of bases: 0 ... 2147483645
+  result:
+    # of iterations (unsigned int) n_23(D) + 4294967294, bounded by 2147483645
+  number of iterations (unsigned int) n_23(D) + 4294967294
+
+<Induction Vars>:
+IV struct:
+  SSA_NAME:	_6
+  Type:	int
+  Base:	0
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_7
+  Type:	int
+  Base:	0
+  Step:	(int) ((unsigned int) m_25(D) + 1)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	j_24
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_46
+  Type:	float *
+  Base:	vector_27(D) + (sizetype) l_26(D) * 4
+  Step:	18446744073709551612
+  Object:	(void *) vector_27(D)
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_47
+  Type:	sizetype
+  Base:	(sizetype) l_26(D) * 4
+  Step:	18446744073709551612
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	i_50
+  Type:	int
+  Base:	0
+  Step:	1
+  Biv:	Y
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_58
+  Type:	sizetype
+  Base:	0
+  Step:	18446744073709551612
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_59
+  Type:	sizetype
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_61
+  Type:	sizetype
+  Base:	0
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_63
+  Type:	int
+  Base:	m_25(D)
+  Step:	m_25(D)
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+IV struct:
+  SSA_NAME:	_64
+  Type:	int
+  Base:	1
+  Step:	1
+  Biv:	N
+  Overflowness wrto loop niter:	No-overflow
+IV struct:
+  SSA_NAME:	_87
+  Type:	int
+  Base:	n_23(D) + -1
+  Step:	-1
+  Biv:	N
+  Overflowness wrto loop niter:	Overflow
+
+<IV Groups>:
+Group 0:
+  Type:	COMPARE
+  Use 0.0:
+    At stmt:	if (n_23(D) > j_24)
+    At pos:	j_24
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 1:
+  Type:	COMPARE
+  Use 1.0:
+    At stmt:	if (j_24 < _45)
+    At pos:	j_24
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 2:
+  Type:	GENERIC
+  Use 2.0:
+    At stmt:	_7 = _6 + i_50;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	0
+      Step:	(int) ((unsigned int) m_25(D) + 1)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 3:
+  Type:	COMPARE
+  Use 3.0:
+    At stmt:	if (i_40 < _87)
+    At pos:	_87
+    IV struct:
+      Type:	int
+      Base:	n_23(D) + -1
+      Step:	-1
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 4:
+  Type:	GENERIC
+  Use 4.0:
+    At stmt:	j_24 = i_50 + 1;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	1
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 5:
+  Type:	GENERIC
+  Use 5.0:
+    At stmt:	_46 = vector_27(D) + _47;
+    At pos:	
+    IV struct:
+      Type:	float *
+      Base:	vector_27(D) + (sizetype) l_26(D) * 4
+      Step:	18446744073709551612
+      Object:	(void *) vector_27(D)
+      Biv:	N
+      Overflowness wrto loop niter:	No-overflow
+Group 6:
+  Type:	GENERIC
+  Use 6.0:
+    At stmt:	i_50 = PHI <j_24(11), 0(10)>
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	0
+      Step:	1
+      Biv:	Y
+      Overflowness wrto loop niter:	No-overflow
+Group 7:
+  Type:	GENERIC
+  Use 7.0:
+    At stmt:	_63 = m_25(D) * _64;
+    At pos:	
+    IV struct:
+      Type:	int
+      Base:	m_25(D)
+      Step:	m_25(D)
+      Biv:	N
+      Overflowness wrto loop niter:	Overflow
+Group 8:
+  Type:	GENERIC
+  Use 8.0:
+    At stmt:	_61 = (sizetype) i_50;
+    At pos:	
+    IV struct:
+      Type:	sizetype
+      Base:	0
+      Step:	1
+      Biv:	N
+      Overflowness wrto loop niter:	No-overflow
+
+Predict doloop failure due to target specific checks.
+Candidate 0:
+  Var befor: ivtmp.39
+  Var after: ivtmp.39
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 1:
+  Var befor: ivtmp.40
+  Var after: ivtmp.40
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 2:
+  Var befor: ivtmp.41
+  Var after: ivtmp.41
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	1
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.42
+  Var after: ivtmp.42
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	(unsigned int) m_25(D) + 1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.43
+  Var after: ivtmp.43
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (n_23(D) + -1)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 6:
+  Var befor: ivtmp.44
+  Var after: ivtmp.44
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) n_23(D)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Var befor: ivtmp.45
+  Var after: ivtmp.45
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:	18446744073709551612
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.46
+  Var after: ivtmp.46
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) m_25(D)
+    Step:	(unsigned int) m_25(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+<Important Candidates>:	 0, 1, 2, 3,
+
+<Group, Cand> Related:
+  Group 0:	0, 1, 2, 3
+  Group 1:	0, 1, 2, 3
+  Group 2:	0, 1, 2, 3, 4
+  Group 3:	0, 1, 2, 3, 5, 6
+  Group 4:	0, 1, 2, 3
+  Group 5:	0, 1, 2, 3, 7
+  Group 6:	0, 1, 2, 3
+  Group 7:	0, 1, 2, 3, 8
+  Group 8:	0, 1, 2, 3
+
+<Candidate Costs>:
+  cand	cost
+  0	5
+  1	5
+  2	5
+  3	4
+  4	5
+  5	5
+  6	5
+  7	6
+  8	5
+
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 20.00: 0 (scratch: 0) -> 0
+Scaling cost based on bb prob by 20.00: 4 (scratch: 0) -> 80
+Scaling cost based on bb prob by 2.00: 9 (scratch: 1) -> 17
+Scaling cost based on bb prob by 2.00: 0 (scratch: 0) -> 0
+
+<Invariant Vars>:
+Inv 1:	n_23(D)
+Inv 4:	m_25(D)
+Inv 5:	l_26(D)
+Inv 3:	vector_27(D)
+Inv 2:	_45	(eliminable)
+
+<Invariant Expressions>:
+inv_expr 1: 	(unsigned int) m_25(D) + 1
+inv_expr 2: 	(signed int) n_23(D) + 1
+inv_expr 3: 	(signed int) n_23(D) + -1
+inv_expr 4: 	(signed long) l_26(D) * 4 + (signed long) vector_27(D)
+
+<Group-candidate Costs>:
+Group 0:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	4	0	NIL;	NIL;
+  1	4	0	NIL;	NIL;
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  5	4	0	NIL;	NIL;
+  6	4	0	2;	NIL;
+
+Group 1:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	NIL;
+  1	0	0	NIL;	2
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  5	0	0	NIL;	NIL;
+  6	0	0	NIL;	NIL;
+  7	3	0	NIL;	NIL;
+
+Group 2:
+  cand	cost	compl.	inv.expr.	inv.vars
+  4	0	0	NIL;	NIL;
+
+Group 3:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	80	0	3;	NIL;
+  1	80	0	3;	NIL;
+  2	80	0	NIL;	NIL;
+  3	80	0	NIL;	NIL;
+  5	0	0	NIL;	NIL;
+  6	80	0	NIL;	NIL;
+
+Group 4:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	4	0	NIL;	NIL;
+  1	4	0	NIL;	NIL;
+  2	0	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  5	4	0	NIL;	NIL;
+  6	4	0	2;	NIL;
+
+Group 5:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	17	0	4;	NIL;
+  7	0	0	NIL;	NIL;
+
+Group 6:
+  cand	cost	compl.	inv.expr.	inv.vars
+  0	0	0	NIL;	NIL;
+  1	0	0	NIL;	NIL;
+  2	4	0	NIL;	NIL;
+  3	0	0	NIL;	NIL;
+  5	4	0	3;	NIL;
+  6	4	0	NIL;	NIL;
+
+Group 7:
+  cand	cost	compl.	inv.expr.	inv.vars
+  8	0	0	NIL;	NIL;
+
+Group 8:
+  cand	cost	compl.	inv.expr.	inv.vars
+  1	0	0	NIL;	NIL;
+
+
+<Global Costs>:
+  target_avail_regs 26
+  target_clobbered_regs 16
+  target_reg_cost 4
+  target_spill_cost 24
+  regs_used 4
+  cost for size:
+  ivs	cost
+  0	0
+  1	2
+  2	4
+  3	6
+  4	8
+  5	10
+  6	12
+  7	14
+  8	16
+  9	18
+  10	20
+  11	22
+  12	24
+  13	26
+  14	28
+  15	30
+  16	32
+  17	34
+  18	36
+  19	111
+  20	116
+  21	121
+  22	126
+  23	151
+  24	176
+  25	201
+  26	226
+  27	275
+  28	324
+  29	373
+  30	422
+  31	471
+  32	520
+  33	569
+  34	618
+  35	667
+  36	716
+  37	765
+  38	814
+  39	863
+  40	912
+  41	961
+  42	1010
+  43	1059
+  44	1108
+  45	1157
+  46	1206
+  47	1255
+  48	1304
+  49	1353
+  50	1402
+  51	1451
+  52	1500
+
+Initial set of candidates:
+  cost: 126 (complexity 0)
+  reg_cost: 10
+  cand_cost: 19
+  cand_group_cost: 97 (complexity 0)
+  candidates: 1, 3, 4, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:3, cost=(80,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 53 (complexity 0)
+  reg_cost: 12
+  cand_cost: 24
+  cand_group_cost: 17 (complexity 0)
+  candidates: 1, 3, 4, 5, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 5, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Initial set of candidates:
+  cost: 55 (complexity 0)
+  reg_cost: 10
+  cand_cost: 20
+  cand_group_cost: 25 (complexity 0)
+  candidates: 1, 4, 5, 8
+   group:0 --> iv_cand:5, cost=(4,0)
+   group:1 --> iv_cand:5, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:5, cost=(4,0)
+   group:5 --> iv_cand:1, cost=(17,0)
+   group:6 --> iv_cand:1, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1, 4
+
+Improved to:
+  cost: 45 (complexity 0)
+  reg_cost: 11
+  cand_cost: 26
+  cand_group_cost: 8 (complexity 0)
+  candidates: 1, 4, 5, 7, 8
+   group:0 --> iv_cand:5, cost=(4,0)
+   group:1 --> iv_cand:5, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:5, cost=(4,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:1, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Improved to:
+  cost: 43 (complexity 0)
+  reg_cost: 13
+  cand_cost: 30
+  cand_group_cost: 0 (complexity 0)
+  candidates: 1, 3, 4, 5, 7, 8
+   group:0 --> iv_cand:3, cost=(0,0)
+   group:1 --> iv_cand:3, cost=(0,0)
+   group:2 --> iv_cand:4, cost=(0,0)
+   group:3 --> iv_cand:5, cost=(0,0)
+   group:4 --> iv_cand:3, cost=(0,0)
+   group:5 --> iv_cand:7, cost=(0,0)
+   group:6 --> iv_cand:3, cost=(0,0)
+   group:7 --> iv_cand:8, cost=(0,0)
+   group:8 --> iv_cand:1, cost=(0,0)
+  invariant variables: 
+  invariant expressions: 1
+
+Original cost 43 (complexity 0)
+
+Final cost 43 (complexity 0)
+
+Selected IV set for loop 1 at fp_foo.c:8, 10 avg niters, 6 IVs:
+Candidate 1:
+  Var befor: ivtmp.40_43
+  Var after: ivtmp.40_41
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 3:
+  Var befor: i_50
+  Var after: j_24
+  Incr POS: orig biv
+  IV struct:
+    Type:	int
+    Base:	0
+    Step:	1
+    Biv:	N
+    Overflowness wrto loop niter:	No-overflow
+Candidate 4:
+  Depend on inv.exprs: 1
+  Var befor: ivtmp.42_31
+  Var after: ivtmp.42_20
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	0
+    Step:	(unsigned int) m_25(D) + 1
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 5:
+  Var befor: ivtmp.43_17
+  Var after: ivtmp.43_16
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) (n_23(D) + -1)
+    Step:	4294967295
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 7:
+  Var befor: ivtmp.45_91
+  Var after: ivtmp.45_92
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned long
+    Base:	(unsigned long) (vector_27(D) + (sizetype) l_26(D) * 4)
+    Step:	18446744073709551612
+    Object:	(void *) vector_27(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+Candidate 8:
+  Var befor: ivtmp.46_97
+  Var after: ivtmp.46_98
+  Incr POS: before exit test
+  IV struct:
+    Type:	unsigned int
+    Base:	(unsigned int) m_25(D)
+    Step:	(unsigned int) m_25(D)
+    Biv:	N
+    Overflowness wrto loop niter:	Overflow
+
+Replacing exit test: if (j_24 < _45)
diff --git a/gcc/testsuite/fp_foo.c b/gcc/testsuite/fp_foo.c
new file mode 100644
index 00000000000..f65f43d6435
--- /dev/null
+++ b/gcc/testsuite/fp_foo.c
@@ -0,0 +1,19 @@
+
+void daxpy(float *vector1, float *vector2, int n, float fp_const){
+	for (int i = 0; i < n; ++i)
+		vector1[i] += fp_const * vector2[i];
+}
+
+void dgefa(float *vector, int m, int n, int l){
+	for (int i = 0; i < n - 1; ++i){
+		for (int j = i + 1; j < n; ++j){
+			float t = vector[m * j + l];
+			daxpy(&vector[m * i + i + 1],
+                              &vector[m * j + i + 1], n - (i + 1), t);
+		}
+	}
+}
+
+int main(){
+  return 0;
+}
diff --git a/gcc/testsuite/test_script.sh b/gcc/testsuite/test_script.sh
new file mode 100644
index 00000000000..4f19d248efe
--- /dev/null
+++ b/gcc/testsuite/test_script.sh
@@ -0,0 +1,10 @@
+export PREFIX="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install"
+export SOURCE_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/source"
+export BUILD_DIR="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/build"
+export SYSROOT="/home/syrmia/Desktop/Aleksandar/GNU_toolchain/install/sys_root"
+export PATH=$PREFIX/bin:$PATH
+export TARGET=mips64-r6-linux-gnu
+
+
+$PREFIX/bin/mips64-r6-linux-gnu-gcc fp_foo.c -O2 >out.txt -S -o fp_foo.s -march=mips64r6 -mabi=64
+
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7cae5bdefea..2dec5001dca 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4724,7 +4724,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   rtx addr;
   bool simple_inv = true;
   tree comp_inv = NULL_TREE, type = aff_var->type;
-  comp_cost var_cost = no_cost, cost = no_cost;
+  comp_cost var_cost = no_cost, cost = no_cost, autoinc_cost = no_cost;
+  comp_cost acost = no_cost;
   struct mem_address parts = {NULL_TREE, integer_one_node,
 			      NULL_TREE, NULL_TREE, NULL_TREE};
   machine_mode addr_mode = TYPE_MODE (type);
@@ -4755,38 +4756,36 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
 	  if (!ok_with_ratio_p)
 	    parts.step = NULL_TREE;
 	}
-      if (ok_with_ratio_p || ok_without_ratio_p)
+      if (!(ok_with_ratio_p || ok_without_ratio_p))
+	parts.index = NULL_TREE;
+
+      if (maybe_ne (aff_inv->offset, 0))
 	{
-	  if (maybe_ne (aff_inv->offset, 0))
-	    {
-	      parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
-	      /* Addressing mode "base + index [<< scale] + offset".  */
-	      if (!valid_mem_ref_p (mem_mode, as, &parts, code))
-		parts.offset = NULL_TREE;
-	      else
-		aff_inv->offset = 0;
-	    }
+	  parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
+	  /* Addressing mode "base + index [<< scale] + offset".  */
+	  if (!valid_mem_ref_p (mem_mode, as, &parts, code))
+	    parts.offset = NULL_TREE;
+	  else
+	    aff_inv->offset = 0;
+	}
 
-	  move_fixed_address_to_symbol (&parts, aff_inv);
-	  /* Base is fixed address and is moved to symbol part.  */
-	  if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
-	    parts.base = NULL_TREE;
+      move_fixed_address_to_symbol (&parts, aff_inv);
+      /* Base is fixed address and is moved to symbol part.  */
+      if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
+	parts.base = NULL_TREE;
 
-	  /* Addressing mode "symbol + base + index [<< scale] [+ offset]".  */
-	  if (parts.symbol != NULL_TREE
-	      && !valid_mem_ref_p (mem_mode, as, &parts, code))
-	    {
-	      aff_combination_add_elt (aff_inv, parts.symbol, 1);
-	      parts.symbol = NULL_TREE;
-	      /* Reset SIMPLE_INV since symbol address needs to be computed
-		 outside of address expression in this case.  */
-	      simple_inv = false;
-	      /* Symbol part is moved back to base part, it can't be NULL.  */
-	      parts.base = integer_one_node;
-	    }
+      /* Addressing mode "symbol + base + index [<< scale] [+ offset]".  */
+      if (parts.symbol != NULL_TREE
+	  && !valid_mem_ref_p (mem_mode, as, &parts, code))
+	{
+	  aff_combination_add_elt (aff_inv, parts.symbol, 1);
+	  parts.symbol = NULL_TREE;
+	  /* Reset SIMPLE_INV since symbol address needs to be computed
+	     outside of address expression in this case.  */
+	  simple_inv = false;
+	  /* Symbol part is moved back to base part, it can't be NULL.  */
+	  parts.base = integer_one_node;
 	}
-      else
-	parts.index = NULL_TREE;
     }
   else
     {
@@ -4799,14 +4798,12 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
 
 	  if (stmt_after_increment (data->current_loop, cand, use->stmt))
 	    ainc_offset += ainc_step;
-	  cost = get_address_cost_ainc (ainc_step, ainc_offset,
+	  autoinc_cost = get_address_cost_ainc (ainc_step, ainc_offset,
 					addr_mode, mem_mode, as, speed);
-	  if (!cost.infinite_cost_p ())
-	    {
-	      *can_autoinc = true;
-	      return cost;
-	    }
-	  cost = no_cost;
+	  if (!autoinc_cost.infinite_cost_p ())
+	    *can_autoinc = true;
+	  else
+	    autoinc_cost = no_cost;
 	}
       if (!aff_combination_zero_p (aff_inv))
 	{
@@ -4852,10 +4849,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   cost += var_cost;
   addr = addr_for_mem_ref (&parts, as, false);
   gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
-  cost += address_cost (addr, mem_mode, as, speed);
+  acost += address_cost (addr, mem_mode, as, speed);
 
   if (parts.symbol != NULL_TREE)
     cost.complexity += 1;
+  /* var_present.  */
+  else if (!aff_combination_const_p (aff_inv))
+    cost.complexity += 1;
   /* Don't increase the complexity of adding a scaled index if it's
      the only kind of index that the target allows.  */
   if (parts.step != NULL_TREE && ok_without_ratio_p)
@@ -4865,6 +4865,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
     cost.complexity += 1;
 
+  cost += (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
   return cost;
 }
 
-- 
2.34.1
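
The last hunk above defers the choice between the auto-increment cost and the
plain address cost until after the complexity has been accumulated. A minimal
sketch of that final step, assuming a simplified cost pair whose addition sums
both fields; the names cost_sketch and finish_address_cost are hypothetical and
not part of GCC:

```cpp
#include <cstdint>

// Simplified stand-in for the (cost, complexity) pair used by ivopts.
// Assumption: adding two pairs sums both fields.
struct cost_sketch
{
  int64_t cost;         // The runtime cost.
  unsigned complexity;  // Tie-breaker, in no concrete units.
};

// Hypothetical helper mirroring the patched tail of get_address_cost:
// RUNNING already holds the variable cost and the complexity bumps;
// exactly one of AUTOINC_COST or ACOST is added on top of it.
cost_sketch
finish_address_cost (cost_sketch running, const cost_sketch &autoinc_cost,
                     const cost_sketch &acost, const bool *can_autoinc)
{
  // A null CAN_AUTOINC pointer means the caller did not ask about autoinc.
  const cost_sketch &chosen
    = (can_autoinc && *can_autoinc) ? autoinc_cost : acost;
  running.cost += chosen.cost;
  running.complexity += chosen.complexity;
  return running;
}
```

In this model the complexity gathered in RUNNING is kept in both cases, which
is the difference from the early "return cost" that the patch removes.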

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-12-16 11:37                   ` Dimitrije Milosevic
@ 2022-12-16 11:58                     ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2022-12-16 11:58 UTC (permalink / raw)
  To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

On Fri, Dec 16, 2022 at 12:37 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
>
> Hi Richard,
>
> > The only documentation on complexity I find is
> >
> >   int64_t cost;         /* The runtime cost.  */
> >   unsigned complexity;  /* The estimate of the complexity of the code for
> >                            the computation (in no concrete units --
> >                            complexity field should be larger for more
> >                            complex expressions and addressing modes).  */
> >
> > and complexity is used as tie-breaker only when cost is equal.  Given that
> > shouldn't unsupported addressing modes have higher complexity?  I'll note
> > that there's nothing "unsupported", each "unsupported" address computation
> > is lowered into supported pieces.  "unsupported" maybe means that
> > "cost" isn't fully covered by address-cost and compensation stmts might
> > be costed in quantities not fully compatible with that?
>
> Correct, that's what I was aiming for initially - before f9f69dd that was the case,
> "unsupported" addressing modes had higher complexities.
> Also, that's what I meant by "unsupported" as well, thanks.
>
> > That said, "complexity" seems to only complicate things :/  We do have the
> > tie-breaker on preferring less IVs.  complexity was added in
> > r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
>
> I agree that the complexity part is just (kind of) out there, not really strongly
> defined. I'm not sure how to feel about merging complexity into the cost part
> of an address cost, though.
>
> > If it's really only about the "complexity" value then each compensation
> > step should add to the complexity?
>
> That could be the way to go. Also worth verifying is that we compensate for
> each case of an unsupported addressing mode.

Yes.  Also, given that complexity is only a tie-breaker, we should cost the
compensation somehow, but then complexity doesn't look necessary ...

Meh.
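
A minimal sketch of the tie-breaking rule described here, assuming only the
two documented fields; comp_cost_sketch and cheaper_p are illustrative names,
not the GCC implementation:

```cpp
#include <cstdint>

// Simplified stand-in for the cost pair quoted in this thread.
struct comp_cost_sketch
{
  int64_t cost;         // The runtime cost.
  unsigned complexity;  // Larger for more complex expressions.
};

// Return true when A should be preferred over B.  The runtime cost
// decides; complexity is consulted only when the costs are equal.
bool
cheaper_p (const comp_cost_sketch &a, const comp_cost_sketch &b)
{
  if (a.cost != b.cost)
    return a.cost < b.cost;
  return a.complexity < b.complexity;
}
```

With this ordering, two candidates whose complexities are both 0 and whose
costs are equal are indistinguishable, which is the situation the thread is
concerned about.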

>
> Kind regards,
> Dimitrije
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, December 16, 2022 10:58 AM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
> On Thu, Dec 15, 2022 at 4:26 PM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
> >
> > Hi Richard,
> >
> > Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
> >
> > > I'm not sure this is accurate but at least the cost of using an unsupported
> > > addressing mode should be at least that of the compensating code to
> > > mangle it to a supported form.
> >
> > I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
> > the target architecture. It does, however, adjust the cost for a subset of those.
> > The adjustment code modifies only the cost part of the address cost (which
> > consists of a cost and a complexity).
> > Having said this, I'd propose three approaches:
> >     1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
> >         sure they aren't already covered), leaving complexity for unsupported
> >         addressing modes zero.
>
> The only documentation on complexity I find is
>
>   int64_t cost;         /* The runtime cost.  */
>   unsigned complexity;  /* The estimate of the complexity of the code for
>                            the computation (in no concrete units --
>                            complexity field should be larger for more
>                            complex expressions and addressing modes).  */
>
> and complexity is used as tie-breaker only when cost is equal.  Given that
> shouldn't unsupported addressing modes have higher complexity?  I'll note
> that there's nothing "unsupported", each "unsupported" address computation
> is lowered into supported pieces.  "unsupported" maybe means that
> "cost" isn't fully covered by address-cost and compensation stmts might
> be costed in quantities not fully compatible with that?
>
> That said, "complexity" seems to only complicate things :/  We do have the
> tie-breaker on preferring less IVs.  complexity was added in
> r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.
>
> >     2. Revert the complexity calculation (which my initial patch does), leaving
> >         everything else as it is.
> >     3. A combination of both - if the control path gets into the adjustment code, we
> >         use the reverted complexity calculation.
>
> If it's really only about the "complexity" value then each compensation
> step should add to the complexity?
>
> > I'd love to get feedback regarding this, so I could focus on a concrete approach.
> >
> > Kind regards,
> > Dimitrije
> >
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Monday, November 7, 2022 2:35 PM
> > To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> > Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> > Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
> >
> > On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
> > <Dimitrije.Milosevic@syrmia.com> wrote:
> > >
> > > Hi Jeff,
> > >
> > > > This is exactly what I was trying to get to.   If the addressing mode
> > > > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > > > is, then we've probably got a problem somewhere else in this code and
> > > > this patch is likely papering over it.
> >
> > I'm not sure this is accurate but at least the cost of using an unsupported
> > addressing mode should be at least that of the compensating code to
> > mangle it to a supported form.
> >
> > > I'll take a deeper look into the candidate selection algorithm then. Will
> > > get back to you.
> >
> > Thanks - as said the unfortunate situation is that both the original author and
> > the one who did the last bigger reworks of the code are gone.
> >
> > Richard.
> >
> > > Regards,
> > > Dimitrije
> > >
> > > ________________________________________
> > > From: Jeff Law <jeffreyalaw@gmail.com>
> > > Sent: Tuesday, November 1, 2022 7:46 PM
> > > To: Richard Biener; Dimitrije Milosevic
> > > Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> > > Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
> > >
> > >
> > > On 10/28/22 01:00, Richard Biener wrote:
> > > > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > > > <Dimitrije.Milosevic@syrmia.com> wrote:
> > > >> Hi Jeff,
> > > >>
> > > >>> The part I don't understand is, if you only have BASE+OFF, why does
> > > >>> preventing the calculation of more complex addressing modes matter?  ie,
> > > >>> what's the point of computing the cost of something like base + off +
> > > >>> scaled index when the target can't utilize it?
> > > >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> > > >> equal to 0. For targets like MIPS, which only has BASE + OFFSET, it would still
> > > >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> > > >> than a candidate with BASE + INDEX, for example, as it has to compensate
> > > >> the lack of other addressing modes somehow. If complexities for both of
> > > >> those are equal to 0, in cases where complexities decide which candidate is
> > > >> to be chosen, a more complex candidate may be picked.
> > > > > But something is wrong then - it shouldn't ever pick a candidate with an
> > > > > addressing mode that isn't supported?  So you say that the cost of expressing
> > > > 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> > > > accurately?
> > >
> > > This is exactly what I was trying to get to.   If the addressing mode
> > > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > > is, then we've probably got a problem somewhere else in this code and
> > > this patch is likely papering over it.
> > >
> > >
> > > Jeff
> > >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-12-16  9:58                 ` Richard Biener
@ 2022-12-16 11:37                   ` Dimitrije Milosevic
  2022-12-16 11:58                     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-12-16 11:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic


Hi Richard,

> The only documentation on complexity I find is
>
>   int64_t cost;         /* The runtime cost.  */
>   unsigned complexity;  /* The estimate of the complexity of the code for
>                            the computation (in no concrete units --
>                            complexity field should be larger for more
>                            complex expressions and addressing modes).  */
>
> and complexity is used as tie-breaker only when cost is equal.  Given that
> shouldn't unsupported addressing modes have higher complexity?  I'll note
> that there's nothing "unsupported", each "unsupported" address computation
> is lowered into supported pieces.  "unsupported" maybe means that
> "cost" isn't fully covered by address-cost and compensation stmts might
> be costed in quantities not fully compatible with that?

Correct, that's what I was aiming for initially - before f9f69dd that was the case,
"unsupported" addressing modes had higher complexities.
Also, that's what I meant by "unsupported" as well, thanks.

> That said, "complexity" seems to only complicate things :/  We do have the
> tie-breaker on preferring less IVs.  complexity was added in
> r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.

I agree that the complexity part is just (kind of) out there, not really strongly
defined. I'm not sure how to feel about merging complexity into the cost part
of an address cost, though.

> If it's really only about the "complexity" value then each compensation
> step should add to the complexity?

That could be the way to go. Also worth verifying is that we compensate for
each case of an unsupported addressing mode.
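
To make that idea concrete, here is a rough sketch of "one complexity point per
compensation step". Everything in it is hypothetical: the two structs and
compensation_complexity are illustrative only and do not correspond to existing
ivopts code.

```cpp
// Hypothetical description of an address of the form
// "symbol + base + index << scale + offset": which parts are present.
struct addr_parts_sketch
{
  bool symbol, index, scaled, offset;
};

// Hypothetical description of what one memory operand can encode
// on the target, in addition to a base register.
struct target_modes_sketch
{
  bool symbol, index, scaled, offset;
};

// Count the compensation steps needed to lower P to a form the
// target T supports, adding one to the complexity per step.
unsigned
compensation_complexity (const addr_parts_sketch &p,
                         const target_modes_sketch &t)
{
  unsigned complexity = 0;
  if (p.symbol && !t.symbol)
    ++complexity;  // Symbol address computed outside the memory operand.
  if (p.index && !t.index)
    ++complexity;  // Index added to the base separately.
  if (p.index && p.scaled && !t.scaled)
    ++complexity;  // Scaling done with a separate shift or multiply.
  if (p.offset && !t.offset)
    ++complexity;  // Offset added separately.
  return complexity;
}
```

For a target that only has BASE + OFFSET, this assigns 1 to BASE + INDEX and 2
to BASE + INDEX << SCALE + OFFSET, so the second form would lose a tie.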

Kind regards,
Dimitrije

From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, December 16, 2022 10:58 AM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Thu, Dec 15, 2022 at 4:26 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
>
> > I'm not sure this is accurate but at least the cost of using an unsupported
> > addressing mode should be at least that of the compensating code to
> > mangle it to a supported form.
>
> I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
> the target architecture. It does, however, adjust the cost for a subset of those.
> The adjustment code modifies only the cost part of the address cost (which
> consists of a cost and a complexity).
> Having said this, I'd propose three approaches:
>     1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
>         sure they aren't already covered), leaving complexity for unsupported
>         addressing modes zero.

The only documentation on complexity I find is

  int64_t cost;         /* The runtime cost.  */
  unsigned complexity;  /* The estimate of the complexity of the code for
                           the computation (in no concrete units --
                           complexity field should be larger for more
                           complex expressions and addressing modes).  */

and complexity is used as tie-breaker only when cost is equal.  Given that
shouldn't unsupported addressing modes have higher complexity?  I'll note
that there's nothing "unsupported", each "unsupported" address computation
is lowered into supported pieces.  "unsupported" maybe means that
"cost" isn't fully covered by address-cost and compensation stmts might
be costed in quantities not fully compatible with that?

That said, "complexity" seems to only complicate things :/  We do have the
tie-breaker on preferring less IVs.  complexity was added in
r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.

>     2. Revert the complexity calculation (which my initial patch does), leaving
>         everything else as it is.
>     3. A combination of both - if the control path gets into the adjustment code, we
>         use the reverted complexity calculation.

If it's really only about the "complexity" value then each
compensation step should
add to the complexity?

> I'd love to get feedback regarding this, so I could focus on a concrete approach.
>
> Kind regards,
> Dimitrije
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, November 7, 2022 2:35 PM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
> On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
> >
> > Hi Jeff,
> >
> > > This is exactly what I was trying to get to.   If the addressing mode
> > > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > > is, then we've probably got a problem somewhere else in this code and
> > > this patch is likely papering over it.
>
> I'm not sure this is accurate but at least the cost of using an unsupported
> addressing mode should be at least that of the compensating code to
> mangle it to a supported form.
>
> > I'll take a deeper look into the candidate selection algorithm then. Will
> > get back to you.
>
> Thanks - as said the unfortunate situation is that both the original author and
> the one who did the last bigger reworks of the code are gone.
>
> Richard.
>
> > Regards,
> > Dimitrije
> >
> > ________________________________________
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Tuesday, November 1, 2022 7:46 PM
> > To: Richard Biener; Dimitrije Milosevic
> > Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> > Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
> >
> >
> > On 10/28/22 01:00, Richard Biener wrote:
> > > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > > <Dimitrije.Milosevic@syrmia.com> wrote:
> > >> Hi Jeff,
> > >>
> > > >>> The part I don't understand is, if you only have BASE+OFF, why does
> > > >>> preventing the calculation of more complex addressing modes matter?  i.e.,
> > >>> what's the point of computing the cost of something like base + off +
> > >>> scaled index when the target can't utilize it?
> > >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> > >> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> > >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> > >> than a candidate with BASE + INDEX, for example, as it has to compensate
> > >> the lack of other addressing modes somehow. If complexities for both of
> > >> those are equal to 0, in cases where complexities decide which candidate is
> > >> to be chosen, a more complex candidate may be picked.
> > > But something is wrong then - it shouldn't ever pick a candidate with
> > > an addressing
> > > mode that isn't supported?  So you say that the cost of expressing
> > > 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> > > accurately?
> >
> > This is exactly what I was trying to get to.   If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.
> >
> >
> > Jeff
> >

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-12-15 15:26               ` Dimitrije Milosevic
@ 2022-12-16  9:58                 ` Richard Biener
  2022-12-16 11:37                   ` Dimitrije Milosevic
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-12-16  9:58 UTC (permalink / raw)
  To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

On Thu, Dec 15, 2022 at 4:26 PM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Richard,
>
> Sorry for the delayed response, I couldn't find the time to fully focus on this topic.
>
> > I'm not sure this is accurate but at least the cost of using an unsupported
> > addressing mode should be at least that of the compensating code to
> > mangle it to a supported form.
>
> I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
> the target architecture. It does, however, adjust the cost for a subset of those.
> The adjustment code modifies only the cost part of the address cost (which
> consists of a cost and a complexity).
> Having said this, I'd propose three approaches:
>     1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
>         sure they aren't already covered), leaving complexity for unsupported
>         addressing modes zero.

The only documentation on complexity I find is

  int64_t cost;         /* The runtime cost.  */
  unsigned complexity;  /* The estimate of the complexity of the code for
                           the computation (in no concrete units --
                           complexity field should be larger for more
                           complex expressions and addressing modes).  */

and complexity is used as tie-breaker only when cost is equal.  Given that
shouldn't unsupported addressing modes have higher complexity?  I'll note
that there's nothing "unsupported", each "unsupported" address computation
is lowered into supported pieces.  "unsupported" maybe means that
"cost" isn't fully covered by address-cost and compensation stmts might
be costed in quantities not fully compatible with that?
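
As a minimal sketch of that tie-breaking rule (the names addr_cost and
cheaper_p are hypothetical and only the two fields come from the comment
quoted above; this is not a copy of the ivopts code):

```cpp
#include <cstdint>

// Hypothetical stand-in for the (cost, complexity) pair; only the two
// field names are taken from the comment quoted above.
struct addr_cost
{
  std::int64_t cost;     // The runtime cost.
  unsigned complexity;   // Larger for more complex addressing modes.
};

// Returns true if A is preferable to B.  The lower cost wins; complexity
// is consulted only when the two costs are equal.
constexpr bool
cheaper_p (addr_cost a, addr_cost b)
{
  return a.cost == b.cost ? a.complexity < b.complexity : a.cost < b.cost;
}

// Equal cost: the less complex candidate wins.
static_assert (cheaper_p ({4, 1}, {4, 2}), "complexity breaks the tie");
// Different cost: complexity is ignored.
static_assert (cheaper_p ({3, 5}, {4, 0}), "cost dominates");
// Equal cost and both complexities 0: neither candidate is preferred.
static_assert (!cheaper_p ({4, 0}, {4, 0}), "no preference");
```

The last assertion is the situation under discussion: when every address
shape ends up with complexity 0, the tie-breaker cannot separate two
candidates of equal cost.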

That said, "complexity" seems to only complicate things :/  We do have the
tie-breaker on preferring fewer IVs.  complexity was added in
r0-85562-g6e8c65f6621fb0 as part of fixing PR34711.

>     2. Revert the complexity calculation (which my initial patch does), leaving
>         everything else as it is.
>     3. A combination of both - if the control path gets into the adjustment code, we
>         use the reverted complexity calculation.

If it's really only about the "complexity" value then each
compensation step should
add to the complexity?
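
A rough sketch of that idea, under stated assumptions: the struct
compensation, the function adjust and the cost parameters are all
hypothetical, standing in for whatever compensation statements the
adjustment code decides it needs.  It only shows each compensation step
bumping both fields, not how ivopts does it today:

```cpp
#include <cstdint>

// Hypothetical (cost, complexity) pair, as in the struct quoted earlier.
struct addr_cost
{
  std::int64_t cost;
  unsigned complexity;
};

// Hypothetical summary of the compensation statements an address needs
// when the target cannot express it directly.
struct compensation
{
  bool needs_mult;  // the index has to be scaled by a separate statement
  bool needs_add;   // the invariant part has to be added separately
};

// Each compensation step adds its runtime cost and also one unit of
// complexity, so lowered addresses no longer all sit at complexity 0.
constexpr addr_cost
adjust (addr_cost base, compensation c,
        std::int64_t mult_cost, std::int64_t add_cost)
{
  return addr_cost {
    base.cost + (c.needs_mult ? mult_cost : 0) + (c.needs_add ? add_cost : 0),
    base.complexity + (c.needs_mult ? 1u : 0u) + (c.needs_add ? 1u : 0u)
  };
}

// Two compensation steps: cost 4 + 4 + 1, complexity 0 + 1 + 1.
static_assert (adjust ({4, 0}, {true, true}, 4, 1).cost == 9, "cost");
static_assert (adjust ({4, 0}, {true, true}, 4, 1).complexity == 2, "cplx");
```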

> I'd love to get feedback regarding this, so I could focus on a concrete approach.
>
> Kind regards,
> Dimitrije
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, November 7, 2022 2:35 PM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
> Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
> On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
> >
> > Hi Jeff,
> >
> > > This is exactly what I was trying to get to.   If the addressing mode
> > > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > > is, then we've probably got a problem somewhere else in this code and
> > > this patch is likely papering over it.
>
> I'm not sure this is accurate but at least the cost of using an unsupported
> addressing mode should be at least that of the compensating code to
> mangle it to a supported form.
>
> > I'll take a deeper look into the candidate selection algorithm then. Will
> > get back to you.
>
> Thanks - as said the unfortunate situation is that both the original author and
> the one who did the last bigger reworks of the code are gone.
>
> Richard.
>
> > Regards,
> > Dimitrije
> >
> > ________________________________________
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Tuesday, November 1, 2022 7:46 PM
> > To: Richard Biener; Dimitrije Milosevic
> > Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> > Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
> >
> >
> > On 10/28/22 01:00, Richard Biener wrote:
> > > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > > <Dimitrije.Milosevic@syrmia.com> wrote:
> > >> Hi Jeff,
> > >>
> > > >>> The part I don't understand is, if you only have BASE+OFF, why does
> > > >>> preventing the calculation of more complex addressing modes matter?  i.e.,
> > >>> what's the point of computing the cost of something like base + off +
> > >>> scaled index when the target can't utilize it?
> > >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> > >> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> > >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> > >> than a candidate with BASE + INDEX, for example, as it has to compensate
> > >> the lack of other addressing modes somehow. If complexities for both of
> > >> those are equal to 0, in cases where complexities decide which candidate is
> > >> to be chosen, a more complex candidate may be picked.
> > > But something is wrong then - it shouldn't ever pick a candidate with
> > > an addressing
> > > mode that isn't supported?  So you say that the cost of expressing
> > > 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> > > accurately?
> >
> > This is exactly what I was trying to get to.   If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.
> >
> >
> > Jeff
> >

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-11-07 13:35             ` Richard Biener
@ 2022-12-15 15:26               ` Dimitrije Milosevic
  2022-12-16  9:58                 ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-12-15 15:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

Hi Richard,

Sorry for the delayed response, I couldn't find the time to fully focus on this topic.

> I'm not sure this is accurate but at least the cost of using an unsupported
> addressing mode should be at least that of the compensating code to
> mangle it to a supported form.

I'm pretty sure IVOPTS does not filter out candidates which aren't supported by
the target architecture. It does, however, adjust the cost for a subset of those.
The adjustment code modifies only the cost part of the address cost (which
consists of a cost and a complexity).
Having said this, I'd propose three approaches:
    1. Cover all cases of unsupported addressing modes (if needed, I'm not entirely
        sure they aren't already covered), leaving complexity for unsupported
        addressing modes zero.
    2. Revert the complexity calculation (which my initial patch does), leaving
        everything else as it is.
    3. A combination of both - if the control path gets into the adjustment code, we
        use the reverted complexity calculation.
I'd love to get feedback regarding this, so I could focus on a concrete approach.

Kind regards,
Dimitrije

From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, November 7, 2022 2:35 PM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > This is exactly what I was trying to get to.   If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.

I'm not sure this is accurate but at least the cost of using an unsupported
addressing mode should be at least that of the compensating code to
mangle it to a supported form.

> I'll take a deeper look into the candidate selection algorithm then. Will
> get back to you.

Thanks - as said the unfortunate situation is that both the original author and
the one who did the last bigger reworks of the code are gone.

Richard.

> Regards,
> Dimitrije
>
> ________________________________________
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Tuesday, November 1, 2022 7:46 PM
> To: Richard Biener; Dimitrije Milosevic
> Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
>
> On 10/28/22 01:00, Richard Biener wrote:
> > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > <Dimitrije.Milosevic@syrmia.com> wrote:
> >> Hi Jeff,
> >>
> >>> The part I don't understand is, if you only have BASE+OFF, why does
> >>> preventing the calculation of more complex addressing modes matter?  i.e.,
> >>> what's the point of computing the cost of something like base + off +
> >>> scaled index when the target can't utilize it?
> >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> >> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> >> than a candidate with BASE + INDEX, for example, as it has to compensate
> >> the lack of other addressing modes somehow. If complexities for both of
> >> those are equal to 0, in cases where complexities decide which candidate is
> >> to be chosen, a more complex candidate may be picked.
> > But something is wrong then - it shouldn't ever pick a candidate with
> > an addressing
> > mode that isn't supported?  So you say that the cost of expressing
> > 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> > accurately?
>
> This is exactly what I was trying to get to.   If the addressing mode
> isn't supported, then we shouldn't be picking it as a candidate.  If it
> is, then we've probably got a problem somewhere else in this code and
> this patch is likely papering over it.
>
>
> Jeff
>

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-11-02  8:40           ` Dimitrije Milosevic
@ 2022-11-07 13:35             ` Richard Biener
  2022-12-15 15:26               ` Dimitrije Milosevic
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-11-07 13:35 UTC (permalink / raw)
  To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

On Wed, Nov 2, 2022 at 9:40 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > This is exactly what I was trying to get to.   If the addressing mode
> > isn't supported, then we shouldn't be picking it as a candidate.  If it
> > is, then we've probably got a problem somewhere else in this code and
> > this patch is likely papering over it.

I'm not sure this is accurate but at least the cost of using an unsupported
addressing mode should be at least that of the compensating code to
mangle it to a supported form.

> I'll take a deeper look into the candidate selection algorithm then. Will
> get back to you.

Thanks - as said the unfortunate situation is that both the original author and
the one who did the last bigger reworks of the code are gone.

Richard.

> Regards,
> Dimitrije
>
> ________________________________________
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Tuesday, November 1, 2022 7:46 PM
> To: Richard Biener; Dimitrije Milosevic
> Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
>
> On 10/28/22 01:00, Richard Biener wrote:
> > On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> > <Dimitrije.Milosevic@syrmia.com> wrote:
> >> Hi Jeff,
> >>
> >>> The part I don't understand is, if you only have BASE+OFF, why does
> >>> preventing the calculation of more complex addressing modes matter?  i.e.,
> >>> what's the point of computing the cost of something like base + off +
> >>> scaled index when the target can't utilize it?
> >> Well, the complexities of all addressing modes other than BASE + OFFSET are
> >> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> >> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> >> than a candidate with BASE + INDEX, for example, as it has to compensate
> >> the lack of other addressing modes somehow. If complexities for both of
> >> those are equal to 0, in cases where complexities decide which candidate is
> >> to be chosen, a more complex candidate may be picked.
> > But something is wrong then - it shouldn't ever pick a candidate with
> > an addressing
> > mode that isn't supported?  So you say that the cost of expressing
> > 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> > accurately?
>
> This is exactly what I was trying to get to.   If the addressing mode
> isn't supported, then we shouldn't be picking it as a candidate.  If it
> is, then we've probably got a problem somewhere else in this code and
> this patch is likely papering over it.
>
>
> Jeff
>

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-11-01 18:46         ` Jeff Law
@ 2022-11-02  8:40           ` Dimitrije Milosevic
  2022-11-07 13:35             ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-11-02  8:40 UTC (permalink / raw)
  To: Jeff Law, Richard Biener; +Cc: gcc-patches, Djordje Todorovic

Hi Jeff,

> This is exactly what I was trying to get to.   If the addressing mode
> isn't supported, then we shouldn't be picking it as a candidate.  If it
> is, then we've probably got a problem somewhere else in this code and
> this patch is likely papering over it.

I'll take a deeper look into the candidate selection algorithm then. Will
get back to you.

Regards,
Dimitrije

________________________________________
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Tuesday, November 1, 2022 7:46 PM
To: Richard Biener; Dimitrije Milosevic
Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.


On 10/28/22 01:00, Richard Biener wrote:
> On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
>> Hi Jeff,
>>
>>> The part I don't understand is, if you only have BASE+OFF, why does
>>> preventing the calculation of more complex addressing modes matter?  i.e.,
>>> what's the point of computing the cost of something like base + off +
>>> scaled index when the target can't utilize it?
>> Well, the complexities of all addressing modes other than BASE + OFFSET are
>> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
>> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
>> than a candidate with BASE + INDEX, for example, as it has to compensate
>> the lack of other addressing modes somehow. If complexities for both of
>> those are equal to 0, in cases where complexities decide which candidate is
>> to be chosen, a more complex candidate may be picked.
> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?  So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?

This is exactly what I was trying to get to.   If the addressing mode
isn't supported, then we shouldn't be picking it as a candidate.  If it
is, then we've probably got a problem somewhere else in this code and
this patch is likely papering over it.


Jeff


* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-28  7:00       ` Richard Biener
  2022-10-28 13:39         ` Dimitrije Milosevic
@ 2022-11-01 18:46         ` Jeff Law
  2022-11-02  8:40           ` Dimitrije Milosevic
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2022-11-01 18:46 UTC (permalink / raw)
  To: Richard Biener, Dimitrije Milosevic; +Cc: gcc-patches, Djordje Todorovic


On 10/28/22 01:00, Richard Biener wrote:
> On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
> <Dimitrije.Milosevic@syrmia.com> wrote:
>> Hi Jeff,
>>
>>> The part I don't understand is, if you only have BASE+OFF, why does
>>> preventing the calculation of more complex addressing modes matter?  i.e.,
>>> what's the point of computing the cost of something like base + off +
>>> scaled index when the target can't utilize it?
>> Well, the complexities of all addressing modes other than BASE + OFFSET are
>> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
>> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
>> than a candidate with BASE + INDEX, for example, as it has to compensate
>> the lack of other addressing modes somehow. If complexities for both of
>> those are equal to 0, in cases where complexities decide which candidate is
>> to be chosen, a more complex candidate may be picked.
> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?  So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?

This is exactly what I was trying to get to.   If the addressing mode 
isn't supported, then we shouldn't be picking it as a candidate.  If it 
is, then we've probably got a problem somewhere else in this code and 
this patch is likely papering over it.


Jeff


* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-28  7:00       ` Richard Biener
@ 2022-10-28 13:39         ` Dimitrije Milosevic
  2022-11-01 18:46         ` Jeff Law
  1 sibling, 0 replies; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-10-28 13:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

Hi Richard,

> But something is wrong then - it shouldn't ever pick a candidate with
> an addressing
> mode that isn't supported?

The test case I presented in [0] only has non-"BASE + OFFSET" candidates. Correct me
if I'm wrong, but the candidate selection algorithm doesn't take into account
which addressing modes are supported by the target?

> So you say that the cost of expressing
> 'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
> accurately?

I just took it as an example, but yes.

> The function tries to compensate for that, maybe you can point out
> where it goes wrong?
> That is, at the end it adjusts cost and complexity based on what it
> scrapped before, maybe
> that is just a bit incomplete?

I think the cost.cost part is mostly okay, as the costs are just scaled 
(what was lesser/higher before f9f69dd is lesser/higher after f9f69dd).
As far as the adjustments go, I don't think they are complete. 
On the other hand, as complexity is a valid part of address costs, and
it can be used as a tie-breaker, I feel like it should serve a purpose, 
even for targets like Mips which are limited when it comes to 
addressing modes, rather than being equal to 0.

I guess an alternative would be to fully cover cost.cost adjustments, and
leave the complexities to be 0 for non-supported addressing modes.
Currently, they are implemented as follows:

  if (simple_inv)
    simple_inv = (aff_inv == NULL
		  || aff_combination_const_p (aff_inv)
		  || aff_combination_singleton_var_p (aff_inv));
  if (!aff_combination_zero_p (aff_inv))
    comp_inv = aff_combination_to_tree (aff_inv);
  if (comp_inv != NULL_TREE)
    cost = force_var_cost (data, comp_inv, inv_vars);
  if (ratio != 1 && parts.step == NULL_TREE)
    var_cost += mult_by_coeff_cost (ratio, addr_mode, speed);
  if (comp_inv != NULL_TREE && parts.index == NULL_TREE)
    var_cost += add_cost (speed, addr_mode);

> Note the original author of this is not available so it would help
> (maybe also yourself) to
> walk through the function with a specific candidate / use where you
> think the complexity
> (or cost) is wrong?

I'd like to refer to [0] where candidate costs didn't get adjusted to 
cover the lack of complexity calculation.

Would love to hear your thoughts.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604304.html

Regards,
Dimitrije

From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, October 28, 2022 9:00 AM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > The part I don't understand is, if you only have BASE+OFF, why does
> > preventing the calculation of more complex addressing modes matter?  i.e.,
> > what's the point of computing the cost of something like base + off +
> > scaled index when the target can't utilize it?
>
> Well, the complexities of all addressing modes other than BASE + OFFSET are
> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> than a candidate with BASE + INDEX, for example, as it has to compensate
> the lack of other addressing modes somehow. If complexities for both of
> those are equal to 0, in cases where complexities decide which candidate is
> to be chosen, a more complex candidate may be picked.

But something is wrong then - it shouldn't ever pick a candidate with
an addressing
mode that isn't supported?  So you say that the cost of expressing
'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
accurately?

The function tries to compensate for that, maybe you can point out
where it goes wrong?
That is, at the end it adjusts cost and complexity based on what it
scrapped before, maybe
that is just a bit incomplete?

Note the original author of this is not available so it would help
(maybe also yourself) to
walk through the function with a specific candidate / use where you
think the complexity
(or cost) is wrong?


> Regards,
> Dimitrije
>
>
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Friday, October 28, 2022 1:02 AM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Cc: Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
>
> On 10/21/22 07:52, Dimitrije Milosevic wrote:
> > From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
> >
> > This patch reverts the computation of address cost complexity
> > to the legacy one. After f9f69dd, complexity is calculated
> > using the valid_mem_ref_p target hook. Architectures like
> > Mips only allow BASE + OFFSET addressing modes, which in turn
> > prevents the calculation of complexity for other addressing
> > modes, resulting in non-optimal candidate selection.
> >
> > gcc/ChangeLog:
> >
> >        * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> >        to non-static.
> >        * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> >        * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> >        (compute_min_and_max_offset): Likewise.
> >        (get_address_cost): Revert
> >        complexity calculation.
>
> The part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter?  i.e.,
> what's the point of computing the cost of something like base + off +
> scaled index when the target can't utilize it?
>
>
> jeff
>

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-28  6:43     ` Dimitrije Milosevic
@ 2022-10-28  7:00       ` Richard Biener
  2022-10-28 13:39         ` Dimitrije Milosevic
  2022-11-01 18:46         ` Jeff Law
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2022-10-28  7:00 UTC (permalink / raw)
  To: Dimitrije Milosevic; +Cc: Jeff Law, gcc-patches, Djordje Todorovic

On Fri, Oct 28, 2022 at 8:43 AM Dimitrije Milosevic
<Dimitrije.Milosevic@syrmia.com> wrote:
>
> Hi Jeff,
>
> > The part I don't understand is, if you only have BASE+OFF, why does
> > preventing the calculation of more complex addressing modes matter?  i.e.,
> > what's the point of computing the cost of something like base + off +
> > scaled index when the target can't utilize it?
>
> Well, the complexities of all addressing modes other than BASE + OFFSET are
> equal to 0. For targets like Mips, which only has BASE + OFFSET, it would still
> be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
> than a candidate with BASE + INDEX, for example, as it has to compensate
> the lack of other addressing modes somehow. If complexities for both of
> those are equal to 0, in cases where complexities decide which candidate is
> to be chosen, a more complex candidate may be picked.

But something is wrong then - it shouldn't ever pick a candidate with
an addressing
mode that isn't supported?  So you say that the cost of expressing
'BASE + INDEX << SCALE + OFFSET' as 'BASE + OFFSET' is not computed
accurately?

The function tries to compensate for that, maybe you can point out
where it goes wrong?
That is, at the end it adjusts cost and complexity based on what it
scrapped before, maybe
that is just a bit incomplete?

Note the original author of this is not available so it would help
(maybe also yourself) to
walk through the function with a specific candidate / use where you
think the complexity
(or cost) is wrong?


> Regards,
> Dimitrije
>
>
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Friday, October 28, 2022 1:02 AM
> To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Cc: Djordje Todorovic <Djordje.Todorovic@syrmia.com>
> Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
>
>
> On 10/21/22 07:52, Dimitrije Milosevic wrote:
> > From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
> >
> > This patch reverts the computation of address cost complexity
> > to the legacy one. After f9f69dd, complexity is calculated
> > using the valid_mem_ref_p target hook. Architectures like
> > Mips only allow BASE + OFFSET addressing modes, which in turn
> > prevents the calculation of complexity for other addressing
> > modes, resulting in non-optimal candidate selection.
> >
> > gcc/ChangeLog:
> >
> >        * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> >        to non-static.
> >        * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> >        * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> >        (compute_min_and_max_offset): Likewise.
> >        (get_address_cost): Revert
> >        complexity calculation.
>
> The part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter?  i.e.,
> what's the point of computing the cost of something like base + off +
> scaled index when the target can't utilize it?
>
>
> jeff
>

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-27 23:02   ` Jeff Law
@ 2022-10-28  6:43     ` Dimitrije Milosevic
  2022-10-28  7:00       ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-10-28  6:43 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: Djordje Todorovic, richard.guenther

Hi Jeff,

> The part I don't understand is, if you only have BASE+OFF, why does
> preventing the calculation of more complex addressing modes matter?  i.e.,
> what's the point of computing the cost of something like base + off + 
> scaled index when the target can't utilize it?

Well, the complexities of all addressing modes other than BASE + OFFSET are
equal to 0. For targets like Mips, which only have BASE + OFFSET, it would still
be more complex to use a candidate with BASE + INDEX << SCALE + OFFSET
than a candidate with BASE + INDEX, for example, as the target has to
compensate for the lack of other addressing modes somehow. If the complexities
of both of those are equal to 0, then in cases where complexity decides which
candidate is to be chosen, a more complex candidate may be picked.
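
To make the tie-breaking argument concrete, here is a minimal sketch. It
assumes that candidates are ordered by cost first and that complexity only
breaks ties, as described above. The names toy_cost, toy_cost_less and
toy_pick are hypothetical and are not GCC's own types or functions; the
numbers are the 13/1 versus 13/2 and 10/0 versus 10/0 pairs from the
candidate table posted earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy (cost, complexity) pair.  Hypothetical; not GCC's comp_cost.  */
struct toy_cost
{
  int cost;
  int complexity;
};

/* Order by cost first; complexity is only consulted on a cost tie.  */
static bool
toy_cost_less (struct toy_cost a, struct toy_cost b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}

/* Return the index of the cheapest entry.  On a full tie the earliest
   entry wins, regardless of what its addressing mode really looks like.  */
static size_t
toy_pick (const struct toy_cost *c, size_t n)
{
  size_t best = 0;

  assert (n > 0);
  for (size_t i = 1; i < n; i++)
    if (toy_cost_less (c[i], c[best]))
      best = i;
  return best;
}
```

With { {13, 2}, {13, 1} } the second entry is picked because its complexity
is lower. With { {10, 0}, {10, 0} } the comparison can no longer tell the
two apart, so the choice falls to whichever happens to come first.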

Regards,
Dimitrije


From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Friday, October 28, 2022 1:02 AM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Cc: Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 

On 10/21/22 07:52, Dimitrije Milosevic wrote:
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
>
> gcc/ChangeLog:
>
>        * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
>        to non-static.
>        * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
>        * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
>        (compute_min_and_max_offset): Likewise.
>        (get_address_cost): Revert
>        complexity calculation.

The part I don't understand is, if you only have BASE+OFF, why does
preventing the calculation of more complex addressing modes matter?  i.e.,
what's the point of computing the cost of something like base + off + 
scaled index when the target can't utilize it?


jeff


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
  2022-10-25 11:08   ` Richard Biener
@ 2022-10-27 23:02   ` Jeff Law
  2022-10-28  6:43     ` Dimitrije Milosevic
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2022-10-27 23:02 UTC (permalink / raw)
  To: Dimitrije Milosevic, gcc-patches; +Cc: djordje.todorovic


On 10/21/22 07:52, Dimitrije Milosevic wrote:
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.
>
> gcc/ChangeLog:
>
> 	* tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
> 	to non-static.
> 	* tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
> 	* tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
> 	(compute_min_and_max_offset): Likewise.
> 	(get_address_cost): Revert
> 	complexity calculation.

The part I don't understand is, if you only have BASE+OFF, why does
preventing the calculation of more complex addressing modes matter?  i.e.,
what's the point of computing the cost of something like base + off + 
scaled index when the target can't utilize it?


jeff



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-25 11:08   ` Richard Biener
@ 2022-10-25 13:00     ` Dimitrije Milosevic
  0 siblings, 0 replies; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-10-25 13:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Djordje Todorovic

Hi Richard,

> I don't follow how only having BASE + OFFSET addressing prevents
> calculation of complexity for other addressing modes?  Can you explain?

It's the valid_mem_ref_p target hook that prevents complexity calculation
for other addressing modes (for Mips and RISC-V).
Here's the snippet of the algorithm (after f9f69dd) for the complexity 
calculation, which is located at the beginning of the get_address_cost
function in tree-ssa-loop-ivopts.cc:

  if (!aff_combination_const_p (aff_inv))
    {
      parts.index = integer_one_node;
      /* Addressing mode "base + index".  */
      ok_without_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
      if (ratio != 1)
	{
	  parts.step = wide_int_to_tree (type, ratio);
	  /* Addressing mode "base + index << scale".  */
	  ok_with_ratio_p = valid_mem_ref_p (mem_mode, as, &parts);
	  if (!ok_with_ratio_p)
	    parts.step = NULL_TREE;
	}
      ...

The algorithm builds up the actual addressing mode step by step,
starting from BASE + INDEX. However, if valid_mem_ref_p returns
false, parts.* is set to NULL_TREE and we bail out. For Mips (or
RISC-V), it always returns false (given that we are building the addressing
mode up from BASE + INDEX), since Mips allows BASE + OFFSET only
(see the case PLUS in mips_classify_address in config/mips/mips.cc).
The result is that, for Mips (or RISC-V), all addressing modes besides
BASE + OFFSET have a complexity of 0. f9f69dd introduced the calls to the
valid_mem_ref_p target hook, which were not there before, and I'm not sure
exactly why.
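
The effect can be modelled with a small toy, which is only a sketch and not
the GCC code. It assumes that the part of get_address_cost elided above
drops the index when the target rejects both forms and keeps a base
otherwise. It counts complexity using the two rules for step and
base + index that the patch removes, and it leaves symbols and offsets out.
All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the address parts; each flag says whether that part
   is present.  Hypothetical; not GCC's struct mem_address.  */
struct toy_parts
{
  bool base, index, step;
};

typedef bool (*toy_valid_fn) (const struct toy_parts *);

/* Hypothetical BASE + OFFSET-only target: anything with an index
   register is rejected.  */
static bool
toy_base_offset_only (const struct toy_parts *p)
{
  return !p->index;
}

/* Hypothetical target that accepts index and scaled index.  */
static bool
toy_indexed_target (const struct toy_parts *p)
{
  (void) p;
  return true;
}

/* Build the address up from "base + index" and count complexity.  */
static int
toy_complexity (toy_valid_fn valid_p, int ratio)
{
  struct toy_parts parts = { false, false, false };
  bool ok_without_ratio_p, ok_with_ratio_p = false;
  int complexity = 0;

  assert (ratio >= 1);

  parts.index = true;
  ok_without_ratio_p = valid_p (&parts);
  if (ratio != 1)
    {
      parts.step = true;
      ok_with_ratio_p = valid_p (&parts);
      if (!ok_with_ratio_p)
        parts.step = false;
    }

  /* Assumption: this stands in for the elided code.  */
  if (ok_without_ratio_p || ok_with_ratio_p)
    parts.base = true;
  else
    parts.index = false;

  if (parts.step && ok_without_ratio_p)
    complexity += 1;
  if (parts.base && parts.index)
    complexity += 1;
  return complexity;
}
```

In this model the indexed target yields 1 for ratio 1 and 2 for ratio 4,
while the BASE + OFFSET-only target yields 0 for both, so the two candidates
become indistinguishable by complexity.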

> Do you have a testcase that shows how both changes improve IV selection
> for MIPS?

Certainly, consider the following test case:

void daxpy(float *vector1, float *vector2, int n, float fp_const)
{
	for (int i = 0; i < n; ++i)
		vector1[i] += fp_const * vector2[i];
}

void dgefa(float *vector, int m, int n, int l)
{
	for (int i = 0; i < n - 1; ++i)
	{
		for (int j = i + 1; j < n; ++j)
		{
			float t = vector[m * j + l];
			daxpy(&vector[m * i + i + 1], &vector[m * j + i + 1], n - (i + 1), t);
		}
	}
}

In the third, innermost loop (which gets inlined from daxpy), a suboptimal
candidate selection takes place.
It is worth noting that f9f69dd doesn't change the relative order of the costs
(they are multiplied by a factor, but whatever was lesser/greater before is
still lesser/greater after). Here is how the complexities compare:

===== Before f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.	inv.vars
  1	13	1	3;	NIL;
  2	13	2	4;	NIL;
  4	9	1	5;	NIL;
  5	1	0	NIL;	NIL;
  7	9	1	3;	NIL;
===== Before f9f69dd =====
===== After f9f69dd =====
Group 1:
  cand	cost	compl.	inv.expr.	inv.vars
  1	10	0	4;	NIL;
  2	10	0	5;	NIL;
  4	6	0	6;	NIL;
  5	1	0	NIL;	NIL;
  7	6	0	4;	NIL;
===== After f9f69dd =====

Notice how all complexities are zero, even though the candidates have
different addressing modes.

For this particular example, this leads to a different candidate selection:

===== Before f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 2 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
Candidate 5:
  Var befor: ivtmp.18_99
  Var after: ivtmp.18_98
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _14)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== Before f9f69dd =====
===== After f9f69dd =====
Selected IV set for loop 3 at fp_foo.c:3, 10 avg niters, 1 IVs:
Candidate 4:
  Var befor: ivtmp.17_52
  Var after: ivtmp.17_103
  Incr POS: before exit test
  IV struct:
    Type:	unsigned long
    Base:	(unsigned long) (vector_27(D) + _10)
    Step:	4
    Object:	(void *) vector_27(D)
    Biv:	N
    Overflowness wrto loop niter:	Overflow
===== After f9f69dd =====

which, in turn, leads to the following assembly sequence:

===== Before f9f69dd =====
.L83:
	lwc1	$f5,0($3)
	lwc1	$f8,0($2)
	lwc1	$f7,4($2)
	lwc1	$f6,8($2)
	lwc1	$f9,12($2)
	lwc1	$f10,16($2)
	maddf.s	$f8,$f0,$f5
	lwc1	$f11,20($2)
	lwc1	$f12,24($2)
	lwc1	$f13,28($2)
	ld	$12,72($sp)
	swc1	$f8,0($2)
	lwc1	$f14,4($3)
	maddf.s	$f7,$f0,$f14
	swc1	$f7,4($2)
	lwc1	$f15,8($3)
	maddf.s	$f6,$f0,$f15
	swc1	$f6,8($2)
	lwc1	$f16,12($3)
	maddf.s	$f9,$f0,$f16
	swc1	$f9,12($2)
	lwc1	$f17,16($3)
	maddf.s	$f10,$f0,$f17
	swc1	$f10,16($2)
	lwc1	$f18,20($3)
	maddf.s	$f11,$f0,$f18
	swc1	$f11,20($2)
	lwc1	$f19,24($3)
	maddf.s	$f12,$f0,$f19
	swc1	$f12,24($2)
	lwc1	$f20,28($3)
	maddf.s	$f13,$f0,$f20
	swc1	$f13,28($2)
	daddiu	$2,$2,32
	bne	$2,$12,.L83
	daddiu	$3,$3,32
        ...
===== Before f9f69dd =====
===== After f9f69dd =====
.L93:
	dsubu	$18,$2,$4
	lwc1	$f13,0($2)
	daddu	$19,$18,$5
	daddiu	$16,$2,4
	lwc1	$f14,0($19)
	dsubu	$17,$16,$4
	daddu	$25,$17,$5
	lwc1	$f15,4($2)
	daddiu	$19,$2,12
	daddiu	$20,$2,8
	maddf.s	$f13,$f1,$f14
	dsubu	$16,$19,$4
	daddiu	$17,$2,16
	dsubu	$18,$20,$4
	daddu	$19,$16,$5
	daddiu	$16,$2,20
	lwc1	$f10,8($2)
	daddu	$20,$18,$5
	lwc1	$f16,12($2)
	dsubu	$18,$17,$4
	lwc1	$f17,16($2)
	dsubu	$17,$16,$4
	lwc1	$f18,20($2)
	daddiu	$16,$2,24
	lwc1	$f20,24($2)
	daddu	$18,$18,$5
	swc1	$f13,0($2)
	daddu	$17,$17,$5
	lwc1	$f19,0($25)
	daddiu	$25,$2,28
	lwc1	$f11,28($2)
	daddiu	$2,$2,32
	dsubu	$16,$16,$4
	dsubu	$25,$25,$4
	maddf.s	$f15,$f1,$f19
	daddu	$16,$16,$5
	daddu	$25,$25,$5
	swc1	$f15,-28($2)
	lwc1	$f21,0($20)
	ld	$20,48($sp)
	maddf.s	$f10,$f1,$f21
	swc1	$f10,-24($2)
	lwc1	$f22,0($19)
	maddf.s	$f16,$f1,$f22
	swc1	$f16,-20($2)
	lwc1	$f23,0($18)
	maddf.s	$f17,$f1,$f23
	swc1	$f17,-16($2)
	lwc1	$f0,0($17)
	maddf.s	$f18,$f1,$f0
	swc1	$f18,-12($2)
	lwc1	$f7,0($16)
	maddf.s	$f20,$f1,$f7
	swc1	$f20,-8($2)
	lwc1	$f12,0($25)
	maddf.s	$f11,$f1,$f12
	bne	$2,$20,.L93
	swc1	$f11,-4($2)
        ...
===== After f9f69dd =====

Notice the additional instructions used for index calculation, due to
suboptimal candidate selection.
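
At the source level, the two selections correspond roughly to the two loop
shapes sketched below. This is only an illustration under my own
assumptions, not output from the compiler: axpy_two_ivs and axpy_one_iv are
hypothetical names, and the pointer/integer round trip through uintptr_t is
there only to mimic the dsubu/daddu pairs in the second listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two induction variables: one pointer per array, each simply bumped,
   so every access is a plain BASE + OFFSET load or store.  */
static void
axpy_two_ivs (float *x, const float *y, size_t n, float a)
{
  const float *py = y;

  assert (x != NULL && y != NULL);
  for (float *px = x; px != x + n; ++px, ++py)
    *px += a * *py;
}

/* One induction variable: the address into Y is rederived from the X
   pointer on every access (subtract one base, add the other).  */
static void
axpy_one_iv (float *x, const float *y, size_t n, float a)
{
  assert (x != NULL && y != NULL);
  uintptr_t xbase = (uintptr_t) x, ybase = (uintptr_t) y;

  for (float *px = x; px != x + n; ++px)
    {
      const float *py = (const float *) ((uintptr_t) px - xbase + ybase);
      *px += a * *py;
    }
}
```

Both functions compute the same result. The second one saves a register in
the loop, but on a target without a BASE + INDEX mode it pays for that with
two extra integer instructions per access.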

Regards,
Dimitrije

From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, October 25, 2022 1:08 PM
To: Dimitrije Milosevic <Dimitrije.Milosevic@Syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>
Subject: Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity. 
 
On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.

I don't follow how only having BASE + OFFSET addressing prevents
calculation of complexity for other addressing modes?  Can you explain?

Do you have a testcase that shows how both changes improve IV selection
for MIPS?

>
> gcc/ChangeLog:
>
>         * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
>         to non-static.
>         * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
>         * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
>         (compute_min_and_max_offset): Likewise.
>         (get_address_cost): Revert
>         complexity calculation.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
>  gcc/tree-ssa-address.cc     |   2 +-
>  gcc/tree-ssa-address.h      |   2 +
>  gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 207 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
> index ba7b7c93162..442f54f0165 100644
> --- a/gcc/tree-ssa-address.cc
> +++ b/gcc/tree-ssa-address.cc
> @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
>     validity for a memory reference accessing memory of mode MODE in address
>     space AS.  */
>
> -static bool
> +bool
>  multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
>                                  addr_space_t as)
>  {
> diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
> index 95143a099b9..09f36ee2f19 100644
> --- a/gcc/tree-ssa-address.h
> +++ b/gcc/tree-ssa-address.h
> @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
>                      class aff_tree *, tree, tree, tree, bool);
>  extern void copy_ref_info (tree, tree);
>  tree maybe_fold_tmr (tree);
> +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> +                                addr_space_t as);
>
>  extern unsigned int preferred_mem_scale_factor (tree base,
>                                                 machine_mode mem_mode,
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index a6f926a68ef..d53ba05a4f6 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
>    return infinite_cost;
>  }
>
> +static void
> +compute_symbol_and_var_present (tree e1, tree e2,
> +       bool *symbol_present, bool *var_present)
> +{
> +  poly_uint64_pod off1, off2;
> +
> +  e1 = strip_offset (e1, &off1);
> +  e2 = strip_offset (e2, &off2);
> +
> +  STRIP_NOPS (e1);
> +  STRIP_NOPS (e2);
> +
> +  if (TREE_CODE (e1) == ADDR_EXPR)
> +    {
> +      poly_int64_pod diff;
> +      if (ptr_difference_const (e1, e2, &diff))
> +  {
> +    *symbol_present = false;
> +    *var_present = false;
> +    return;
> +  }
> +
> +      if (integer_zerop (e2))
> +  {
> +    tree core;
> +    poly_int64_pod bitsize;
> +    poly_int64_pod bitpos;
> +    widest_int mul;
> +    tree toffset;
> +    machine_mode mode;
> +    int unsignedp, reversep, volatilep;
> +
> +    core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
> +      &toffset, &mode, &unsignedp, &reversep, &volatilep);
> +
> +    if (toffset != 0
> +    || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
> +    || reversep
> +    || !VAR_P (core))
> +      {
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +      }
> +
> +    if (TREE_STATIC (core)
> +    || DECL_EXTERNAL (core))
> +      {
> +    *symbol_present = true;
> +    *var_present = false;
> +    return;
> +      }
> +
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +  }
> +
> +      *symbol_present = false;
> +      *var_present = true;
> +    }
> +  *symbol_present = false;
> +
> +  if (operand_equal_p (e1, e2, 0))
> +    {
> +      *var_present = false;
> +      return;
> +    }
> +
> +  *var_present = true;
> +}
> +
> +static void
> +compute_min_and_max_offset (addr_space_t as,
> +       machine_mode mem_mode, poly_int64_pod *min_offset,
> +       poly_int64_pod *max_offset)
> +{
> +  machine_mode address_mode = targetm.addr_space.address_mode (as);
> +  HOST_WIDE_INT i;
> +  poly_int64_pod off, width;
> +  rtx addr;
> +  rtx reg1;
> +
> +  reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
> +
> +  width = GET_MODE_BITSIZE (address_mode) - 1;
> +  if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
> +         width = HOST_BITS_PER_WIDE_INT - 1;
> +  gcc_assert (width.is_constant ());
> +  addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> +
> +  off = 0;
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = -(HOST_WIDE_INT_1U << i);
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +  if (i == -1)
> +    *min_offset = 0;
> +  else
> +    *min_offset = off;
> +  // *min_offset = (i == -1? 0 : off);
> +
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = (HOST_WIDE_INT_1U << i) - 1;
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    /* For some strict-alignment targets, the offset must be naturally
> +      aligned.  Try an aligned offset if mem_mode is not QImode.  */
> +      off = mem_mode != QImode
> +      ? (HOST_WIDE_INT_1U << i)
> +      - (GET_MODE_SIZE (mem_mode))
> +      : 0;
> +      if (known_gt (off, 0))
> +    {
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +    }
> +  if (i == -1)
> +         off = 0;
> +  *max_offset = off;
> +}
> +
>  /* Return cost of computing USE's address expression by using CAND.
>     AFF_INV and AFF_VAR represent invariant and variant parts of the
>     address expression, respectively.  If AFF_INV is simple, store
> @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    /* Only true if ratio != 1.  */
>    bool ok_with_ratio_p = false;
>    bool ok_without_ratio_p = false;
> +  tree ubase = use->iv->base;
> +  tree cbase = cand->iv->base, cstep = cand->iv->step;
> +  tree utype = TREE_TYPE (ubase), ctype;
> +  unsigned HOST_WIDE_INT cstepi;
> +  bool symbol_present = false, var_present = false, stmt_is_after_increment;
> +  poly_int64_pod min_offset, max_offset;
> +  bool offset_p, ratio_p;
>
>    if (!aff_combination_const_p (aff_inv))
>      {
> @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
>    cost += address_cost (addr, mem_mode, as, speed);
>
> -  if (parts.symbol != NULL_TREE)
> -    cost.complexity += 1;
> -  /* Don't increase the complexity of adding a scaled index if it's
> -     the only kind of index that the target allows.  */
> -  if (parts.step != NULL_TREE && ok_without_ratio_p)
> -    cost.complexity += 1;
> -  if (parts.base != NULL_TREE && parts.index != NULL_TREE)
> -    cost.complexity += 1;
> -  if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
> -    cost.complexity += 1;
> +  if (cst_and_fits_in_hwi (cstep))
> +    cstepi = int_cst_value (cstep);
> +  else
> +    cstepi = 0;
> +
> +  STRIP_NOPS (cbase);
> +  ctype = TREE_TYPE (cbase);
> +
> +  stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
> +    use->stmt);
> +
> +  if (cst_and_fits_in_hwi (cbase))
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +  else if (ratio == 1)
> +    {
> +      tree real_cbase = cbase;
> +
> +      /* Check to see if any adjustment is needed.  */
> +      if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
> +       {
> +         aff_tree real_cbase_aff;
> +         aff_tree cstep_aff;
> +
> +         tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
> +                                  &real_cbase_aff);
> +         tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
> +
> +         aff_combination_add (&real_cbase_aff, &cstep_aff);
> +         real_cbase = aff_combination_to_tree (&real_cbase_aff);
> +       }
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else if (!POINTER_TYPE_P (ctype)
> +          && multiplier_allowed_in_address_p
> +               (ratio, mem_mode,
> +                       TYPE_ADDR_SPACE (TREE_TYPE (utype))))
> +    {
> +      tree real_cbase = cbase;
> +
> +      if (cstepi == 0 && stmt_is_after_increment)
> +       {
> +         if (POINTER_TYPE_P (ctype))
> +           real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
> +         else
> +           real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
> +       }
> +      real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
> +                               build_int_cst (ctype, ratio));
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else
> +    {
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +    }
> +
> +  compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
> +  offset_p = maybe_ne (aff_inv->offset, 0)
> +       && known_le (min_offset, aff_inv->offset)
> +       && known_le (aff_inv->offset, max_offset);
> +  ratio_p = (ratio != 1
> +            && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> +
> +  cost.complexity = (symbol_present != 0) + (var_present != 0)
> +       + offset_p + ratio_p;
>
>    return cost;
>  }
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
@ 2022-10-25 11:08   ` Richard Biener
  2022-10-25 13:00     ` Dimitrije Milosevic
  2022-10-27 23:02   ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2022-10-25 11:08 UTC (permalink / raw)
  To: Dimitrije Milosevic; +Cc: gcc-patches, djordje.todorovic

On Fri, Oct 21, 2022 at 3:56 PM Dimitrije Milosevic
<dimitrije.milosevic@syrmia.com> wrote:
>
> From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
>
> This patch reverts the computation of address cost complexity
> to the legacy one. After f9f69dd, complexity is calculated
> using the valid_mem_ref_p target hook. Architectures like
> Mips only allow BASE + OFFSET addressing modes, which in turn
> prevents the calculation of complexity for other addressing
> modes, resulting in non-optimal candidate selection.

I don't follow how only having BASE + OFFSET addressing prevents
calculation of complexity for other addressing modes?  Can you explain?

Do you have a testcase that shows how both changes improve IV selection
for MIPS?

>
> gcc/ChangeLog:
>
>         * tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
>         to non-static.
>         * tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
>         * tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
>         (compute_min_and_max_offset): Likewise.
>         (get_address_cost): Revert
>         complexity calculation.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
>  gcc/tree-ssa-address.cc     |   2 +-
>  gcc/tree-ssa-address.h      |   2 +
>  gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 207 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
> index ba7b7c93162..442f54f0165 100644
> --- a/gcc/tree-ssa-address.cc
> +++ b/gcc/tree-ssa-address.cc
> @@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
>     validity for a memory reference accessing memory of mode MODE in address
>     space AS.  */
>
> -static bool
> +bool
>  multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
>                                  addr_space_t as)
>  {
> diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
> index 95143a099b9..09f36ee2f19 100644
> --- a/gcc/tree-ssa-address.h
> +++ b/gcc/tree-ssa-address.h
> @@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
>                      class aff_tree *, tree, tree, tree, bool);
>  extern void copy_ref_info (tree, tree);
>  tree maybe_fold_tmr (tree);
> +bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
> +                                addr_space_t as);
>
>  extern unsigned int preferred_mem_scale_factor (tree base,
>                                                 machine_mode mem_mode,
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index a6f926a68ef..d53ba05a4f6 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
>    return infinite_cost;
>  }
>
> +static void
> +compute_symbol_and_var_present (tree e1, tree e2,
> +       bool *symbol_present, bool *var_present)
> +{
> +  poly_uint64_pod off1, off2;
> +
> +  e1 = strip_offset (e1, &off1);
> +  e2 = strip_offset (e2, &off2);
> +
> +  STRIP_NOPS (e1);
> +  STRIP_NOPS (e2);
> +
> +  if (TREE_CODE (e1) == ADDR_EXPR)
> +    {
> +      poly_int64_pod diff;
> +      if (ptr_difference_const (e1, e2, &diff))
> +  {
> +    *symbol_present = false;
> +    *var_present = false;
> +    return;
> +  }
> +
> +      if (integer_zerop (e2))
> +  {
> +    tree core;
> +    poly_int64_pod bitsize;
> +    poly_int64_pod bitpos;
> +    widest_int mul;
> +    tree toffset;
> +    machine_mode mode;
> +    int unsignedp, reversep, volatilep;
> +
> +    core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
> +      &toffset, &mode, &unsignedp, &reversep, &volatilep);
> +
> +    if (toffset != 0
> +    || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
> +    || reversep
> +    || !VAR_P (core))
> +      {
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +      }
> +
> +    if (TREE_STATIC (core)
> +    || DECL_EXTERNAL (core))
> +      {
> +    *symbol_present = true;
> +    *var_present = false;
> +    return;
> +      }
> +
> +    *symbol_present = false;
> +    *var_present = true;
> +    return;
> +  }
> +
> +      *symbol_present = false;
> +      *var_present = true;
> +    }
> +  *symbol_present = false;
> +
> +  if (operand_equal_p (e1, e2, 0))
> +    {
> +      *var_present = false;
> +      return;
> +    }
> +
> +  *var_present = true;
> +}
> +
> +static void
> +compute_min_and_max_offset (addr_space_t as,
> +       machine_mode mem_mode, poly_int64_pod *min_offset,
> +       poly_int64_pod *max_offset)
> +{
> +  machine_mode address_mode = targetm.addr_space.address_mode (as);
> +  HOST_WIDE_INT i;
> +  poly_int64_pod off, width;
> +  rtx addr;
> +  rtx reg1;
> +
> +  reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
> +
> +  width = GET_MODE_BITSIZE (address_mode) - 1;
> +  if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
> +         width = HOST_BITS_PER_WIDE_INT - 1;
> +  gcc_assert (width.is_constant ());
> +  addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> +
> +  off = 0;
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = -(HOST_WIDE_INT_1U << i);
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +  if (i == -1)
> +    *min_offset = 0;
> +  else
> +    *min_offset = off;
> +  // *min_offset = (i == -1? 0 : off);
> +
> +  for (i = width.to_constant (); i >= 0; i--)
> +    {
> +      off = (HOST_WIDE_INT_1U << i) - 1;
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    /* For some strict-alignment targets, the offset must be naturally
> +      aligned.  Try an aligned offset if mem_mode is not QImode.  */
> +      off = mem_mode != QImode
> +      ? (HOST_WIDE_INT_1U << i)
> +      - (GET_MODE_SIZE (mem_mode))
> +      : 0;
> +      if (known_gt (off, 0))
> +    {
> +      XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +      if (memory_address_addr_space_p (mem_mode, addr, as))
> +    break;
> +    }
> +    }
> +  if (i == -1)
> +         off = 0;
> +  *max_offset = off;
> +}
> +
>  /* Return cost of computing USE's address expression by using CAND.
>     AFF_INV and AFF_VAR represent invariant and variant parts of the
>     address expression, respectively.  If AFF_INV is simple, store
> @@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    /* Only true if ratio != 1.  */
>    bool ok_with_ratio_p = false;
>    bool ok_without_ratio_p = false;
> +  tree ubase = use->iv->base;
> +  tree cbase = cand->iv->base, cstep = cand->iv->step;
> +  tree utype = TREE_TYPE (ubase), ctype;
> +  unsigned HOST_WIDE_INT cstepi;
> +  bool symbol_present = false, var_present = false, stmt_is_after_increment;
> +  poly_int64_pod min_offset, max_offset;
> +  bool offset_p, ratio_p;
>
>    if (!aff_combination_const_p (aff_inv))
>      {
> @@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
>    gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
>    cost += address_cost (addr, mem_mode, as, speed);
>
> -  if (parts.symbol != NULL_TREE)
> -    cost.complexity += 1;
> -  /* Don't increase the complexity of adding a scaled index if it's
> -     the only kind of index that the target allows.  */
> -  if (parts.step != NULL_TREE && ok_without_ratio_p)
> -    cost.complexity += 1;
> -  if (parts.base != NULL_TREE && parts.index != NULL_TREE)
> -    cost.complexity += 1;
> -  if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
> -    cost.complexity += 1;
> +  if (cst_and_fits_in_hwi (cstep))
> +    cstepi = int_cst_value (cstep);
> +  else
> +    cstepi = 0;
> +
> +  STRIP_NOPS (cbase);
> +  ctype = TREE_TYPE (cbase);
> +
> +  stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
> +    use->stmt);
> +
> +  if (cst_and_fits_in_hwi (cbase))
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +  else if (ratio == 1)
> +    {
> +      tree real_cbase = cbase;
> +
> +      /* Check to see if any adjustment is needed.  */
> +      if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
> +       {
> +         aff_tree real_cbase_aff;
> +         aff_tree cstep_aff;
> +
> +         tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
> +                                  &real_cbase_aff);
> +         tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
> +
> +         aff_combination_add (&real_cbase_aff, &cstep_aff);
> +         real_cbase = aff_combination_to_tree (&real_cbase_aff);
> +       }
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else if (!POINTER_TYPE_P (ctype)
> +          && multiplier_allowed_in_address_p
> +               (ratio, mem_mode,
> +                       TYPE_ADDR_SPACE (TREE_TYPE (utype))))
> +    {
> +      tree real_cbase = cbase;
> +
> +      if (cstepi == 0 && stmt_is_after_increment)
> +       {
> +         if (POINTER_TYPE_P (ctype))
> +           real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
> +         else
> +           real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
> +       }
> +      real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
> +                               build_int_cst (ctype, ratio));
> +    compute_symbol_and_var_present (ubase, real_cbase,
> +      &symbol_present, &var_present);
> +    }
> +  else
> +    {
> +    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
> +      &symbol_present, &var_present);
> +    }
> +
> +  compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
> +  offset_p = maybe_ne (aff_inv->offset, 0)
> +       && known_le (min_offset, aff_inv->offset)
> +       && known_le (aff_inv->offset, max_offset);
> +  ratio_p = (ratio != 1
> +            && multiplier_allowed_in_address_p (ratio, mem_mode, as));
> +
> +  cost.complexity = (symbol_present != 0) + (var_present != 0)
> +       + offset_p + ratio_p;
>
>    return cost;
>  }
> --
> 2.25.1
>


* [PATCH 1/2] ivopts: Revert computation of address cost complexity.
  2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
@ 2022-10-21 13:52 ` Dimitrije Milosevic
  2022-10-25 11:08   ` Richard Biener
  2022-10-27 23:02   ` Jeff Law
  0 siblings, 2 replies; 17+ messages in thread
From: Dimitrije Milosevic @ 2022-10-21 13:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: djordje.todorovic, Dimitrije Milošević

From: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>

This patch reverts the computation of address cost complexity
to the legacy implementation. Since f9f69dd, complexity is
calculated using the valid_mem_ref_p target hook. Architectures
such as MIPS allow only BASE + OFFSET addressing modes, which in
turn prevents the calculation of complexity for other addressing
modes and results in suboptimal candidate selection.

gcc/ChangeLog:

	* tree-ssa-address.cc (multiplier_allowed_in_address_p): Change
	to non-static.
	* tree-ssa-address.h (multiplier_allowed_in_address_p): Declare.
	* tree-ssa-loop-ivopts.cc (compute_symbol_and_var_present): Reintroduce.
	(compute_min_and_max_offset): Likewise.
	(get_address_cost): Revert complexity calculation.

Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
---
 gcc/tree-ssa-address.cc     |   2 +-
 gcc/tree-ssa-address.h      |   2 +
 gcc/tree-ssa-loop-ivopts.cc | 214 ++++++++++++++++++++++++++++++++++--
 3 files changed, 207 insertions(+), 11 deletions(-)
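
For readers skimming the diff, the restored complexity calculation can be
illustrated outside of GCC. The following is only a minimal sketch under
stated assumptions: AddrShape, TargetModel, no_scaled_index and complexity
are hypothetical names that do not exist in GCC, and the signed 16-bit
offset range is an assumed stand-in for a BASE + OFFSET-only target. It
mirrors the final sum in get_address_cost, nothing more.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the facts the patch feeds into cost.complexity.
// None of these types or functions are part of GCC.
struct AddrShape {
  bool symbol_present;  // address refers to a static or external symbol
  bool var_present;     // address has a non-constant variable part
  std::int64_t offset;  // constant offset of the invariant part
  std::int64_t ratio;   // multiplier between use step and candidate step
};

// Hypothetical model of what the target accepts in an address.
struct TargetModel {
  std::int64_t min_offset;
  std::int64_t max_offset;
  bool (*multiplier_allowed)(std::int64_t ratio);
};

// Assumed BASE + OFFSET-only target: no scaled index is ever accepted.
inline bool no_scaled_index(std::int64_t) { return false; }

// Sum of four 0/1 terms, in the same shape as the patch:
// symbol + variable + in-range non-zero offset + allowed non-unit ratio.
inline unsigned complexity(const AddrShape &a, const TargetModel &t) {
  assert(t.min_offset <= t.max_offset);
  const bool offset_p = a.offset != 0
                        && t.min_offset <= a.offset
                        && a.offset <= t.max_offset;
  const bool ratio_p = a.ratio != 1 && t.multiplier_allowed(a.ratio);
  return unsigned(a.symbol_present) + unsigned(a.var_present)
         + unsigned(offset_p) + unsigned(ratio_p);
}
```

With a model such as TargetModel{-32768, 32767, no_scaled_index}, a use with
a variable part and a small offset gets complexity 2 whether or not the
ratio is 1, because the ratio term only counts when the target accepts the
multiplier.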

diff --git a/gcc/tree-ssa-address.cc b/gcc/tree-ssa-address.cc
index ba7b7c93162..442f54f0165 100644
--- a/gcc/tree-ssa-address.cc
+++ b/gcc/tree-ssa-address.cc
@@ -561,7 +561,7 @@ add_to_parts (struct mem_address *parts, tree elt)
    validity for a memory reference accessing memory of mode MODE in address
    space AS.  */
 
-static bool
+bool
 multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
 				 addr_space_t as)
 {
diff --git a/gcc/tree-ssa-address.h b/gcc/tree-ssa-address.h
index 95143a099b9..09f36ee2f19 100644
--- a/gcc/tree-ssa-address.h
+++ b/gcc/tree-ssa-address.h
@@ -38,6 +38,8 @@ tree create_mem_ref (gimple_stmt_iterator *, tree,
 		     class aff_tree *, tree, tree, tree, bool);
 extern void copy_ref_info (tree, tree);
 tree maybe_fold_tmr (tree);
+bool multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, machine_mode mode,
+				 addr_space_t as);
 
 extern unsigned int preferred_mem_scale_factor (tree base,
 						machine_mode mem_mode,
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index a6f926a68ef..d53ba05a4f6 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4774,6 +4774,135 @@ get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
   return infinite_cost;
 }
 
+static void
+compute_symbol_and_var_present (tree e1, tree e2,
+       bool *symbol_present, bool *var_present)
+{
+  poly_uint64_pod off1, off2;
+
+  e1 = strip_offset (e1, &off1);
+  e2 = strip_offset (e2, &off2);
+
+  STRIP_NOPS (e1);
+  STRIP_NOPS (e2);
+
+  if (TREE_CODE (e1) == ADDR_EXPR)
+    {
+      poly_int64_pod diff;
+      if (ptr_difference_const (e1, e2, &diff))
+  {
+    *symbol_present = false;
+    *var_present = false;
+    return;
+  }
+
+      if (integer_zerop (e2))
+  {
+    tree core;
+    poly_int64_pod bitsize;
+    poly_int64_pod bitpos;
+    widest_int mul;
+    tree toffset;
+    machine_mode mode;
+    int unsignedp, reversep, volatilep;
+
+    core = get_inner_reference (TREE_OPERAND (e1, 0), &bitsize, &bitpos,
+      &toffset, &mode, &unsignedp, &reversep, &volatilep);
+
+    if (toffset != 0
+    || !constant_multiple_p (bitpos, BITS_PER_UNIT, &mul)
+    || reversep
+    || !VAR_P (core))
+      {
+    *symbol_present = false;
+    *var_present = true;
+    return;
+      }
+
+    if (TREE_STATIC (core)
+    || DECL_EXTERNAL (core))
+      {
+    *symbol_present = true;
+    *var_present = false;
+    return;
+      }
+
+    *symbol_present = false;
+    *var_present = true;
+    return;
+  }
+
+      *symbol_present = false;
+      *var_present = true;
+    }
+  *symbol_present = false;
+
+  if (operand_equal_p (e1, e2, 0))
+    {
+      *var_present = false;
+      return;
+    }
+
+  *var_present = true;
+}
+
+static void
+compute_min_and_max_offset (addr_space_t as,
+       machine_mode mem_mode, poly_int64_pod *min_offset,
+       poly_int64_pod *max_offset)
+{
+  machine_mode address_mode = targetm.addr_space.address_mode (as);
+  HOST_WIDE_INT i;
+  poly_int64_pod off, width;
+  rtx addr;
+  rtx reg1;
+
+  reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
+
+  width = GET_MODE_BITSIZE (address_mode) - 1;
+  if (known_gt (width, HOST_BITS_PER_WIDE_INT - 1))
+	  width = HOST_BITS_PER_WIDE_INT - 1;
+  gcc_assert (width.is_constant ());
+  addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
+
+  off = 0;
+  for (i = width.to_constant (); i >= 0; i--)
+    {
+      off = -(HOST_WIDE_INT_1U << i);
+      XEXP (addr, 1) = gen_int_mode (off, address_mode);
+      if (memory_address_addr_space_p (mem_mode, addr, as))
+    break;
+    }
+  if (i == -1)
+    *min_offset = 0;
+  else
+    *min_offset = off;
+  // *min_offset = (i == -1? 0 : off);
+
+  for (i = width.to_constant (); i >= 0; i--)
+    {
+      off = (HOST_WIDE_INT_1U << i) - 1;
+      XEXP (addr, 1) = gen_int_mode (off, address_mode);
+      if (memory_address_addr_space_p (mem_mode, addr, as))
+    break;
+    /* For some strict-alignment targets, the offset must be naturally
+      aligned.  Try an aligned offset if mem_mode is not QImode.  */
+      off = mem_mode != QImode
+      ? (HOST_WIDE_INT_1U << i)
+      - (GET_MODE_SIZE (mem_mode))
+      : 0;
+      if (known_gt (off, 0))
+    {
+      XEXP (addr, 1) = gen_int_mode (off, address_mode);
+      if (memory_address_addr_space_p (mem_mode, addr, as))
+    break;
+    }
+    }
+  if (i == -1)
+	  off = 0;
+  *max_offset = off;
+}
+
 /* Return cost of computing USE's address expression by using CAND.
    AFF_INV and AFF_VAR represent invariant and variant parts of the
    address expression, respectively.  If AFF_INV is simple, store
@@ -4802,6 +4931,13 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   /* Only true if ratio != 1.  */
   bool ok_with_ratio_p = false;
   bool ok_without_ratio_p = false;
+  tree ubase = use->iv->base;
+  tree cbase = cand->iv->base, cstep = cand->iv->step;
+  tree utype = TREE_TYPE (ubase), ctype;
+  unsigned HOST_WIDE_INT cstepi;
+  bool symbol_present = false, var_present = false, stmt_is_after_increment;
+  poly_int64_pod min_offset, max_offset;
+  bool offset_p, ratio_p;
 
   if (!aff_combination_const_p (aff_inv))
     {
@@ -4915,16 +5051,74 @@ get_address_cost (struct ivopts_data *data, struct iv_use *use,
   gcc_assert (memory_address_addr_space_p (mem_mode, addr, as));
   cost += address_cost (addr, mem_mode, as, speed);
 
-  if (parts.symbol != NULL_TREE)
-    cost.complexity += 1;
-  /* Don't increase the complexity of adding a scaled index if it's
-     the only kind of index that the target allows.  */
-  if (parts.step != NULL_TREE && ok_without_ratio_p)
-    cost.complexity += 1;
-  if (parts.base != NULL_TREE && parts.index != NULL_TREE)
-    cost.complexity += 1;
-  if (parts.offset != NULL_TREE && !integer_zerop (parts.offset))
-    cost.complexity += 1;
+  if (cst_and_fits_in_hwi (cstep))
+    cstepi = int_cst_value (cstep);
+  else
+    cstepi = 0;
+
+  STRIP_NOPS (cbase);
+  ctype = TREE_TYPE (cbase);
+
+  stmt_is_after_increment = stmt_after_increment (data->current_loop, cand,
+    use->stmt);
+
+  if (cst_and_fits_in_hwi (cbase))
+    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
+      &symbol_present, &var_present);
+  else if (ratio == 1)
+    {
+      tree real_cbase = cbase;
+
+      /* Check to see if any adjustment is needed.  */
+      if (!cst_and_fits_in_hwi (cstep) && stmt_is_after_increment)
+	{
+	  aff_tree real_cbase_aff;
+	  aff_tree cstep_aff;
+
+	  tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
+				   &real_cbase_aff);
+	  tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
+
+	  aff_combination_add (&real_cbase_aff, &cstep_aff);
+	  real_cbase = aff_combination_to_tree (&real_cbase_aff);
+	}
+    compute_symbol_and_var_present (ubase, real_cbase,
+      &symbol_present, &var_present);
+    }
+  else if (!POINTER_TYPE_P (ctype)
+	   && multiplier_allowed_in_address_p
+		(ratio, mem_mode,
+			TYPE_ADDR_SPACE (TREE_TYPE (utype))))
+    {
+      tree real_cbase = cbase;
+
+      if (cstepi == 0 && stmt_is_after_increment)
+	{
+	  if (POINTER_TYPE_P (ctype))
+	    real_cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
+	  else
+	    real_cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
+	}
+      real_cbase = fold_build2 (MULT_EXPR, ctype, real_cbase,
+				build_int_cst (ctype, ratio));
+    compute_symbol_and_var_present (ubase, real_cbase,
+      &symbol_present, &var_present);
+    }
+  else
+    {
+    compute_symbol_and_var_present (ubase, build_int_cst (utype, 0),
+      &symbol_present, &var_present);
+    }
+
+  compute_min_and_max_offset (as, mem_mode, &min_offset, &max_offset);
+  offset_p = maybe_ne (aff_inv->offset, 0)
+       && known_le (min_offset, aff_inv->offset)
+       && known_le (aff_inv->offset, max_offset);
+  ratio_p = (ratio != 1
+	     && multiplier_allowed_in_address_p (ratio, mem_mode, as));
+
+  cost.complexity = (symbol_present != 0) + (var_present != 0)
+       + offset_p + ratio_p;
 
   return cost;
 }
-- 
2.25.1
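
The offset probing done by compute_min_and_max_offset above can be pictured
with a standalone sketch. This is an illustration under assumptions, not the
GCC code: OffsetRange and probe_offset_range are hypothetical names, the
validity check is passed in as a plain predicate instead of going through
memory_address_addr_space_p, and the extra retry with a naturally aligned
offset for strict-alignment targets is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical result type; not part of GCC.
struct OffsetRange {
  std::int64_t min;
  std::int64_t max;
};

// Probe candidate offsets from the widest down to the narrowest and keep the
// first one the predicate accepts. Candidates are -(2^i) for the minimum and
// 2^i - 1 for the maximum, as in the patch. If nothing is accepted, the
// corresponding bound stays 0.
inline OffsetRange probe_offset_range(
    const std::function<bool(std::int64_t)> &valid, int width) {
  assert(width >= 0 && width <= 62);  // keep the shifts within int64_t
  OffsetRange r{0, 0};
  for (int i = width; i >= 0; --i) {
    const std::int64_t off = -(std::int64_t{1} << i);
    if (valid(off)) {
      r.min = off;
      break;
    }
  }
  for (int i = width; i >= 0; --i) {
    const std::int64_t off = (std::int64_t{1} << i) - 1;
    if (valid(off)) {
      r.max = off;
      break;
    }
  }
  return r;
}
```

Given a predicate that accepts only signed 16-bit values, probing with
width 31 settles on the range [-32768, 32767], which is the range the
offset_p test in get_address_cost would then compare aff_inv->offset
against.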



Thread overview: 17+ messages
2024-03-18 11:28 [PATCH 1/2] ivopts: Revert computation of address cost complexity Aleksandar Rakic
  -- strict thread matches above, loose matches on Subject: below --
2024-03-18 20:27 Aleksandar Rakic
2024-04-15 13:30 ` Aleksandar Rakic
2022-10-21 13:52 [PATCH 0/2] ivopts: Fix candidate selection for architectures with limited addressing modes Dimitrije Milosevic
2022-10-21 13:52 ` [PATCH 1/2] ivopts: Revert computation of address cost complexity Dimitrije Milosevic
2022-10-25 11:08   ` Richard Biener
2022-10-25 13:00     ` Dimitrije Milosevic
2022-10-27 23:02   ` Jeff Law
2022-10-28  6:43     ` Dimitrije Milosevic
2022-10-28  7:00       ` Richard Biener
2022-10-28 13:39         ` Dimitrije Milosevic
2022-11-01 18:46         ` Jeff Law
2022-11-02  8:40           ` Dimitrije Milosevic
2022-11-07 13:35             ` Richard Biener
2022-12-15 15:26               ` Dimitrije Milosevic
2022-12-16  9:58                 ` Richard Biener
2022-12-16 11:37                   ` Dimitrije Milosevic
2022-12-16 11:58                     ` Richard Biener
