[PATCH PR90078]Capping comp_cost computation in ivopts

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH PR90078]Capping comp_cost computation in ivopts
@ 2019-04-17  7:00 bin.cheng
  2019-04-17  7:19 ` Jakub Jelinek
  0 siblings, 1 reply; 8+ messages in thread
From: bin.cheng @ 2019-04-17  7:00 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 948 bytes --]

Hi,
As discussed in PR90078, this patch checks possible infinite_cost overflow in ivopts.
Also as discussed, overflow happens mostly because of cost scaling wrto bb_freq/loop_freq.
For the moment, we only implement capping in comp_cost operators, while in next
stage1, we may instead implement capping in get_scaled_computation_cost_at with
more supporting benchmark data.

BTW, I think switching costs around comparison between infinite_cost is unnecessary
since there will be no overflow in integer after capping with infinite_cost.

Bootstrap and test on x86_64, is it OK?

Thanks,
bin

2019-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>

        PR tree-optimization/92078
        * tree-ssa-loop-ivopts.c (comp_cost::operator +,-,+=,-+,/=,*=): Add
        checks for infinite_cost overflow.

2018-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>

        PR tree-optimization/92078
        * gcc/testsuite/g++.dg/tree-ssa/pr90078.C: New test.

[-- Attachment #2: 0001-pr90078.patch --]
[-- Type: application/octet-stream, Size: 8310 bytes --]

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90078.C b/gcc/testsuite/g++.dg/tree-ssa/pr90078.C
new file mode 100644
index 00000000000..e36f50e9d8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr90078.C
@@ -0,0 +1,199 @@
+// { dg-do compile }
+// { dg-options "-std=c++14 -O2 -ftemplate-depth=1000000" }
+
+template <class T, int Dim0, int Dim1, int Dim2> struct Tensor3;
+template <class A, class T, int Dim0, int Dim1, int Dim2, char i, char j,
+          char k>
+struct Tensor3_Expr;
+
+template <class T, int Dim0, int Dim1, int Dim2, int Dim3> struct Tensor4;
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr;
+
+template <char i, int Dim> struct Index
+{};
+template <const int N> struct Number
+{
+  Number(){};
+  operator int() const { return N; }
+};
+
+template <class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2>
+struct Tensor3
+{
+  T data[Tensor_Dim0][Tensor_Dim1][Tensor_Dim2];
+
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return data[N1][N2][N3];
+  }
+
+  template <char i, char j, char k, int Dim0, int Dim1, int Dim2>
+  Tensor3_Expr<const Tensor3<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>, T,
+               Dim0, Dim1, Dim2, i, j, k>
+  operator()(const Index<i, Dim0>, const Index<j, Dim1>,
+             const Index<k, Dim2>) const
+  {
+    return Tensor3_Expr<const Tensor3<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>,
+                        T, Dim0, Dim1, Dim2, i, j, k>(*this);
+  }
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, char i, char j,
+          char k>
+struct Tensor3_Expr
+{
+  A iter;
+
+  Tensor3_Expr(const A &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return iter(N1, N2, N3);
+  }
+};
+
+template <class A, class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2,
+          int Dim0, int Dim1, int Dim2, char i, char j, char k>
+struct Tensor3_Expr<Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>, T, Dim0,
+                   Dim1, Dim2, i, j, k>
+{
+  Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2> &iter;
+
+  Tensor3_Expr(Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2> &a) : iter(a)
+  {}
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return iter(N1, N2, N3);
+  }
+};
+
+template <class A, class B, class T, class U, int Dim0, int Dim1, int Dim23,
+          int Dim4, int Dim5, char i, char j, char k, char l, char m>
+struct Tensor3_times_Tensor3_21
+{
+  Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> iterA;
+  Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> iterB;
+
+  template <int CurrentDim>
+  T eval(const int N1, const int N2, const int N3, const int N4,
+         const Number<CurrentDim> &) const
+  {
+    return iterA(N1, N2, CurrentDim - 1) * iterB(CurrentDim - 1, N3, N4)
+           + eval(N1, N2, N3, N4, Number<CurrentDim - 1>());
+  }
+  T eval(const int N1, const int N2, const int N3, const int N4,
+         const Number<1> &) const
+  {
+    return iterA(N1, N2, 0) * iterB(0, N3, N4);
+  }
+
+  Tensor3_times_Tensor3_21(
+    const Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> &a,
+    const Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> &b)
+      : iterA(a), iterB(b)
+  {}
+  T operator()(const int &N1, const int &N2, const int &N3,
+               const int &N4) const
+  {
+    return eval(N1, N2, N3, N4, Number<Dim23>());
+  }
+};
+
+template <class A, class B, class T, class U, int Dim0, int Dim1, int Dim23,
+          int Dim4, int Dim5, char i, char j, char k, char l, char m>
+Tensor4_Expr<Tensor3_times_Tensor3_21<A, B, T, U, Dim0, Dim1, Dim23, Dim4,
+                                      Dim5, i, j, k, l, m>,
+             T, Dim0, Dim1, Dim4, Dim5, i, j, l, m>
+operator*(const Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> &a,
+          const Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> &b)
+{
+  using TensorExpr = Tensor3_times_Tensor3_21<A, B, T, U, Dim0, Dim1, Dim23,
+                                              Dim4, Dim5, i, j, k, l, m>;
+  return Tensor4_Expr<TensorExpr, T, Dim0, Dim1, Dim4, Dim5, i, j, l, m>(
+    TensorExpr(a, b));
+};
+
+template <class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2,
+          int Tensor_Dim3>
+struct Tensor4
+{
+  T data[Tensor_Dim0][Tensor_Dim1][Tensor_Dim2][Tensor_Dim3];
+
+  Tensor4() {}
+  T &operator()(const int N1, const int N2, const int N3, const int N4)
+  {
+    return data[N1][N2][N3][N4];
+  }
+
+  template <char i, char j, char k, char l, int Dim0, int Dim1, int Dim2,
+            int Dim3>
+  Tensor4_Expr<Tensor4<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2, Tensor_Dim3>,
+               T, Dim0, Dim1, Dim2, Dim3, i, j, k, l>
+  operator()(const Index<i, Dim0>, const Index<j, Dim1>, const Index<k, Dim2>,
+             const Index<l, Dim3>)
+  {
+    return Tensor4_Expr<
+      Tensor4<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2, Tensor_Dim3>, T, Dim0,
+      Dim1, Dim2, Dim3, i, j, k, l>(*this);
+  };
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr
+{
+  A iter;
+
+  Tensor4_Expr(const A &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3, const int N4) const
+  {
+    return iter(N1, N2, N3, N4);
+  }
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr<Tensor4<A, Dim0, Dim1, Dim2, Dim3>, T, Dim0, Dim1, Dim2,
+                   Dim3, i, j, k, l>
+{
+  Tensor4<A, Dim0, Dim1, Dim2, Dim3> &iter;
+
+  Tensor4_Expr(Tensor4<A, Dim0, Dim1, Dim2, Dim3> &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3, const int N4) const
+  {
+    return iter(N1, N2, N3, N4);
+  }
+
+  template <class B, class U, int Dim1_0, int Dim1_1, int Dim1_2, int Dim1_3,
+            char i_1, char j_1, char k_1, char l_1>
+  auto &operator=(const Tensor4_Expr<B, U, Dim1_0, Dim1_1, Dim1_2, Dim1_3, i_1,
+                                     j_1, k_1, l_1> &rhs)
+  {
+    for(int ii = 0; ii < Dim0; ++ii)
+      for(int jj = 0; jj < Dim1; ++jj)
+        for(int kk = 0; kk < Dim2; ++kk)
+          for(int ll = 0; ll < Dim3; ++ll)
+            {
+              iter(ii, jj, kk, ll) = rhs(ii, jj, kk, ll);
+            }
+    return *this;
+  }
+};
+
+int main()
+{
+  Tensor3<float, 100, 100, 1000> t1;
+  Tensor3<float, 1000, 100, 100> t2;
+
+  Index<'l', 100> l;
+  Index<'m', 100> m;
+  Index<'k', 1000> k;
+  Index<'n', 100> n;
+  Index<'o', 100> o;
+
+  Tensor4<float, 100, 100, 100, 100> res;
+  res(l, m, n, o) = t1(l, m, k) * t2(k, n, o);
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index a2b6b2b2312..4ca1f0e9686 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -243,6 +243,9 @@ operator+ (comp_cost cost1, comp_cost cost2)
   if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
     return infinite_cost;
 
+  if (cost1.cost + cost2.cost >= infinite_cost.cost)
+    return infinite_cost;
+
   cost1.cost += cost2.cost;
   cost1.complexity += cost2.complexity;
 
@@ -256,6 +259,8 @@ operator- (comp_cost cost1, comp_cost cost2)
     return infinite_cost;
 
   gcc_assert (!cost2.infinite_cost_p ());
+  if (cost1.cost - cost2.cost >= infinite_cost.cost)
+    return infinite_cost;
 
   cost1.cost -= cost2.cost;
   cost1.complexity -= cost2.complexity;
@@ -276,6 +281,8 @@ comp_cost::operator+= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  if (this->cost + c >= infinite_cost.cost)
+    return infinite_cost;
   this->cost += c;
 
   return *this;
@@ -287,6 +294,8 @@ comp_cost::operator-= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  if (this->cost - c >= infinite_cost.cost)
+    return infinite_cost;
   this->cost -= c;
 
   return *this;
@@ -295,6 +304,7 @@ comp_cost::operator-= (HOST_WIDE_INT c)
 comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
+  gcc_assert (c != 0);
   if (infinite_cost_p ())
     return *this;
 
@@ -309,6 +319,9 @@ comp_cost::operator*= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  if (this->cost * c >= infinite_cost.cost)
+    return infinite_cost;
+
   this->cost *= c;
 
   return *this;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-04-17  7:00 [PATCH PR90078]Capping comp_cost computation in ivopts bin.cheng
@ 2019-04-17  7:19 ` Jakub Jelinek
  2019-04-17 11:14   ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2019-04-17  7:19 UTC (permalink / raw)
  To: bin.cheng; +Cc: GCC Patches

On Wed, Apr 17, 2019 at 02:13:12PM +0800, bin.cheng wrote:
> Hi,
> As discussed in PR90078, this patch checks possible infinite_cost overflow in ivopts.
> Also as discussed, overflow happens mostly because of cost scaling wrto bb_freq/loop_freq.
> For the moment, we only implement capping in comp_cost operators, while in next
> stage1, we may instead implement capping in get_scaled_computation_cost_at with
> more supporting benchmark data.
> 
> BTW, I think switching costs around comparison between infinite_cost is unnecessary
> since there will be no overflow in integer after capping with infinite_cost.
> 
> Bootstrap and test on x86_64, is it OK?
> 
> Thanks,
> bin
> 
> 2019-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
> 
>         PR tree-optimization/92078
>         * tree-ssa-loop-ivopts.c (comp_cost::operator +,-,+=,-+,/=,*=): Add
>         checks for infinite_cost overflow.
> 
> 2018-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
> 
>         PR tree-optimization/92078
>         * gcc/testsuite/g++.dg/tree-ssa/pr90078.C: New test.

--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -243,6 +243,9 @@ operator+ (comp_cost cost1, comp_cost cost2)
   if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
     return infinite_cost;
 
+  if (cost1.cost + cost2.cost >= infinite_cost.cost)
+    return infinite_cost;

As
#define INFTY 10000000
what is the reason to keep the previous condition as well?
I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
cost1.cost + cost2.cost >= INFTY too.
Unless costs can go negative.

@@ -256,6 +259,8 @@ operator- (comp_cost cost1, comp_cost cost2)
     return infinite_cost;
 
   gcc_assert (!cost2.infinite_cost_p ());
+  if (cost1.cost - cost2.cost >= infinite_cost.cost)
+    return infinite_cost;

Unless costs can be negative, when you first bail out
for cost1.cost == INFTY, then cost1.cost - cost2.cost won't
be INFTY (but could get negative).  So shouldn't there be a guard against
that instead?  Or, if costs can be negative, shouldn't there be also
guards that it doesn't grow too negative (say smaller than -INFTY)?

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-04-17  7:19 ` Jakub Jelinek
@ 2019-04-17 11:14   ` Bin.Cheng
  2019-04-17 11:35     ` Jakub Jelinek
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2019-04-17 11:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: bin.cheng, GCC Patches

On Wed, Apr 17, 2019 at 3:10 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Apr 17, 2019 at 02:13:12PM +0800, bin.cheng wrote:
> > Hi,
> > As discussed in PR90078, this patch checks possible infinite_cost overflow in ivopts.
> > Also as discussed, overflow happens mostly because of cost scaling wrto bb_freq/loop_freq.
> > For the moment, we only implement capping in comp_cost operators, while in next
> > stage1, we may instead implement capping in get_scaled_computation_cost_at with
> > more supporting benchmark data.
> >
> > BTW, I think switching costs around comparison between infinite_cost is unnecessary
> > since there will be no overflow in integer after capping with infinite_cost.
> >
> > Bootstrap and test on x86_64, is it OK?
> >
> > Thanks,
> > bin
> >
> > 2019-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
> >
> >         PR tree-optimization/92078
> >         * tree-ssa-loop-ivopts.c (comp_cost::operator +,-,+=,-+,/=,*=): Add
> >         checks for infinite_cost overflow.
> >
> > 2018-04-17  Bin Cheng  <bin.cheng@linux.alibaba.com>
> >
> >         PR tree-optimization/92078
> >         * gcc/testsuite/g++.dg/tree-ssa/pr90078.C: New test.
>
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -243,6 +243,9 @@ operator+ (comp_cost cost1, comp_cost cost2)
>    if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
>      return infinite_cost;
>
> +  if (cost1.cost + cost2.cost >= infinite_cost.cost)
> +    return infinite_cost;
>
> As
> #define INFTY 10000000
> what is the reason to keep the previous condition as well?
> I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> cost1.cost + cost2.cost >= INFTY too.
> Unless costs can go negative.
It's a bit complicated, but in general, costs can go negative.

>
> @@ -256,6 +259,8 @@ operator- (comp_cost cost1, comp_cost cost2)
>      return infinite_cost;
>
>    gcc_assert (!cost2.infinite_cost_p ());
> +  if (cost1.cost - cost2.cost >= infinite_cost.cost)
> +    return infinite_cost;
>
> Unless costs can be negative, when you first bail out
> for cost1.cost == INFTY, then cost1.cost - cost2.cost won't
> be INFTY (but could get negative).  So shouldn't there be a guard against
> that instead?  Or, if costs can be negative, shouldn't there be also
> guards that it doesn't grow too negative (say smaller than -INFTY)?
Negative cost is kind of a result of booking cost cancellation at
different place.  For example, it mostly comes from in modeling auto
increment/decrement addressing mode.  To be specific, when IV's
increment instruction can be merged into addressing mode, we cancel
cost of IV increment operation in cand-use cost.  Very likely 4 will
be subtracted.  In general, we wouldn't expect negative cost can go
too big, so there is no -INFTY logic in ivopts at all.  So this is the
least invasive fix for the moment, I would consider capping
bb_freq/loop_freq in the future which should rule out the overflow
possibility in the first place.

Thanks,
bin
>
>         Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-04-17 11:14   ` Bin.Cheng
@ 2019-04-17 11:35     ` Jakub Jelinek
  2019-05-05  6:02       ` bin.cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2019-04-17 11:35 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, GCC Patches

On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > As
> > #define INFTY 10000000
> > what is the reason to keep the previous condition as well?
> > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > cost1.cost + cost2.cost >= INFTY too.
> > Unless costs can go negative.
> It's a bit complicated, but in general, costs can go negative.

Ok, no objections from me then (but as I don't know anything about it,
not an ack either; you are ivopts maintainer, so you don't need one).

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-04-17 11:35     ` Jakub Jelinek
@ 2019-05-05  6:02       ` bin.cheng
  2019-05-06 10:11         ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: bin.cheng @ 2019-05-05  6:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]

> ------------------------------------------------------------------
> Sender:Jakub Jelinek <jakub@redhat.com>
> Sent At:2019 Apr. 17 (Wed.) 19:27
> Recipient:Bin.Cheng <amker.cheng@gmail.com>
> Cc:bin.cheng <bin.cheng@linux.alibaba.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> 
> 
> On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > As
> > > #define INFTY 10000000
> > > what is the reason to keep the previous condition as well?
> > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > cost1.cost + cost2.cost >= INFTY too.
> > > Unless costs can go negative.
> > It's a bit complicated, but in general, costs can go negative.
> 
> Ok, no objections from me then (but as I don't know anything about it,
> not an ack either; you are ivopts maintainer, so you don't need one).

Hi,
The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
fixed by another patch.  This is the updated patch for PR90078.  It promotes type
of ivopts cost from int to int64_t, as well as change behavior of infinite_cost overflow
from saturation to assert.
Please note, implicit conversions are kept in cost computation as before without
introducing any narrowing.

Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?

Thanks,
bin
2019-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>

        PR tree-optimization/90078
        * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
        (INFTY): Increase the value for infinite cost.
        (struct comp_cost): Promote type of members to int64_t.
        (infinite_cost): Don't set complexity in initialization.
        (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
        overflows to infinite_cost.
        (adjust_setup_cost): Promote type of parameter and cost computation
        to int64_t.
        (struct ainc_cost_data, struct iv_ca): Promote type of member to
        int64_t.
        (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
        cost computation to int64_t.
        (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
        int64_t's format specifier in dump.

2018-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>

        PR tree-optimization/90078
        * g++.dg/tree-ssa/pr90078.C: New test.

[-- Attachment #2: 0002-Fix-pr90078-by-promoting-type-of-IVOPTs-cost-to-int6.patch --]
[-- Type: application/octet-stream, Size: 14220 bytes --]

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90078.C b/gcc/testsuite/g++.dg/tree-ssa/pr90078.C
new file mode 100644
index 00000000000..e36f50e9d8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr90078.C
@@ -0,0 +1,199 @@
+// { dg-do compile }
+// { dg-options "-std=c++14 -O2 -ftemplate-depth=1000000" }
+
+template <class T, int Dim0, int Dim1, int Dim2> struct Tensor3;
+template <class A, class T, int Dim0, int Dim1, int Dim2, char i, char j,
+          char k>
+struct Tensor3_Expr;
+
+template <class T, int Dim0, int Dim1, int Dim2, int Dim3> struct Tensor4;
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr;
+
+template <char i, int Dim> struct Index
+{};
+template <const int N> struct Number
+{
+  Number(){};
+  operator int() const { return N; }
+};
+
+template <class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2>
+struct Tensor3
+{
+  T data[Tensor_Dim0][Tensor_Dim1][Tensor_Dim2];
+
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return data[N1][N2][N3];
+  }
+
+  template <char i, char j, char k, int Dim0, int Dim1, int Dim2>
+  Tensor3_Expr<const Tensor3<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>, T,
+               Dim0, Dim1, Dim2, i, j, k>
+  operator()(const Index<i, Dim0>, const Index<j, Dim1>,
+             const Index<k, Dim2>) const
+  {
+    return Tensor3_Expr<const Tensor3<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>,
+                        T, Dim0, Dim1, Dim2, i, j, k>(*this);
+  }
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, char i, char j,
+          char k>
+struct Tensor3_Expr
+{
+  A iter;
+
+  Tensor3_Expr(const A &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return iter(N1, N2, N3);
+  }
+};
+
+template <class A, class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2,
+          int Dim0, int Dim1, int Dim2, char i, char j, char k>
+struct Tensor3_Expr<Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2>, T, Dim0,
+                   Dim1, Dim2, i, j, k>
+{
+  Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2> &iter;
+
+  Tensor3_Expr(Tensor3<A, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2> &a) : iter(a)
+  {}
+  T operator()(const int N1, const int N2, const int N3) const
+  {
+    return iter(N1, N2, N3);
+  }
+};
+
+template <class A, class B, class T, class U, int Dim0, int Dim1, int Dim23,
+          int Dim4, int Dim5, char i, char j, char k, char l, char m>
+struct Tensor3_times_Tensor3_21
+{
+  Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> iterA;
+  Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> iterB;
+
+  template <int CurrentDim>
+  T eval(const int N1, const int N2, const int N3, const int N4,
+         const Number<CurrentDim> &) const
+  {
+    return iterA(N1, N2, CurrentDim - 1) * iterB(CurrentDim - 1, N3, N4)
+           + eval(N1, N2, N3, N4, Number<CurrentDim - 1>());
+  }
+  T eval(const int N1, const int N2, const int N3, const int N4,
+         const Number<1> &) const
+  {
+    return iterA(N1, N2, 0) * iterB(0, N3, N4);
+  }
+
+  Tensor3_times_Tensor3_21(
+    const Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> &a,
+    const Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> &b)
+      : iterA(a), iterB(b)
+  {}
+  T operator()(const int &N1, const int &N2, const int &N3,
+               const int &N4) const
+  {
+    return eval(N1, N2, N3, N4, Number<Dim23>());
+  }
+};
+
+template <class A, class B, class T, class U, int Dim0, int Dim1, int Dim23,
+          int Dim4, int Dim5, char i, char j, char k, char l, char m>
+Tensor4_Expr<Tensor3_times_Tensor3_21<A, B, T, U, Dim0, Dim1, Dim23, Dim4,
+                                      Dim5, i, j, k, l, m>,
+             T, Dim0, Dim1, Dim4, Dim5, i, j, l, m>
+operator*(const Tensor3_Expr<A, T, Dim0, Dim1, Dim23, i, j, k> &a,
+          const Tensor3_Expr<B, U, Dim23, Dim4, Dim5, k, l, m> &b)
+{
+  using TensorExpr = Tensor3_times_Tensor3_21<A, B, T, U, Dim0, Dim1, Dim23,
+                                              Dim4, Dim5, i, j, k, l, m>;
+  return Tensor4_Expr<TensorExpr, T, Dim0, Dim1, Dim4, Dim5, i, j, l, m>(
+    TensorExpr(a, b));
+};
+
+template <class T, int Tensor_Dim0, int Tensor_Dim1, int Tensor_Dim2,
+          int Tensor_Dim3>
+struct Tensor4
+{
+  T data[Tensor_Dim0][Tensor_Dim1][Tensor_Dim2][Tensor_Dim3];
+
+  Tensor4() {}
+  T &operator()(const int N1, const int N2, const int N3, const int N4)
+  {
+    return data[N1][N2][N3][N4];
+  }
+
+  template <char i, char j, char k, char l, int Dim0, int Dim1, int Dim2,
+            int Dim3>
+  Tensor4_Expr<Tensor4<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2, Tensor_Dim3>,
+               T, Dim0, Dim1, Dim2, Dim3, i, j, k, l>
+  operator()(const Index<i, Dim0>, const Index<j, Dim1>, const Index<k, Dim2>,
+             const Index<l, Dim3>)
+  {
+    return Tensor4_Expr<
+      Tensor4<T, Tensor_Dim0, Tensor_Dim1, Tensor_Dim2, Tensor_Dim3>, T, Dim0,
+      Dim1, Dim2, Dim3, i, j, k, l>(*this);
+  };
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr
+{
+  A iter;
+
+  Tensor4_Expr(const A &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3, const int N4) const
+  {
+    return iter(N1, N2, N3, N4);
+  }
+};
+
+template <class A, class T, int Dim0, int Dim1, int Dim2, int Dim3, char i,
+          char j, char k, char l>
+struct Tensor4_Expr<Tensor4<A, Dim0, Dim1, Dim2, Dim3>, T, Dim0, Dim1, Dim2,
+                   Dim3, i, j, k, l>
+{
+  Tensor4<A, Dim0, Dim1, Dim2, Dim3> &iter;
+
+  Tensor4_Expr(Tensor4<A, Dim0, Dim1, Dim2, Dim3> &a) : iter(a) {}
+  T operator()(const int N1, const int N2, const int N3, const int N4) const
+  {
+    return iter(N1, N2, N3, N4);
+  }
+
+  template <class B, class U, int Dim1_0, int Dim1_1, int Dim1_2, int Dim1_3,
+            char i_1, char j_1, char k_1, char l_1>
+  auto &operator=(const Tensor4_Expr<B, U, Dim1_0, Dim1_1, Dim1_2, Dim1_3, i_1,
+                                     j_1, k_1, l_1> &rhs)
+  {
+    for(int ii = 0; ii < Dim0; ++ii)
+      for(int jj = 0; jj < Dim1; ++jj)
+        for(int kk = 0; kk < Dim2; ++kk)
+          for(int ll = 0; ll < Dim3; ++ll)
+            {
+              iter(ii, jj, kk, ll) = rhs(ii, jj, kk, ll);
+            }
+    return *this;
+  }
+};
+
+int main()
+{
+  Tensor3<float, 100, 100, 1000> t1;
+  Tensor3<float, 1000, 100, 100> t2;
+
+  Index<'l', 100> l;
+  Index<'m', 100> m;
+  Index<'k', 1000> k;
+  Index<'n', 100> n;
+  Index<'o', 100> o;
+
+  Tensor4<float, 100, 100, 100, 100> res;
+  res(l, m, n, o) = t1(l, m, k) * t2(k, n, o);
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 534e1463807..b626d33d209 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -66,6 +66,8 @@ along with GCC; see the file COPYING3.  If not see
    to decide costs more precisely, but getting all the interactions right
    would be complicated.  */
 
+#include <inttypes.h>
+
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -114,7 +116,7 @@ along with GCC; see the file COPYING3.  If not see
    interface between the GIMPLE and RTL worlds.  */
 
 /* The infinite cost.  */
-#define INFTY 10000000
+#define INFTY 1000000000L
 
 /* Returns the expected number of loop iterations for LOOP.
    The average trip count is computed from profile data if it
@@ -180,7 +182,7 @@ struct comp_cost
   comp_cost (): cost (0), complexity (0), scratch (0)
   {}
 
-  comp_cost (int cost, unsigned complexity, int scratch = 0)
+  comp_cost (int64_t cost, unsigned complexity, int64_t scratch = 0)
     : cost (cost), complexity (complexity), scratch (scratch)
   {}
 
@@ -220,16 +222,16 @@ struct comp_cost
   /* Returns true if COST1 is smaller or equal than COST2.  */
   friend bool operator<= (comp_cost cost1, comp_cost cost2);
 
-  int cost;		/* The runtime cost.  */
+  int64_t cost;		/* The runtime cost.  */
   unsigned complexity;  /* The estimate of the complexity of the code for
 			   the computation (in no concrete units --
 			   complexity field should be larger for more
 			   complex expressions and addressing modes).  */
-  int scratch;		/* Scratch used during cost computation.  */
+  int64_t scratch;	/* Scratch used during cost computation.  */
 };
 
 static const comp_cost no_cost;
-static const comp_cost infinite_cost (INFTY, INFTY, INFTY);
+static const comp_cost infinite_cost (INFTY, 0, INFTY);
 
 bool
 comp_cost::infinite_cost_p ()
@@ -243,6 +245,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
   if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
     return infinite_cost;
 
+  gcc_assert (cost1.cost + cost2.cost < infinite_cost.cost);
   cost1.cost += cost2.cost;
   cost1.complexity += cost2.complexity;
 
@@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
     return infinite_cost;
 
   gcc_assert (!cost2.infinite_cost_p ());
+  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);
 
   cost1.cost -= cost2.cost;
   cost1.complexity -= cost2.complexity;
@@ -276,6 +280,7 @@ comp_cost::operator+= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  gcc_assert (this->cost + c < infinite_cost.cost);
   this->cost += c;
 
   return *this;
@@ -287,6 +292,7 @@ comp_cost::operator-= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  gcc_assert (this->cost - c < infinite_cost.cost);
   this->cost -= c;
 
   return *this;
@@ -295,6 +301,7 @@ comp_cost::operator-= (HOST_WIDE_INT c)
 comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
+  gcc_assert (c != 0);
   if (infinite_cost_p ())
     return *this;
 
@@ -309,6 +316,7 @@ comp_cost::operator*= (HOST_WIDE_INT c)
   if (infinite_cost_p ())
     return *this;
 
+  gcc_assert (this->cost * c < infinite_cost.cost);
   this->cost *= c;
 
   return *this;
@@ -638,7 +646,7 @@ struct iv_ca
   comp_cost cand_use_cost;
 
   /* Total cost of candidates.  */
-  unsigned cand_cost;
+  int64_t cand_cost;
 
   /* Number of times each invariant variable is used.  */
   unsigned *n_inv_var_uses;
@@ -4025,16 +4033,16 @@ get_computation_at (struct loop *loop, gimple *at,
    if we're optimizing for speed, amortize it over the per-iteration cost.
    If ROUND_UP_P is true, the result is round up rather than to zero when
    optimizing for speed.  */
-static unsigned
-adjust_setup_cost (struct ivopts_data *data, unsigned cost,
+static int64_t
+adjust_setup_cost (struct ivopts_data *data, int64_t cost,
 		   bool round_up_p = false)
 {
   if (cost == INFTY)
     return cost;
   else if (optimize_loop_for_speed_p (data->current_loop))
     {
-      HOST_WIDE_INT niters = avg_loop_niter (data->current_loop);
-      return ((HOST_WIDE_INT) cost + (round_up_p ? niters - 1 : 0)) / niters;
+      int64_t niters = (int64_t) avg_loop_niter (data->current_loop);
+      return (cost + (round_up_p ? niters - 1 : 0)) / niters;
     }
   else
     return cost;
@@ -4305,7 +4313,7 @@ enum ainc_type
 
 struct ainc_cost_data
 {
-  unsigned costs[AINC_NONE];
+  int64_t costs[AINC_NONE];
 };
 
 static comp_cost
@@ -4566,12 +4574,12 @@ get_scaled_computation_cost_at (ivopts_data *data, gimple *at, comp_cost cost)
       if (scale_factor == 1)
 	return cost;
 
-      int scaled_cost
+      int64_t scaled_cost
 	= cost.scratch + (cost.cost - cost.scratch) * scale_factor;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Scaling cost based on bb prob "
-		 "by %2.2f: %d (scratch: %d) -> %d\n",
+	fprintf (dump_file, "Scaling cost based on bb prob by %2.2f: "
+		 "%" PRId64 " (scratch: %" PRId64 ") -> %" PRId64 "\n",
 		 1.0f * scale_factor, cost.cost, cost.scratch, scaled_cost);
 
       cost.cost = scaled_cost;
@@ -5539,7 +5547,7 @@ determine_group_iv_costs (struct ivopts_data *data)
 		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
-	      fprintf (dump_file, "  %d\t%d\t%d\t",
+	      fprintf (dump_file, "  %d\t%" PRId64 "\t%d\t",
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.cost,
 		       group->cost_map[j].cost.complexity);
@@ -5569,7 +5577,7 @@ static void
 determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
 {
   comp_cost cost_base;
-  unsigned cost, cost_step;
+  int64_t cost, cost_step;
   tree base;
 
   gcc_assert (cand->iv != NULL);
@@ -6139,11 +6147,11 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost,
+  fprintf (file, "  cost: %" PRId64 " (complexity %d)\n", cost.cost,
 	   cost.complexity);
-  fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
-	   ivs->cand_cost, ivs->cand_use_cost.cost,
-	   ivs->cand_use_cost.complexity);
+  fprintf (file, "  cand_cost: %" PRId64 "\n  cand_group_cost: "
+	   "%" PRId64 " (complexity %d)\n", ivs->cand_cost,
+	   ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
   bitmap_print (file, ivs->cands, "  candidates: ","\n");
 
   for (i = 0; i < ivs->upto; i++)
@@ -6151,9 +6159,9 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
       struct iv_group *group = data->vgroups[i];
       struct cost_pair *cp = iv_ca_cand_for_group (ivs, group);
       if (cp)
-        fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
-		 group->id, cp->cand->id, cp->cost.cost,
-		 cp->cost.complexity);
+        fprintf (file, "   group:%d --> iv_cand:%d, cost=("
+		 "%" PRId64 ",%d)\n", group->id, cp->cand->id,
+		 cp->cost.cost, cp->cost.complexity);
       else
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
@@ -6751,9 +6759,9 @@ find_optimal_iv_set (struct ivopts_data *data)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "Original cost %d (complexity %d)\n\n",
+      fprintf (dump_file, "Original cost %" PRId64 " (complexity %d)\n\n",
 	       origcost.cost, origcost.complexity);
-      fprintf (dump_file, "Final cost %d (complexity %d)\n\n",
+      fprintf (dump_file, "Final cost %" PRId64 " (complexity %d)\n\n",
 	       cost.cost, cost.complexity);
     }
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-05-05  6:02       ` bin.cheng
@ 2019-05-06 10:11         ` Richard Biener
  2019-05-06 10:24           ` Bin.Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2019-05-06 10:11 UTC (permalink / raw)
  To: bin.cheng; +Cc: Jakub Jelinek, GCC Patches

On Sun, May 5, 2019 at 8:03 AM bin.cheng <bin.cheng@linux.alibaba.com> wrote:
>
> > ------------------------------------------------------------------
> > Sender:Jakub Jelinek <jakub@redhat.com>
> > Sent At:2019 Apr. 17 (Wed.) 19:27
> > Recipient:Bin.Cheng <amker.cheng@gmail.com>
> > Cc:bin.cheng <bin.cheng@linux.alibaba.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> >
> >
> > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > As
> > > > #define INFTY 10000000
> > > > what is the reason to keep the previous condition as well?
> > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > cost1.cost + cost2.cost >= INFTY too.
> > > > Unless costs can go negative.
> > > It's a bit complicated, but in general, costs can go negative.
> >
> > Ok, no objections from me then (but as I don't know anything about it,
> > not an ack either; you are ivopts maintainer, so you don't need one).
>
> Hi,
> The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
> fixed by another patch.  This is the updated patch for PR90078.  It promotes type
> of ivopts cost from int to int64_t, as well as change behavior of infinite_cost overflow
> from saturation to assert.
> Please note, implicit conversions are kept in cost computation as before without
> introducing any narrowing.
>
> Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?

Do not include system headers in .c files, instead those need to be
(and are already)
included via system.h.

 /* The infinite cost.  */
-#define INFTY 10000000
+#define INFTY 1000000000L

do we actually need this?  What happens on a ilp32 host?  That is, I believe
you can drop the 'L' (it fits into an int anyways)

@@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
     return infinite_cost;

   gcc_assert (!cost2.infinite_cost_p ());
+  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);

   cost1.cost -= cost2.cost;
   cost1.complexity -= cost2.complexity;

probably a pre-existing issue, but we do not seem to handle underflow
here in general, nor check that underflow doesn't get us below -INFTY.

I guess we really don't want negative costs?  That doesn't seem to be
documented and I was also wondering why the cost isn't unsigned...

@@ -638,7 +646,7 @@ struct iv_ca
   comp_cost cand_use_cost;

   /* Total cost of candidates.  */
-  unsigned cand_cost;
+  int64_t cand_cost;

   /* Number of times each invariant variable is used.  */
   unsigned *n_inv_var_uses;

shows this "issue".  Can't we use uint64_t throughout the patch?

Otherwise this looks OK.

Thanks,
Richard.

> Thanks,
> bin
> 2019-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
>
>         PR tree-optimization/90078
>         * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
>         (INFTY): Increase the value for infinite cost.
>         (struct comp_cost): Promote type of members to int64_t.
>         (infinite_cost): Don't set complexity in initialization.
>         (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
>         overflows to infinite_cost.
>         (adjust_setup_cost): Promote type of parameter and cost computation
>         to int64_t.
>         (struct ainc_cost_data, struct iv_ca): Promote type of member to
>         int64_t.
>         (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
>         cost computation to int64_t.
>         (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
>         int64_t's format specifier in dump.
>
> 2018-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
>
>         PR tree-optimization/90078
>         * g++.dg/tree-ssa/pr90078.C: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-05-06 10:11         ` Richard Biener
@ 2019-05-06 10:24           ` Bin.Cheng
  2019-05-06 10:27             ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Bin.Cheng @ 2019-05-06 10:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: bin.cheng, Jakub Jelinek, GCC Patches

On Mon, May 6, 2019 at 6:11 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, May 5, 2019 at 8:03 AM bin.cheng <bin.cheng@linux.alibaba.com> wrote:
> >
> > > ------------------------------------------------------------------
> > > Sender:Jakub Jelinek <jakub@redhat.com>
> > > Sent At:2019 Apr. 17 (Wed.) 19:27
> > > Recipient:Bin.Cheng <amker.cheng@gmail.com>
> > > Cc:bin.cheng <bin.cheng@linux.alibaba.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> > > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> > >
> > >
> > > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > > As
> > > > > #define INFTY 10000000
> > > > > what is the reason to keep the previous condition as well?
> > > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > > cost1.cost + cost2.cost >= INFTY too.
> > > > > Unless costs can go negative.
> > > > It's a bit complicated, but in general, costs can go negative.
> > >
> > > Ok, no objections from me then (but as I don't know anything about it,
> > > not an ack either; you are ivopts maintainer, so you don't need one).
> >
> > Hi,
> > The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
> > fixed by another patch.  This is the updated patch for PR90078.  It promotes type
> > of ivopts cost from int to int64_t, as well as change behavior of infinite_cost overflow
> > from saturation to assert.
> > Please note, implicit conversions are kept in cost computation as before without
> > introducing any narrowing.
> >
> > Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?
>
> Do not include system headers in .c files, instead those need to be
> (and are already)
> included via system.h.
>
>  /* The infinite cost.  */
> -#define INFTY 10000000
> +#define INFTY 1000000000L
>
> do we actually need this?  What happens on a ilp32 host?  That is, I believe
> you can drop the 'L' (it fits into an int anyways)
Yeah, now I think if int64_t is necessary or not.  With the scaling
bound and assertions.
>
> @@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
>      return infinite_cost;
>
>    gcc_assert (!cost2.infinite_cost_p ());
> +  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);
>
>    cost1.cost -= cost2.cost;
>    cost1.complexity -= cost2.complexity;
>
> probably a pre-existing issue, but we do not seem to handle underflow
> here in general, nor check that underflow doesn't get us below -INFTY.
>
> I guess we really don't want negative costs?  That doesn't seem to be
> documented and I was also wondering why the cost isn't unsigned...
>
> @@ -638,7 +646,7 @@ struct iv_ca
>    comp_cost cand_use_cost;
>
>    /* Total cost of candidates.  */
> -  unsigned cand_cost;
> +  int64_t cand_cost;
>
>    /* Number of times each invariant variable is used.  */
>    unsigned *n_inv_var_uses;
>
> shows this "issue".  Can't we use uint64_t throughout the patch?
Oh, it's actually explained in previous message,
https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00697.html
In short, yes the cost can be negative.  The negative cost should be
small in absolute value, and we don't check -INFTY.  With the scaling
bound change, I don't think -INFTY is possible IIUC.

Thanks,
bin
>
> Otherwise this looks OK.
>
> Thanks,
> Richard.
>
> > Thanks,
> > bin
> > 2019-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
> >
> >         PR tree-optimization/90078
> >         * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
> >         (INFTY): Increase the value for infinite cost.
> >         (struct comp_cost): Promote type of members to int64_t.
> >         (infinite_cost): Don't set complexity in initialization.
> >         (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
> >         overflows to infinite_cost.
> >         (adjust_setup_cost): Promote type of parameter and cost computation
> >         to int64_t.
> >         (struct ainc_cost_data, struct iv_ca): Promote type of member to
> >         int64_t.
> >         (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
> >         cost computation to int64_t.
> >         (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
> >         int64_t's format specifier in dump.
> >
> > 2018-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
> >
> >         PR tree-optimization/90078
> >         * g++.dg/tree-ssa/pr90078.C: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH PR90078]Capping comp_cost computation in ivopts
  2019-05-06 10:24           ` Bin.Cheng
@ 2019-05-06 10:27             ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2019-05-06 10:27 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, Jakub Jelinek, GCC Patches

On Mon, May 6, 2019 at 12:24 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
>
> On Mon, May 6, 2019 at 6:11 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Sun, May 5, 2019 at 8:03 AM bin.cheng <bin.cheng@linux.alibaba.com> wrote:
> > >
> > > > ------------------------------------------------------------------
> > > > Sender:Jakub Jelinek <jakub@redhat.com>
> > > > Sent At:2019 Apr. 17 (Wed.) 19:27
> > > > Recipient:Bin.Cheng <amker.cheng@gmail.com>
> > > > Cc:bin.cheng <bin.cheng@linux.alibaba.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> > > > Subject:Re: [PATCH PR90078]Capping comp_cost computation in ivopts
> > > >
> > > >
> > > > On Wed, Apr 17, 2019 at 07:14:05PM +0800, Bin.Cheng wrote:
> > > > > > As
> > > > > > #define INFTY 10000000
> > > > > > what is the reason to keep the previous condition as well?
> > > > > > I mean, if cost1.cost == INFTY or cost2.cost == INFTY,
> > > > > > cost1.cost + cost2.cost >= INFTY too.
> > > > > > Unless costs can go negative.
> > > > > It's a bit complicated, but in general, costs can go negative.
> > > >
> > > > Ok, no objections from me then (but as I don't know anything about it,
> > > > not an ack either; you are ivopts maintainer, so you don't need one).
> > >
> > > Hi,
> > > The previous patch was reverted on GCC-9 because of PR90240.  PR90240 is now
> > > fixed by another patch.  This is the updated patch for PR90078.  It promotes type
> > > of ivopts cost from int to int64_t, as well as change behavior of infinite_cost overflow
> > > from saturation to assert.
> > > Please note, implicit conversions are kept in cost computation as before without
> > > introducing any narrowing.
> > >
> > > Bootstrap/test on x86_64 along with PR90240 patch.  Is it OK?
> >
> > Do not include system headers in .c files, instead those need to be
> > (and are already)
> > included via system.h.
> >
> >  /* The infinite cost.  */
> > -#define INFTY 10000000
> > +#define INFTY 1000000000L
> >
> > do we actually need this?  What happens on a ilp32 host?  That is, I believe
> > you can drop the 'L' (it fits into an int anyways)
> Yeah, now I think if int64_t is necessary or not.  With the scaling
> bound and assertions.
> >
> > @@ -256,6 +259,7 @@ operator- (comp_cost cost1, comp_cost cost2)
> >      return infinite_cost;
> >
> >    gcc_assert (!cost2.infinite_cost_p ());
> > +  gcc_assert (cost1.cost - cost2.cost < infinite_cost.cost);
> >
> >    cost1.cost -= cost2.cost;
> >    cost1.complexity -= cost2.complexity;
> >
> > probably a pre-existing issue, but we do not seem to handle underflow
> > here in general, nor check that underflow doesn't get us below -INFTY.
> >
> > I guess we really don't want negative costs?  That doesn't seem to be
> > documented and I was also wondering why the cost isn't unsigned...
> >
> > @@ -638,7 +646,7 @@ struct iv_ca
> >    comp_cost cand_use_cost;
> >
> >    /* Total cost of candidates.  */
> > -  unsigned cand_cost;
> > +  int64_t cand_cost;
> >
> >    /* Number of times each invariant variable is used.  */
> >    unsigned *n_inv_var_uses;
> >
> > shows this "issue".  Can't we use uint64_t throughout the patch?
> Oh, it's actually explained in previous message,
> https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00697.html
> In short, yes the cost can be negative.  The negative cost should be
> small in absolute value, and we don't check -INFTY.  With the scaling
> bound change, I don't think -INFTY is possible IIUC.

Ah, I see.  The patch is OK then with 1000000000L in the #define
changed to just 1000000000

Richard.

> Thanks,
> bin
> >
> > Otherwise this looks OK.
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > bin
> > > 2019-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
> > >
> > >         PR tree-optimization/90078
> > >         * tree-ssa-loop-ivopts.c (inttypes.h): Include new header file.
> > >         (INFTY): Increase the value for infinite cost.
> > >         (struct comp_cost): Promote type of members to int64_t.
> > >         (infinite_cost): Don't set complexity in initialization.
> > >         (comp_cost::operator +,-,+=,-+,/=,*=): Assert when cost computation
> > >         overflows to infinite_cost.
> > >         (adjust_setup_cost): Promote type of parameter and cost computation
> > >         to int64_t.
> > >         (struct ainc_cost_data, struct iv_ca): Promote type of member to
> > >         int64_t.
> > >         (get_scaled_computation_cost_at, determine_iv_cost): Promote type of
> > >         cost computation to int64_t.
> > >         (determine_group_iv_costs, iv_ca_dump, find_optimal_iv_set): Use
> > >         int64_t's format specifier in dump.
> > >
> > > 2018-05-05  Bin Cheng  <bin.cheng@linux.alibaba.com>
> > >
> > >         PR tree-optimization/90078
> > >         * g++.dg/tree-ssa/pr90078.C: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-05-06 10:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17  7:00 [PATCH PR90078]Capping comp_cost computation in ivopts bin.cheng
2019-04-17  7:19 ` Jakub Jelinek
2019-04-17 11:14   ` Bin.Cheng
2019-04-17 11:35     ` Jakub Jelinek
2019-05-05  6:02       ` bin.cheng
2019-05-06 10:11         ` Richard Biener
2019-05-06 10:24           ` Bin.Cheng
2019-05-06 10:27             ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).