public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add auto-resizing capability to irange's [PR109695]
@ 2023-05-15 10:35 Aldy Hernandez
  2023-05-15 10:42 ` Jakub Jelinek
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 10:35 UTC (permalink / raw)
  To: GCC patches
  Cc: Andrew MacLeod, Richard Biener, Jakub Jelinek, mjambor, Aldy Hernandez

<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges.  int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12.  This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>

int_range_max is the temporary range object we use in the ranger for
integers.  With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree.  Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.

This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.

For example:

int_range<1> foo;	// 1 sub-range with no resizing.
int_range<5> foo;	// 5 sub-ranges with no resizing.
int_range<5, true> foo;	// 5 sub-ranges with resizing.

I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:

	typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit.  This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.

With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%).  This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.

I think this approach is a good alternative, while providing us with
flexibility going forward.  For example, we could try defaulting to a
8 sub-ranges for a noticeable improvement in VRP.  We could also use
large sub-ranges for switch analysis to avoid resizing.

Another approach I tried was always resizing.  With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range.  This simplified things, but incurred a 7% penalty on
ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind.  However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.

For the curious, a particular hot spot for IPA in this area was:

ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
  value_range save (m_vr);
  m_vr.union_ (*other_vr);
  return m_vr != save;
}

The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up a huge
range that gets copied and compared for every meet operation.  Maybe
the IPA algorithm could be adjusted somehow??.

Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable.  But we should probably
think what if anything we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.

I'll hold off a day or two, as I'd appreciate feedback here.

gcc/ChangeLog:

	PR tree-optimization/109695
	* value-range.cc (irange::operator=): Resize range.
	(irange::union_): Same.
	(irange::intersect): Same.
	(irange::invert): Same.
	(int_range_max): Default to 3 sub-ranges and resize as needed.
	* value-range.h (irange::maybe_resize): New.
	(~int_range): New.
	(int_range::int_range): Adjust for resizing.
	(int_range::operator=): Same.
---
 gcc/value-range.cc | 14 +++++++
 gcc/value-range.h  | 98 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 82 insertions(+), 30 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index def9299dc0e..cea4ff59254 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -901,6 +901,9 @@ frange::set_nonnegative (tree type)
 irange &
 irange::operator= (const irange &src)
 {
+  int needed = src.num_pairs ();
+  maybe_resize (needed);
+
   unsigned x;
   unsigned lim = src.m_num_ranges;
   if (lim > m_max_ranges)
@@ -1340,6 +1343,7 @@ irange::union_ (const vrange &v)
   // Now it simply needs to be copied, and if there are too many
   // ranges, merge some.  We wont do any analysis as to what the
   // "best" merges are, simply combine the final ranges into one.
+  maybe_resize (i / 2);
   if (i > m_max_ranges * 2)
     {
       res[m_max_ranges * 2 - 1] = res[i - 1];
@@ -1439,6 +1443,11 @@ irange::intersect (const vrange &v)
   if (r.irange_contains_p (*this))
     return intersect_nonzero_bits (r);
 
+  // ?? We could probably come up with something smarter than the
+  // worst case scenario here.
+  int needed = num_pairs () + r.num_pairs ();
+  maybe_resize (needed);
+
   signop sign = TYPE_SIGN (m_type);
   unsigned bld_pair = 0;
   unsigned bld_lim = m_max_ranges;
@@ -1646,6 +1655,11 @@ irange::invert ()
       m_num_ranges = 1;
       return;
     }
+
+  // At this point, we need one extra sub-range to represent the
+  // inverse.
+  maybe_resize (m_num_ranges + 1);
+
   // The algorithm is as follows.  To calculate INVERT ([a,b][c,d]), we
   // generate [-MIN, a-1][b+1, c-1][d+1, MAX].
   //
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 22b0250b11b..0da2a42764a 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -167,9 +167,10 @@ public:
   void set_nonzero_bits (const wide_int &bits);
 
 protected:
+  void maybe_resize (int needed);
   virtual void set (tree, tree, value_range_kind = VR_RANGE) override;
   virtual bool contains_p (tree cst) const override;
-  irange (wide_int *, unsigned);
+  irange (wide_int *, unsigned nranges, bool resizable);
 
    // In-place operators.
   bool irange_contains_p (const irange &) const;
@@ -179,6 +180,8 @@ protected:
 
   void verify_range ();
 
+  // Hard limit on max ranges allowed.
+  static const int HARD_MAX_RANGES = 255;
 private:
   friend void gt_ggc_mx (irange *);
   friend void gt_pch_nx (irange *);
@@ -192,16 +195,22 @@ private:
 
   bool intersect (const wide_int& lb, const wide_int& ub);
   unsigned char m_num_ranges;
-  const unsigned char m_max_ranges;
+  bool m_resizable;
+  unsigned char m_max_ranges;
   tree m_type;
   wide_int m_nonzero_mask;
+protected:
   wide_int *m_base;
 };
 
 // Here we describe an irange with N pairs of ranges.  The storage for
 // the pairs is embedded in the class as an array.
+//
+// If RESIZABLE is true, the storage will be resized on the heap when
+// the number of ranges needed goes past N up to a max of
+// HARD_MAX_RANGES.  This new storage is freed upon destruction.
 
-template<unsigned N>
+template<unsigned N, bool RESIZABLE = false>
 class GTY((user)) int_range : public irange
 {
 public:
@@ -211,7 +220,7 @@ public:
   int_range (tree type);
   int_range (const int_range &);
   int_range (const irange &);
-  virtual ~int_range () = default;
+  virtual ~int_range ();
   int_range& operator= (const int_range &);
 protected:
   int_range (tree, tree, value_range_kind = VR_RANGE);
@@ -451,6 +460,38 @@ is_a <frange> (vrange &v)
   return v.m_discriminator == VR_FRANGE;
 }
 
+// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
+// NEEDED pairs is greater than the current capacity of the range.
+
+inline void
+irange::maybe_resize (int needed)
+{
+  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
+    return;
+
+  if (needed > m_max_ranges)
+    {
+      m_max_ranges = HARD_MAX_RANGES;
+      wide_int *newmem = new wide_int[m_max_ranges * 2];
+      memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
+      m_base = newmem;
+    }
+}
+
+template<unsigned N, bool RESIZABLE>
+inline
+int_range<N, RESIZABLE>::~int_range ()
+{
+  if (RESIZABLE && m_base != m_ranges)
+    delete m_base;
+}
+
+// This is an "infinite" precision irange for use in temporary
+// calculations.  It starts with a sensible default covering 99% of
+// uses, and goes up to HARD_MAX_RANGES when needed.  Any allocated
+// storage is freed upon destruction.
+typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
+
 class vrange_visitor
 {
 public:
@@ -461,10 +502,6 @@ public:
 
 typedef int_range<2> value_range;
 
-// This is an "infinite" precision irange for use in temporary
-// calculations.
-typedef int_range<255> int_range_max;
-
 // This is an "infinite" precision range object for use in temporary
 // calculations for any of the handled types.  The object can be
 // transparently used as a vrange.
@@ -757,8 +794,9 @@ gt_pch_nx (int_range<N> *x, gt_pointer_operator op, void *cookie)
 // Constructors for irange
 
 inline
-irange::irange (wide_int *base, unsigned nranges)
+irange::irange (wide_int *base, unsigned nranges, bool resizable)
   : vrange (VR_IRANGE),
+    m_resizable (resizable),
     m_max_ranges (nranges)
 {
   m_base = base;
@@ -767,52 +805,52 @@ irange::irange (wide_int *base, unsigned nranges)
 
 // Constructors for int_range<>.
 
-template<unsigned N>
+template<unsigned N, bool RESIZABLE>
 inline
-int_range<N>::int_range ()
-  : irange (m_ranges, N)
+int_range<N, RESIZABLE>::int_range ()
+  : irange (m_ranges, N, RESIZABLE)
 {
 }
 
-template<unsigned N>
-int_range<N>::int_range (const int_range &other)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (const int_range &other)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::operator= (other);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree min, tree max, value_range_kind kind)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree min, tree max, value_range_kind kind)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::set (min, max, kind);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree type)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree type)
+  : irange (m_ranges, N, RESIZABLE)
 {
   set_varying (type);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
 			 value_range_kind kind)
-  : irange (m_ranges, N)
+  : irange (m_ranges, N, RESIZABLE)
 {
   set (type, wmin, wmax, kind);
 }
 
-template<unsigned N>
-int_range<N>::int_range (const irange &other)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (const irange &other)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::operator= (other);
 }
 
-template<unsigned N>
-int_range<N>&
-int_range<N>::operator= (const int_range &src)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>&
+int_range<N, RESIZABLE>::operator= (const int_range &src)
 {
   irange::operator= (src);
   return *this;
-- 
2.40.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 10:35 [PATCH] Add auto-resizing capability to irange's [PR109695] Aldy Hernandez
@ 2023-05-15 10:42 ` Jakub Jelinek
  2023-05-15 15:07   ` Aldy Hernandez
  2023-05-15 11:08 ` Richard Biener
  2023-05-15 14:24 ` Bernhard Reutner-Fischer
  2 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-05-15 10:42 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC patches, Andrew MacLeod, Richard Biener, mjambor

On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:
> gcc/ChangeLog:
> 
> 	PR tree-optimization/109695
> 	* value-range.cc (irange::operator=): Resize range.
> 	(irange::union_): Same.
> 	(irange::intersect): Same.
> 	(irange::invert): Same.
> 	(int_range_max): Default to 3 sub-ranges and resize as needed.
> 	* value-range.h (irange::maybe_resize): New.
> 	(~int_range): New.
> 	(int_range::int_range): Adjust for resizing.
> 	(int_range::operator=): Same.

LGTM.

One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
changing it to some larger number than 3 when the members aren't wide_ints
in there but just trees.  Sure, in 13/12 the problem is 10x less severe
than in current trunk, but still we have some cases where we run out of
stack because of it on some hosts.

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 10:35 [PATCH] Add auto-resizing capability to irange's [PR109695] Aldy Hernandez
  2023-05-15 10:42 ` Jakub Jelinek
@ 2023-05-15 11:08 ` Richard Biener
  2023-05-15 11:26   ` Jakub Jelinek
  2023-05-15 15:03   ` Aldy Hernandez
  2023-05-15 14:24 ` Bernhard Reutner-Fischer
  2 siblings, 2 replies; 11+ messages in thread
From: Richard Biener @ 2023-05-15 11:08 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC patches, Andrew MacLeod, Jakub Jelinek, mjambor

On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> <tldr>
> We can now have int_range<N, RESIZABLE=false> for automatically
> resizable ranges.  int_range_max is now int_range<3, true>
> for a 69X reduction in size from current trunk, and 6.9X reduction from
> GCC12.  This incurs a 5% performance penalty for VRP that is more than
> covered by our > 13% improvements recently.
> </tldr>
>
> int_range_max is the temporary range object we use in the ranger for
> integers.  With the conversion to wide_int, this structure bloated up
> significantly because wide_ints are huge (80 bytes a piece) and are
> about 10 times as big as a plain tree.  Since the temporary object
> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
> This means the structure grew from 4112 bytes to 40912 bytes.
>
> This patch adds the ability to resize ranges as needed, defaulting to
> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
> of 255) and grows to 255 when the range being calculated does not fit.
>
> For example:
>
> int_range<1> foo;       // 1 sub-range with no resizing.
> int_range<5> foo;       // 5 sub-ranges with no resizing.
> int_range<5, true> foo; // 5 sub-ranges with resizing.
>
> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
> I've set the int_range_max default to that:
>
>         typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
>
> We don't bother growing incrementally, since the default covers most
> cases and we have a 255 hard-limit.  This hard limit could be reduced
> to 128, since my tests never saw a range needing more than 124, but we
> could do that as a follow-up if needed.
>
> With 3-subranges, int_range_max is now 592 bytes versus 40912 for
> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
> and 3.02% for threading, with no noticeable change in overall
> compilation (0.27%).  This is more than covered by our 13.26%
> improvements for the legacy removal + wide_int conversion.

Thanks for doing this.

> I think this approach is a good alternative, while providing us with
> flexibility going forward.  For example, we could try defaulting to a
> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
> large sub-ranges for switch analysis to avoid resizing.
>
> Another approach I tried was always resizing.  With this, we could
> drop the whole int_range<N> nonsense, and have irange just hold a
> resizable range.  This simplified things, but incurred a 7% penalty on
> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
> this wasn't some artifact of valgrind.  However, until we're sure,
> let's avoid massive changes, especially since IPA changes are coming
> up.
>
> For the curious, a particular hot spot for IPA in this area was:
>
> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> {
> ...
> ...
>   value_range save (m_vr);
>   m_vr.union_ (*other_vr);
>   return m_vr != save;
> }
>
> The problem isn't the resizing (since we do that at most once) but the
> fact that for some functions with lots of callers we end up a huge
> range that gets copied and compared for every meet operation.  Maybe
> the IPA algorithm could be adjusted somehow??.

Well, the above just wants to know whether the union_ operation changed
the range.  I suppose that would be an interesting (and easy to compute?)
secondary output of union_ and it seems it already computes that (but
maybe not correctly?).  So I suggest to change the above to

  bool res;
  if (flag_checking)
   {
      value_range save (m_vr);
      res = m_vr.union_ (*other_vr);
      gcc_assert (res == (m_vr != save));
   }
 else
    res = m_vr.union (*other_vr);
 return res;

Btw, why's there a trailing underscore for union but not intersect?

Richard.

> Anywhooo... for now there is nothing to worry about, since value_range
> still has 2 subranges and is not resizable.  But we should probably
> think what if anything we want to do here, as I envision IPA using
> infinite ranges here (well, int_range_max) and handling frange's, etc.
>
> I'll hold off a day or two, as I'd appreciate feedback here.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/109695
>         * value-range.cc (irange::operator=): Resize range.
>         (irange::union_): Same.
>         (irange::intersect): Same.
>         (irange::invert): Same.
>         (int_range_max): Default to 3 sub-ranges and resize as needed.
>         * value-range.h (irange::maybe_resize): New.
>         (~int_range): New.
>         (int_range::int_range): Adjust for resizing.
>         (int_range::operator=): Same.
> ---
>  gcc/value-range.cc | 14 +++++++
>  gcc/value-range.h  | 98 ++++++++++++++++++++++++++++++++--------------
>  2 files changed, 82 insertions(+), 30 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index def9299dc0e..cea4ff59254 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -901,6 +901,9 @@ frange::set_nonnegative (tree type)
>  irange &
>  irange::operator= (const irange &src)
>  {
> +  int needed = src.num_pairs ();
> +  maybe_resize (needed);
> +
>    unsigned x;
>    unsigned lim = src.m_num_ranges;
>    if (lim > m_max_ranges)
> @@ -1340,6 +1343,7 @@ irange::union_ (const vrange &v)
>    // Now it simply needs to be copied, and if there are too many
>    // ranges, merge some.  We wont do any analysis as to what the
>    // "best" merges are, simply combine the final ranges into one.
> +  maybe_resize (i / 2);
>    if (i > m_max_ranges * 2)
>      {
>        res[m_max_ranges * 2 - 1] = res[i - 1];
> @@ -1439,6 +1443,11 @@ irange::intersect (const vrange &v)
>    if (r.irange_contains_p (*this))
>      return intersect_nonzero_bits (r);
>
> +  // ?? We could probably come up with something smarter than the
> +  // worst case scenario here.
> +  int needed = num_pairs () + r.num_pairs ();
> +  maybe_resize (needed);
> +
>    signop sign = TYPE_SIGN (m_type);
>    unsigned bld_pair = 0;
>    unsigned bld_lim = m_max_ranges;
> @@ -1646,6 +1655,11 @@ irange::invert ()
>        m_num_ranges = 1;
>        return;
>      }
> +
> +  // At this point, we need one extra sub-range to represent the
> +  // inverse.
> +  maybe_resize (m_num_ranges + 1);
> +
>    // The algorithm is as follows.  To calculate INVERT ([a,b][c,d]), we
>    // generate [-MIN, a-1][b+1, c-1][d+1, MAX].
>    //
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 22b0250b11b..0da2a42764a 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -167,9 +167,10 @@ public:
>    void set_nonzero_bits (const wide_int &bits);
>
>  protected:
> +  void maybe_resize (int needed);
>    virtual void set (tree, tree, value_range_kind = VR_RANGE) override;
>    virtual bool contains_p (tree cst) const override;
> -  irange (wide_int *, unsigned);
> +  irange (wide_int *, unsigned nranges, bool resizable);
>
>     // In-place operators.
>    bool irange_contains_p (const irange &) const;
> @@ -179,6 +180,8 @@ protected:
>
>    void verify_range ();
>
> +  // Hard limit on max ranges allowed.
> +  static const int HARD_MAX_RANGES = 255;
>  private:
>    friend void gt_ggc_mx (irange *);
>    friend void gt_pch_nx (irange *);
> @@ -192,16 +195,22 @@ private:
>
>    bool intersect (const wide_int& lb, const wide_int& ub);
>    unsigned char m_num_ranges;
> -  const unsigned char m_max_ranges;
> +  bool m_resizable;
> +  unsigned char m_max_ranges;
>    tree m_type;
>    wide_int m_nonzero_mask;
> +protected:
>    wide_int *m_base;
>  };
>
>  // Here we describe an irange with N pairs of ranges.  The storage for
>  // the pairs is embedded in the class as an array.
> +//
> +// If RESIZABLE is true, the storage will be resized on the heap when
> +// the number of ranges needed goes past N up to a max of
> +// HARD_MAX_RANGES.  This new storage is freed upon destruction.
>
> -template<unsigned N>
> +template<unsigned N, bool RESIZABLE = false>
>  class GTY((user)) int_range : public irange
>  {
>  public:
> @@ -211,7 +220,7 @@ public:
>    int_range (tree type);
>    int_range (const int_range &);
>    int_range (const irange &);
> -  virtual ~int_range () = default;
> +  virtual ~int_range ();
>    int_range& operator= (const int_range &);
>  protected:
>    int_range (tree, tree, value_range_kind = VR_RANGE);
> @@ -451,6 +460,38 @@ is_a <frange> (vrange &v)
>    return v.m_discriminator == VR_FRANGE;
>  }
>
> +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
> +// NEEDED pairs is greater than the current capacity of the range.
> +
> +inline void
> +irange::maybe_resize (int needed)
> +{
> +  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
> +    return;
> +
> +  if (needed > m_max_ranges)
> +    {
> +      m_max_ranges = HARD_MAX_RANGES;
> +      wide_int *newmem = new wide_int[m_max_ranges * 2];
> +      memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
> +      m_base = newmem;
> +    }
> +}
> +
> +template<unsigned N, bool RESIZABLE>
> +inline
> +int_range<N, RESIZABLE>::~int_range ()
> +{
> +  if (RESIZABLE && m_base != m_ranges)
> +    delete m_base;
> +}
> +
> +// This is an "infinite" precision irange for use in temporary
> +// calculations.  It starts with a sensible default covering 99% of
> +// uses, and goes up to HARD_MAX_RANGES when needed.  Any allocated
> +// storage is freed upon destruction.
> +typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
> +
>  class vrange_visitor
>  {
>  public:
> @@ -461,10 +502,6 @@ public:
>
>  typedef int_range<2> value_range;
>
> -// This is an "infinite" precision irange for use in temporary
> -// calculations.
> -typedef int_range<255> int_range_max;
> -
>  // This is an "infinite" precision range object for use in temporary
>  // calculations for any of the handled types.  The object can be
>  // transparently used as a vrange.
> @@ -757,8 +794,9 @@ gt_pch_nx (int_range<N> *x, gt_pointer_operator op, void *cookie)
>  // Constructors for irange
>
>  inline
> -irange::irange (wide_int *base, unsigned nranges)
> +irange::irange (wide_int *base, unsigned nranges, bool resizable)
>    : vrange (VR_IRANGE),
> +    m_resizable (resizable),
>      m_max_ranges (nranges)
>  {
>    m_base = base;
> @@ -767,52 +805,52 @@ irange::irange (wide_int *base, unsigned nranges)
>
>  // Constructors for int_range<>.
>
> -template<unsigned N>
> +template<unsigned N, bool RESIZABLE>
>  inline
> -int_range<N>::int_range ()
> -  : irange (m_ranges, N)
> +int_range<N, RESIZABLE>::int_range ()
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>  }
>
> -template<unsigned N>
> -int_range<N>::int_range (const int_range &other)
> -  : irange (m_ranges, N)
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>::int_range (const int_range &other)
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>    irange::operator= (other);
>  }
>
> -template<unsigned N>
> -int_range<N>::int_range (tree min, tree max, value_range_kind kind)
> -  : irange (m_ranges, N)
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>::int_range (tree min, tree max, value_range_kind kind)
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>    irange::set (min, max, kind);
>  }
>
> -template<unsigned N>
> -int_range<N>::int_range (tree type)
> -  : irange (m_ranges, N)
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>::int_range (tree type)
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>    set_varying (type);
>  }
>
> -template<unsigned N>
> -int_range<N>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
>                          value_range_kind kind)
> -  : irange (m_ranges, N)
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>    set (type, wmin, wmax, kind);
>  }
>
> -template<unsigned N>
> -int_range<N>::int_range (const irange &other)
> -  : irange (m_ranges, N)
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>::int_range (const irange &other)
> +  : irange (m_ranges, N, RESIZABLE)
>  {
>    irange::operator= (other);
>  }
>
> -template<unsigned N>
> -int_range<N>&
> -int_range<N>::operator= (const int_range &src)
> +template<unsigned N, bool RESIZABLE>
> +int_range<N, RESIZABLE>&
> +int_range<N, RESIZABLE>::operator= (const int_range &src)
>  {
>    irange::operator= (src);
>    return *this;
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 11:08 ` Richard Biener
@ 2023-05-15 11:26   ` Jakub Jelinek
  2023-05-15 15:03   ` Aldy Hernandez
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2023-05-15 11:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: Aldy Hernandez, GCC patches, Andrew MacLeod, mjambor

On Mon, May 15, 2023 at 01:08:51PM +0200, Richard Biener wrote:
> Btw, why's there a trailing underscore for union but not intersect?

Because union is a C++ keyword, while intersect is not.

	Jakub


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 10:35 [PATCH] Add auto-resizing capability to irange's [PR109695] Aldy Hernandez
  2023-05-15 10:42 ` Jakub Jelinek
  2023-05-15 11:08 ` Richard Biener
@ 2023-05-15 14:24 ` Bernhard Reutner-Fischer
  2023-05-15 14:27   ` Aldy Hernandez
  2 siblings, 1 reply; 11+ messages in thread
From: Bernhard Reutner-Fischer @ 2023-05-15 14:24 UTC (permalink / raw)
  To: Aldy Hernandez via Gcc-patches
  Cc: rep.dot.nop, Aldy Hernandez, Andrew MacLeod, Richard Biener,
	Jakub Jelinek, mjambor

On Mon, 15 May 2023 12:35:23 +0200
Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

> +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
> +// NEEDED pairs is greater than the current capacity of the range.
> +
> +inline void
> +irange::maybe_resize (int needed)
> +{
> +  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
> +    return;
> +
> +  if (needed > m_max_ranges)
> +    {
> +      m_max_ranges = HARD_MAX_RANGES;
> +      wide_int *newmem = new wide_int[m_max_ranges * 2];
> +      memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
> +      m_base = newmem;

Please excuse my ignorance, but where's the old m_base freed? I think
the assignment above does not call the destructor, or does it?

thanks,

> +    }
> +}
> +
> +template<unsigned N, bool RESIZABLE>
> +inline
> +int_range<N, RESIZABLE>::~int_range ()
> +{
> +  if (RESIZABLE && m_base != m_ranges)
> +    delete m_base;
> +}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 14:24 ` Bernhard Reutner-Fischer
@ 2023-05-15 14:27   ` Aldy Hernandez
  0 siblings, 0 replies; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 14:27 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Aldy Hernandez via Gcc-patches
  Cc: Andrew MacLeod, Richard Biener, Jakub Jelinek, mjambor



On 5/15/23 16:24, Bernhard Reutner-Fischer wrote:
> On Mon, 15 May 2023 12:35:23 +0200
> Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
>> +// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
>> +// NEEDED pairs is greater than the current capacity of the range.
>> +
>> +inline void
>> +irange::maybe_resize (int needed)
>> +{
>> +  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
>> +    return;
>> +
>> +  if (needed > m_max_ranges)
>> +    {
>> +      m_max_ranges = HARD_MAX_RANGES;
>> +      wide_int *newmem = new wide_int[m_max_ranges * 2];
>> +      memcpy (newmem, m_base, sizeof (wide_int) * num_pairs () * 2);
>> +      m_base = newmem;
> 
> Please excuse my ignorance, but where's the old m_base freed? I think
> the assignment above does not call the destructor, or does it?

The old m_base is never freed because it points to m_ranges, a static 
array in int_range:

template<unsigned N, bool RESIZABLE = false>
class GTY((user)) int_range : public irange
{
...
...
private:
   wide_int m_ranges[N*2];
};

Aldy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 11:08 ` Richard Biener
  2023-05-15 11:26   ` Jakub Jelinek
@ 2023-05-15 15:03   ` Aldy Hernandez
  2023-05-15 17:23     ` Aldy Hernandez
  1 sibling, 1 reply; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 15:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC patches, Andrew MacLeod, Jakub Jelinek, mjambor

[-- Attachment #1: Type: text/plain, Size: 4897 bytes --]



On 5/15/23 13:08, Richard Biener wrote:
> On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>>
>> <tldr>
>> We can now have int_range<N, RESIZABLE=false> for automatically
>> resizable ranges.  int_range_max is now int_range<3, true>
>> for a 69X reduction in size from current trunk, and 6.9X reduction from
>> GCC12.  This incurs a 5% performance penalty for VRP that is more than
>> covered by our > 13% improvements recently.
>> </tldr>
>>
>> int_range_max is the temporary range object we use in the ranger for
>> integers.  With the conversion to wide_int, this structure bloated up
>> significantly because wide_ints are huge (80 bytes a piece) and are
>> about 10 times as big as a plain tree.  Since the temporary object
>> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
>> This means the structure grew from 4112 bytes to 40912 bytes.
>>
>> This patch adds the ability to resize ranges as needed, defaulting to
>> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
>> of 255) and grows to 255 when the range being calculated does not fit.
>>
>> For example:
>>
>> int_range<1> foo;       // 1 sub-range with no resizing.
>> int_range<5> foo;       // 5 sub-ranges with no resizing.
>> int_range<5, true> foo; // 5 sub-ranges with resizing.
>>
>> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
>> I've set the int_range_max default to that:
>>
>>          typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
>>
>> We don't bother growing incrementally, since the default covers most
>> cases and we have a 255 hard-limit.  This hard limit could be reduced
>> to 128, since my tests never saw a range needing more than 124, but we
>> could do that as a follow-up if needed.
>>
>> With 3-subranges, int_range_max is now 592 bytes versus 40912 for
>> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
>> and 3.02% for threading, with no noticeable change in overall
>> compilation (0.27%).  This is more than covered by our 13.26%
>> improvements for the legacy removal + wide_int conversion.
> 
> Thanks for doing this.
> 
>> I think this approach is a good alternative, while providing us with
>> flexibility going forward.  For example, we could try defaulting to a
>> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
>> large sub-ranges for switch analysis to avoid resizing.
>>
>> Another approach I tried was always resizing.  With this, we could
>> drop the whole int_range<N> nonsense, and have irange just hold a
>> resizable range.  This simplified things, but incurred a 7% penalty on
>> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
>> this wasn't some artifact of valgrind.  However, until we're sure,
>> let's avoid massive changes, especially since IPA changes are coming
>> up.
>>
>> For the curious, a particular hot spot for IPA in this area was:
>>
>> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
>> {
>> ...
>> ...
>>    value_range save (m_vr);
>>    m_vr.union_ (*other_vr);
>>    return m_vr != save;
>> }
>>
>> The problem isn't the resizing (since we do that at most once) but the
>> fact that for some functions with lots of callers we end up a huge
>> range that gets copied and compared for every meet operation.  Maybe
>> the IPA algorithm could be adjusted somehow??.
> 
> Well, the above just wants to know whether the union_ operation changed
> the range.  I suppose that would be an interesting (and easy to compute?)
> secondary output of union_ and it seems it already computes that (but
> maybe not correctly?).  So I suggest to change the above to

union_ returns a value specifically for that, which Andrew uses for 
cache optimization.  For that matter, your suggestion was my first 
approach, but I quickly found out we were being overly pessimistic in 
some cases, and I was too lazy to figure out why.

> 
>    bool res;
>    if (flag_checking)
>     {
>        value_range save (m_vr);
>        res = m_vr.union_ (*other_vr);
>        gcc_assert (res == (m_vr != save));
>     }
>   else
>      res = m_vr.union (*other_vr);
>   return res;

With your suggested sanity check I chased the problem to a minor 
inconsistency when unioning nonzero masks.  The issue wasn't a bug, just 
a pessimization.  I'm attaching a patch that corrects the oversight 
(well, not oversight, everything was more expensive with trees)... It 
yields a 6.89% improvement to the ipa-cp pass!!!  Thanks.

I'll push it if it passes tests.

BTW, without the annoying IPA-cp performance regression, this paves the 
way for nuking int_range<N> in favor of just irange, and have everything 
resize as needed.  I'll wait for Andrew to chime in when he returns from 
PTO, since we may want to leave int_range<N> around since it does 
provide flexibility (at the expensive of fugly looking declarations).

Aldy

[-- Attachment #2: 0001-Only-return-changed-true-in-union_nonzero-when-appro.patch --]
[-- Type: text/x-patch, Size: 3350 bytes --]

From 6a7354d3494665d46f8cbfc71c58f784c02142ff Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Mon, 15 May 2023 15:10:11 +0200
Subject: [PATCH] Only return changed=true in union_nonzero when appropriate.

irange::union_ was being overly pessimistic in its return value.  It
was returning false when the nonzero mask was possibly the same.

The reason for this is because the nonzero mask is not entirely kept
up to date.  We avoid setting it up when a new range is set (from a
set, intersect, union, etc), because calculating a mask from a range
is measurably expensive.  However, irange::get_nonzero_bits() will
always return the correct mask because it will calculate the nonzero
mask inherit in the mask on the fly and bitwise or it with the saved
mask.  This was an optimization because last release it was a big
penalty to keep the mask up to date.  This may not necessarily be the
case with the conversion to wide_int's.  We should investigate.

Just to be clear, the result from get_nonzero_bits() is always correct
as seen by the user, but the wide_int in the irange does not contain
all the information, since part of the nonzero bits can be determined
by the range itself, on the fly.

The fix here is to make sure that the result the user sees (callers of
get_nonzero_bits()) changed when unioning bits.  This allows
ipcp_vr_lattice::meet_with_1 to avoid unnecessary copies when
determining if a range changed.

This patch yields an 6.89% improvement to the ipa_cp pass.  I'm
including the IPA changes in this patch, as it's a testcase of sorts for
the change.

gcc/ChangeLog:

	* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Avoid unnecessary
	range copying
	* value-range.cc (irange::union_nonzero_bits): Return TRUE only
	when range changed.
---
 gcc/ipa-cp.cc      | 13 ++++++++++---
 gcc/value-range.cc |  5 +++--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 1f5e0e13872..8cd0fa2cae7 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1040,9 +1040,16 @@ ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
   if (other_vr->varying_p ())
     return set_to_bottom ();
 
-  value_range save (m_vr);
-  m_vr.union_ (*other_vr);
-  return m_vr != save;
+  bool res;
+  if (flag_checking)
+    {
+      value_range save (m_vr);
+      res = m_vr.union_ (*other_vr);
+      gcc_assert (res == (m_vr != save));
+    }
+  else
+    res = m_vr.union_ (*other_vr);
+  return res;
 }
 
 /* Return true if value range information in the lattice is yet unknown.  */
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index def9299dc0e..a341cece86d 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1859,12 +1859,13 @@ irange::union_nonzero_bits (const irange &r)
   bool changed = false;
   if (m_nonzero_mask != r.m_nonzero_mask)
     {
-      m_nonzero_mask = get_nonzero_bits () | r.get_nonzero_bits ();
+      wide_int save = get_nonzero_bits ();
+      m_nonzero_mask = save | r.get_nonzero_bits ();
       // No need to call set_range_from_nonzero_bits, because we'll
       // never narrow the range.  Besides, it would cause endless
       // recursion because of the union_ in
       // set_range_from_nonzero_bits.
-      changed = true;
+      changed = m_nonzero_mask != save;
     }
   normalize_kind ();
   if (flag_checking)
-- 
2.40.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 10:42 ` Jakub Jelinek
@ 2023-05-15 15:07   ` Aldy Hernandez
  2023-05-15 18:14     ` Aldy Hernandez
  0 siblings, 1 reply; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 15:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Biener, mjambor



On 5/15/23 12:42, Jakub Jelinek wrote:
> On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/109695
>> 	* value-range.cc (irange::operator=): Resize range.
>> 	(irange::union_): Same.
>> 	(irange::intersect): Same.
>> 	(irange::invert): Same.
>> 	(int_range_max): Default to 3 sub-ranges and resize as needed.
>> 	* value-range.h (irange::maybe_resize): New.
>> 	(~int_range): New.
>> 	(int_range::int_range): Adjust for resizing.
>> 	(int_range::operator=): Same.
> 
> LGTM.
> 
> One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
> changing it to some larger number than 3 when the members aren't wide_ints
> in there but just trees.  Sure, in 13/12 the problem is 10x less severe
> than in current trunk, but still we have some cases where we run out of
> stack because of it on some hosts.

Sure, but that would require messing around with the gt_* GTY functions, 
and making sure we're allocating the trees from a sensible place, etc 
etc.  I'm less confident in my ability to mess with GTY stuff this late 
in the game.

Thoughts?
Aldy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 15:03   ` Aldy Hernandez
@ 2023-05-15 17:23     ` Aldy Hernandez
  0 siblings, 0 replies; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 17:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC patches, Andrew MacLeod, Jakub Jelinek, mjambor

On Mon, May 15, 2023 at 5:03 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 5/15/23 13:08, Richard Biener wrote:
> > On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >>
> >> <tldr>
> >> We can now have int_range<N, RESIZABLE=false> for automatically
> >> resizable ranges.  int_range_max is now int_range<3, true>
> >> for a 69X reduction in size from current trunk, and 6.9X reduction from
> >> GCC12.  This incurs a 5% performance penalty for VRP that is more than
> >> covered by our > 13% improvements recently.
> >> </tldr>
> >>
> >> int_range_max is the temporary range object we use in the ranger for
> >> integers.  With the conversion to wide_int, this structure bloated up
> >> significantly because wide_ints are huge (80 bytes a piece) and are
> >> about 10 times as big as a plain tree.  Since the temporary object
> >> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
> >> This means the structure grew from 4112 bytes to 40912 bytes.
> >>
> >> This patch adds the ability to resize ranges as needed, defaulting to
> >> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
> >> of 255) and grows to 255 when the range being calculated does not fit.
> >>
> >> For example:
> >>
> >> int_range<1> foo;       // 1 sub-range with no resizing.
> >> int_range<5> foo;       // 5 sub-ranges with no resizing.
> >> int_range<5, true> foo; // 5 sub-ranges with resizing.
> >>
> >> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
> >> I've set the int_range_max default to that:
> >>
> >>          typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
> >>
> >> We don't bother growing incrementally, since the default covers most
> >> cases and we have a 255 hard-limit.  This hard limit could be reduced
> >> to 128, since my tests never saw a range needing more than 124, but we
> >> could do that as a follow-up if needed.
> >>
> >> With 3-subranges, int_range_max is now 592 bytes versus 40912 for
> >> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
> >> and 3.02% for threading, with no noticeable change in overall
> >> compilation (0.27%).  This is more than covered by our 13.26%
> >> improvements for the legacy removal + wide_int conversion.
> >
> > Thanks for doing this.
> >
> >> I think this approach is a good alternative, while providing us with
> >> flexibility going forward.  For example, we could try defaulting to a
> >> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
> >> large sub-ranges for switch analysis to avoid resizing.
> >>
> >> Another approach I tried was always resizing.  With this, we could
> >> drop the whole int_range<N> nonsense, and have irange just hold a
> >> resizable range.  This simplified things, but incurred a 7% penalty on
> >> ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
> >> this wasn't some artifact of valgrind.  However, until we're sure,
> >> let's avoid massive changes, especially since IPA changes are coming
> >> up.
> >>
> >> For the curious, a particular hot spot for IPA in this area was:
> >>
> >> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> >> {
> >> ...
> >> ...
> >>    value_range save (m_vr);
> >>    m_vr.union_ (*other_vr);
> >>    return m_vr != save;
> >> }
> >>
> >> The problem isn't the resizing (since we do that at most once) but the
> >> fact that for some functions with lots of callers we end up a huge
> >> range that gets copied and compared for every meet operation.  Maybe
> >> the IPA algorithm could be adjusted somehow??.
> >
> > Well, the above just wants to know whether the union_ operation changed
> > the range.  I suppose that would be an interesting (and easy to compute?)
> > secondary output of union_ and it seems it already computes that (but
> > maybe not correctly?).  So I suggest to change the above to
>
> union_ returns a value specifically for that, which Andrew uses for
> cache optimization.  For that matter, your suggestion was my first
> approach, but I quickly found out we were being overly pessimistic in
> some cases, and I was too lazy to figure out why.
>
> >
> >    bool res;
> >    if (flag_checking)
> >     {
> >        value_range save (m_vr);
> >        res = m_vr.union_ (*other_vr);
> >        gcc_assert (res == (m_vr != save));
> >     }
> >   else
> >      res = m_vr.union (*other_vr);
> >   return res;
>
> With your suggested sanity check I chased the problem to a minor
> inconsistency when unioning nonzero masks.  The issue wasn't a bug, just
> a pessimization.  I'm attaching a patch that corrects the oversight
> (well, not oversight, everything was more expensive with trees)... It
> yields a 6.89% improvement to the ipa-cp pass!!!  Thanks.
>
> I'll push it if it passes tests.

Tests passed.  Pushed patch.

I've also pushed the original patch in this email.  We can address
anything else as a follow-up.

Thanks for everyone's feedback.
Aldy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 15:07   ` Aldy Hernandez
@ 2023-05-15 18:14     ` Aldy Hernandez
  2023-05-16  9:24       ` Aldy Hernandez
  0 siblings, 1 reply; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-15 18:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Biener, mjambor

On 5/15/23 17:07, Aldy Hernandez wrote:
> 
> 
> On 5/15/23 12:42, Jakub Jelinek wrote:
>> On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:
>>> gcc/ChangeLog:
>>>
>>>     PR tree-optimization/109695
>>>     * value-range.cc (irange::operator=): Resize range.
>>>     (irange::union_): Same.
>>>     (irange::intersect): Same.
>>>     (irange::invert): Same.
>>>     (int_range_max): Default to 3 sub-ranges and resize as needed.
>>>     * value-range.h (irange::maybe_resize): New.
>>>     (~int_range): New.
>>>     (int_range::int_range): Adjust for resizing.
>>>     (int_range::operator=): Same.
>>
>> LGTM.
>>
>> One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
>> changing it to some larger number than 3 when the members aren't 
>> wide_ints
>> in there but just trees.  Sure, in 13/12 the problem is 10x less severe
>> than in current trunk, but still we have some cases where we run out of
>> stack because of it on some hosts.
> 
> Sure, but that would require messing around with the gt_* GTY functions, 
> and making sure we're allocating the trees from a sensible place, etc 
> etc.  I'm less confident in my ability to mess with GTY stuff this late 
> in the game.

Hmmm, maybe backporting this isn't too bad.  The only time we'd have a 
chunk on the heap is for int_range_max, which will never live in GC 
space.  So I don't think we need to worry about GC at all.

Although, legacy mode in GCC13 does get in a the way a bit.  Sigh.

And unrealted, but speaking of GC... we should remove all GTY markers 
from vrange.  It should never live in GC.  That's why we have 
vrange_storage for, and that is what we put in the tree_ssa_name.

  /* Value range information.  */
   union ssa_name_info_type {
     /* Range and aliasing info for pointers.  */
     struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
     /* Range info for everything else.  */
     struct GTY ((tag ("1"))) vrange_storage * range_info;
   } GTY ((desc ("%1.typed.type ?" \
                 "!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;

That should have been the only use of range GC stuff, but alas another 
one crept in... IPA:

struct GTY (()) ipa_jump_func
{
...
   /* Information about value range, containing valid data only when 
vr_known is
      true.  The pointed to structure is shared betweed different jump
      functions.  Use ipa_set_jfunc_vr to set this field.  */
   value_range *m_vr;
...
};

This means that we can't nuke int_range<N> and default to an always 
resizable range just yet, because we'll end up with the value_range in 
GC memory, and resizable part in the heap.

That m_vr pointer should be a pointer to vrange_storage.  Meh...I'm 
bumping against my IPA work yet again.  I think it's time to start 
dusting off those patches.

Aldy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Add auto-resizing capability to irange's [PR109695]
  2023-05-15 18:14     ` Aldy Hernandez
@ 2023-05-16  9:24       ` Aldy Hernandez
  0 siblings, 0 replies; 11+ messages in thread
From: Aldy Hernandez @ 2023-05-16  9:24 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Biener, mjambor

[-- Attachment #1: Type: text/plain, Size: 2606 bytes --]



On 5/15/23 20:14, Aldy Hernandez wrote:
> On 5/15/23 17:07, Aldy Hernandez wrote:
>>
>>
>> On 5/15/23 12:42, Jakub Jelinek wrote:
>>> On Mon, May 15, 2023 at 12:35:23PM +0200, Aldy Hernandez wrote:
>>>> gcc/ChangeLog:
>>>>
>>>>     PR tree-optimization/109695
>>>>     * value-range.cc (irange::operator=): Resize range.
>>>>     (irange::union_): Same.
>>>>     (irange::intersect): Same.
>>>>     (irange::invert): Same.
>>>>     (int_range_max): Default to 3 sub-ranges and resize as needed.
>>>>     * value-range.h (irange::maybe_resize): New.
>>>>     (~int_range): New.
>>>>     (int_range::int_range): Adjust for resizing.
>>>>     (int_range::operator=): Same.
>>>
>>> LGTM.
>>>
>>> One question is if we shouldn't do it for GCC13/GCC12 as well, perhaps
>>> changing it to some larger number than 3 when the members aren't 
>>> wide_ints
>>> in there but just trees.  Sure, in 13/12 the problem is 10x less severe
>>> than in current trunk, but still we have some cases where we run out of
>>> stack because of it on some hosts.
>>
>> Sure, but that would require messing around with the gt_* GTY 
>> functions, and making sure we're allocating the trees from a sensible 
>> place, etc etc.  I'm less confident in my ability to mess with GTY 
>> stuff this late in the game.
> 
> Hmmm, maybe backporting this isn't too bad.  The only time we'd have a 
> chunk on the heap is for int_range_max, which will never live in GC 
> space.  So I don't think we need to worry about GC at all.
> 
> Although, legacy mode in GCC13 does get in a the way a bit.  Sigh.

I've adapted the patch to GCC13 and tested it on x86-64 Linux.  Please 
look over the new[] I do for trees to make sure I did things right.

int_range_max on GCC13 is currently 4112 bytes.  Here are the numbers 
for various defaults:

                 < 2> =  64 bytes, 3.02% for VRP.
                 < 3> =  80 bytes, 2.67% for VRP.
                 < 8> = 160 bytes, 2.46% for VRP.
                 <16> = 288 bytes, 2.40% for VRP.

Note that we don't have any runway on GCC13, so this would be a net loss 
in performance for VRP.  Threading shows about half as much of a drop 
than VRP.  Overall compilation is within 0.2%, so not noticeable.

I'm surprised 2 sub-ranges doesn't incur a  bigger penalty, but 3 seems 
to be the happy medium.  Anything more than that, and there's no difference.

The patch defaults to 3 sub-ranges.  I must say, 80 bytes looks mighty 
nice.  It's up to you what to do with the patch.  I'm chicken shit at 
heart and hate touching release compilers :).

Aldy

[-- Attachment #2: 0001-GCC13-Add-auto-resizing-capability-to-irange-s-PR109.patch --]
[-- Type: text/x-patch, Size: 12127 bytes --]

From 777aa930b106fea2dd6ed9fe22b42a2717f1472d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Mon, 15 May 2023 12:25:58 +0200
Subject: [PATCH] [GCC13] Add auto-resizing capability to irange's [PR109695]

Backport the following from trunk.

	Note that the patch has been adapted to trees.

	The numbers for various sub-ranges on GCC13 are:
		< 2> =  64 bytes, -3.02% for VRP.
		< 3> =  80 bytes, -2.67% for VRP.
		< 8> = 160 bytes, -2.46% for VRP.
		<16> = 288 bytes, -2.40% for VRP.

<tldr>
We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges.  int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12.  This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.
</tldr>

int_range_max is the temporary range object we use in the ranger for
integers.  With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree.  Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.

This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.

For example:

int_range<1> foo;	// 1 sub-range with no resizing.
int_range<5> foo;	// 5 sub-ranges with no resizing.
int_range<5, true> foo;	// 5 sub-ranges with resizing.

I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:

	typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit.  This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.

With 3-subranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%).  This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.

I think this approach is a good alternative, while providing us with
flexibility going forward.  For example, we could try defaulting to a
8 sub-ranges for a noticeable improvement in VRP.  We could also use
large sub-ranges for switch analysis to avoid resizing.

Another approach I tried was always resizing.  With this, we could
drop the whole int_range<N> nonsense, and have irange just hold a
resizable range.  This simplified things, but incurred a 7% penalty on
ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind.  However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.

For the curious, a particular hot spot for IPA in this area was:

ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
  value_range save (m_vr);
  m_vr.union_ (*other_vr);
  return m_vr != save;
}

The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up a huge
range that gets copied and compared for every meet operation.  Maybe
the IPA algorithm could be adjusted somehow??.

Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable.  But we should probably
think what if anything we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.

gcc/ChangeLog:

	PR tree-optimization/109695
	* value-range.cc (irange::operator=): Resize range.
	(irange::union_): Same.
	(irange::intersect): Same.
	(irange::invert): Same.
	(int_range_max): Default to 3 sub-ranges and resize as needed.
	* value-range.h (irange::maybe_resize): New.
	(~int_range): New.
	(int_range::int_range): Adjust for resizing.
	(int_range::operator=): Same.
---
 gcc/value-range-storage.h |  2 +-
 gcc/value-range.cc        | 15 ++++++
 gcc/value-range.h         | 96 +++++++++++++++++++++++++++------------
 3 files changed, 83 insertions(+), 30 deletions(-)

diff --git a/gcc/value-range-storage.h b/gcc/value-range-storage.h
index 6da377ebd2e..1ed6f1ccd61 100644
--- a/gcc/value-range-storage.h
+++ b/gcc/value-range-storage.h
@@ -184,7 +184,7 @@ vrange_allocator::alloc_irange (unsigned num_pairs)
   // Allocate the irange and required memory for the vector.
   void *r = alloc (sizeof (irange));
   tree *mem = static_cast <tree *> (alloc (nbytes));
-  return new (r) irange (mem, num_pairs);
+  return new (r) irange (mem, num_pairs, /*resizable=*/false);
 }
 
 inline frange *
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index ec826c2fe1b..753f5e8cc76 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -831,6 +831,10 @@ irange::operator= (const irange &src)
       copy_to_legacy (src);
       return *this;
     }
+
+  int needed = src.num_pairs ();
+  maybe_resize (needed);
+
   if (src.legacy_mode_p ())
     {
       copy_legacy_to_multi_range (src);
@@ -2506,6 +2510,7 @@ irange::irange_union (const irange &r)
   // Now it simply needs to be copied, and if there are too many
   // ranges, merge some.  We wont do any analysis as to what the
   // "best" merges are, simply combine the final ranges into one.
+  maybe_resize (i / 2);
   if (i > m_max_ranges * 2)
     {
       res[m_max_ranges * 2 - 1] = res[i - 1];
@@ -2605,6 +2610,11 @@ irange::irange_intersect (const irange &r)
   if (r.irange_contains_p (*this))
     return intersect_nonzero_bits (r);
 
+  // ?? We could probably come up with something smarter than the
+  // worst case scenario here.
+  int needed = num_pairs () + r.num_pairs ();
+  maybe_resize (needed);
+
   signop sign = TYPE_SIGN (TREE_TYPE(m_base[0]));
   unsigned bld_pair = 0;
   unsigned bld_lim = m_max_ranges;
@@ -2831,6 +2841,11 @@ irange::invert ()
       m_num_ranges = 1;
       return;
     }
+
+  // At this point, we need one extra sub-range to represent the
+  // inverse.
+  maybe_resize (m_num_ranges + 1);
+
   // The algorithm is as follows.  To calculate INVERT ([a,b][c,d]), we
   // generate [-MIN, a-1][b+1, c-1][d+1, MAX].
   //
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 969b2b68418..96e59ecfa72 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -172,7 +172,8 @@ public:
   bool legacy_verbose_intersect (const irange *);	// DEPRECATED
 
 protected:
-  irange (tree *, unsigned);
+  void maybe_resize (int needed);
+  irange (tree *, unsigned nranges, bool resizable);
   // potential promotion to public?
   tree tree_lower_bound (unsigned = 0) const;
   tree tree_upper_bound (unsigned) const;
@@ -200,6 +201,8 @@ protected:
   void copy_to_legacy (const irange &);
   void copy_legacy_to_multi_range (const irange &);
 
+  // Hard limit on max ranges allowed.
+  static const int HARD_MAX_RANGES = 255;
 private:
   friend void gt_ggc_mx (irange *);
   friend void gt_pch_nx (irange *);
@@ -214,15 +217,21 @@ private:
 
   bool intersect (const wide_int& lb, const wide_int& ub);
   unsigned char m_num_ranges;
+  bool m_resizable;
   unsigned char m_max_ranges;
   tree m_nonzero_mask;
+protected:
   tree *m_base;
 };
 
 // Here we describe an irange with N pairs of ranges.  The storage for
 // the pairs is embedded in the class as an array.
+//
+// If RESIZABLE is true, the storage will be resized on the heap when
+// the number of ranges needed goes past N up to a max of
+// HARD_MAX_RANGES.  This new storage is freed upon destruction.
 
-template<unsigned N>
+template<unsigned N, bool RESIZABLE = false>
 class GTY((user)) int_range : public irange
 {
 public:
@@ -233,7 +242,7 @@ public:
   int_range (tree type);
   int_range (const int_range &);
   int_range (const irange &);
-  virtual ~int_range () = default;
+  virtual ~int_range ();
   int_range& operator= (const int_range &);
 private:
   template <unsigned X> friend void gt_ggc_mx (int_range<X> *);
@@ -472,6 +481,38 @@ is_a <frange> (vrange &v)
   return v.m_discriminator == VR_FRANGE;
 }
 
+// For resizable ranges, resize the range up to HARD_MAX_RANGES if the
+// NEEDED pairs is greater than the current capacity of the range.
+
+inline void
+irange::maybe_resize (int needed)
+{
+  if (!m_resizable || m_max_ranges == HARD_MAX_RANGES)
+    return;
+
+  if (needed > m_max_ranges)
+    {
+      m_max_ranges = HARD_MAX_RANGES;
+      tree *newmem = new tree[m_max_ranges * 2];
+      memcpy (newmem, m_base, sizeof (tree) * num_pairs () * 2);
+      m_base = newmem;
+    }
+}
+
+template<unsigned N, bool RESIZABLE>
+inline
+int_range<N, RESIZABLE>::~int_range ()
+{
+  if (RESIZABLE && m_base != m_ranges)
+    delete m_base;
+}
+
+// This is an "infinite" precision irange for use in temporary
+// calculations.  It starts with a sensible default covering 99% of
+// uses, and goes up to HARD_MAX_RANGES when needed.  Any allocated
+// storage is freed upon destruction.
+typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
+
 class vrange_visitor
 {
 public:
@@ -490,10 +531,6 @@ public:
 // There are copy operators to seamlessly copy to/fro multi-ranges.
 typedef int_range<1> value_range;
 
-// This is an "infinite" precision irange for use in temporary
-// calculations.
-typedef int_range<255> int_range_max;
-
 // This is an "infinite" precision range object for use in temporary
 // calculations for any of the handled types.  The object can be
 // transparently used as a vrange.
@@ -872,64 +909,65 @@ gt_pch_nx (int_range<N> *x, gt_pointer_operator op, void *cookie)
 // Constructors for irange
 
 inline
-irange::irange (tree *base, unsigned nranges)
+irange::irange (tree *base, unsigned nranges, bool resizable)
 {
   m_discriminator = VR_IRANGE;
   m_base = base;
   m_max_ranges = nranges;
+  m_resizable = resizable;
   set_undefined ();
 }
 
 // Constructors for int_range<>.
 
-template<unsigned N>
+template<unsigned N, bool RESIZABLE>
 inline
-int_range<N>::int_range ()
-  : irange (m_ranges, N)
+int_range<N, RESIZABLE>::int_range ()
+  : irange (m_ranges, N, RESIZABLE)
 {
 }
 
-template<unsigned N>
-int_range<N>::int_range (const int_range &other)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (const int_range &other)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::operator= (other);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree min, tree max, value_range_kind kind)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree min, tree max, value_range_kind kind)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::set (min, max, kind);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree type)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree type)
+  : irange (m_ranges, N, RESIZABLE)
 {
   set_varying (type);
 }
 
-template<unsigned N>
-int_range<N>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (tree type, const wide_int &wmin, const wide_int &wmax,
 			 value_range_kind kind)
-  : irange (m_ranges, N)
+  : irange (m_ranges, N, RESIZABLE)
 {
   tree min = wide_int_to_tree (type, wmin);
   tree max = wide_int_to_tree (type, wmax);
   set (min, max, kind);
 }
 
-template<unsigned N>
-int_range<N>::int_range (const irange &other)
-  : irange (m_ranges, N)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>::int_range (const irange &other)
+  : irange (m_ranges, N, RESIZABLE)
 {
   irange::operator= (other);
 }
 
-template<unsigned N>
-int_range<N>&
-int_range<N>::operator= (const int_range &src)
+template<unsigned N, bool RESIZABLE>
+int_range<N, RESIZABLE>&
+int_range<N, RESIZABLE>::operator= (const int_range &src)
 {
   irange::operator= (src);
   return *this;
-- 
2.40.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-16  9:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 10:35 [PATCH] Add auto-resizing capability to irange's [PR109695] Aldy Hernandez
2023-05-15 10:42 ` Jakub Jelinek
2023-05-15 15:07   ` Aldy Hernandez
2023-05-15 18:14     ` Aldy Hernandez
2023-05-16  9:24       ` Aldy Hernandez
2023-05-15 11:08 ` Richard Biener
2023-05-15 11:26   ` Jakub Jelinek
2023-05-15 15:03   ` Aldy Hernandez
2023-05-15 17:23     ` Aldy Hernandez
2023-05-15 14:24 ` Bernhard Reutner-Fischer
2023-05-15 14:27   ` Aldy Hernandez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).