[RFC] trailing_wide_ints with runtime variable lengths

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [RFC] trailing_wide_ints with runtime variable lengths
@ 2022-06-29  9:21 Aldy Hernandez
  2022-07-01 14:12 ` Aldy Hernandez
  2022-07-01 18:26 ` Richard Sandiford
  0 siblings, 2 replies; 10+ messages in thread
From: Aldy Hernandez @ 2022-06-29  9:21 UTC (permalink / raw)
  To: GCC patches
  Cc: Andrew MacLeod, richard.sandiford, Jakub Jelinek, Aldy Hernandez

Currently global ranges are stored in SSA_NAME_RANGE_INFO as a pair of
wide_int-like objects along with the nonzero bits.  We frequently lose
precision when streaming out our higher resolution iranges.  The plan
was always to store the full irange between passes.  However, as was
originally discussed eons ago:

	https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475139.html

...we need a memory efficient way of saving iranges, preferably using
the trailing_wide_ints idiom.

The problem with doing so is that trailing_wide_ints assume a
compile-time specified number of elements.  For irange, we need to
determine the size at run-time.

One solution is to adapt trailing_wide_ints such that N is the maximum
number of elements allowed, and allow setting the actual number at
run-time (defaulting to N).  The attached patch does this, while
requiring no changes to existing users.

It uses a byte to store the number of elements in the
trailing_wide_ints control word.  The control word is currently a
16-bit precision, an 8-bit max-length, and the rest is used for
m_len[N].  On a 64-bit architecture, this allows for 5 elements in
m_len without having to use an extra word.  With this patch, m_len[]
would be smaller by one byte (4) before consuming the padding.  This
shouldn't be a problem as the only users of trailing_wide_ints use N=2
for NUM_POLY_INT_COEFFS in aarch64, and N=3 for range_info_def.

For irange, my plan is to use one more word to fit a maximum of 12
elements (the above 4 plus 8 more).  This would allow for 6 pairs of
sub-ranges which would be more than adequate for our needs.  In
previous tests we found that 99% of ranges fit within 3-4 pairs.  More
precisely, this would allow for 5 pairs, plus the nonzero bits, plus a
spare wide-int for future development.

Ultimately this means that streaming an irange would consume one more
word than what we currently do for range_info_def.  IMO this is a nice
trade-off considering we started storing a slew of wide-ints directly
;-).

I'm not above rolling an altogether different approach, but would
prefer to use the existing trailing infrastructure since it's mostly
what we need.

Thoughts?

p.s. Tested and benchmarked on x86-64 Linux.  There was no discernible
performance change in our benchmark suite.

gcc/ChangeLog:

	* wide-int.h (struct trailing_wide_ints): Add m_num_elements.
	(trailing_wide_ints::set_precision): Add num_elements argument.
	(trailing_wide_ints::extra_size): Same.
---
 gcc/wide-int.h | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index 8041b6104f9..f68ccf0a0c5 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -1373,10 +1373,13 @@ namespace wi
     : public int_traits <wide_int_storage> {};
 }

-/* An array of N wide_int-like objects that can be put at the end of
-   a variable-sized structure.  Use extra_size to calculate how many
-   bytes beyond the sizeof need to be allocated.  Use set_precision
-   to initialize the structure.  */
+/* A variable-lengthed array of wide_int-like objects that can be put
+   at the end of a variable-sized structure.  The number of objects is
+   at most N and can be set at runtime by using set_precision().
+
+   Use extra_size to calculate how many bytes beyond the
+   sizeof need to be allocated.  Use set_precision to initialize the
+   structure.  */
 template <int N>
 struct GTY((user)) trailing_wide_ints
 {
@@ -1387,6 +1390,9 @@ private:
   /* The shared maximum length of each number.  */
   unsigned char m_max_len;

+  /* The number of elements.  */
+  unsigned char m_num_elements;
+
   /* The current length of each number.
      Avoid char array so the whole structure is not a typeless storage
      that will, in turn, turn off TBAA on gimple, trees and RTL.  */
@@ -1399,12 +1405,15 @@ private:
 public:
   typedef WIDE_INT_REF_FOR (trailing_wide_int_storage) const_reference;

-  void set_precision (unsigned int);
+  void set_precision (unsigned int precision, unsigned int num_elements = N);
   unsigned int get_precision () const { return m_precision; }
+  unsigned int num_elements () const { return m_num_elements; }
   trailing_wide_int operator [] (unsigned int);
   const_reference operator [] (unsigned int) const;
-  static size_t extra_size (unsigned int);
-  size_t extra_size () const { return extra_size (m_precision); }
+  static size_t extra_size (unsigned int precision,
+			    unsigned int num_elements = N);
+  size_t extra_size () const { return extra_size (m_precision,
+						  m_num_elements); }
 };

 inline trailing_wide_int_storage::
@@ -1457,11 +1466,14 @@ trailing_wide_int_storage::operator = (const T &x)
 }

 /* Initialize the structure and record that all elements have precision
-   PRECISION.  */
+   PRECISION.  NUM_ELEMENTS can be no more than N.  */
 template <int N>
 inline void
-trailing_wide_ints <N>::set_precision (unsigned int precision)
+trailing_wide_ints <N>::set_precision (unsigned int precision,
+				       unsigned int num_elements)
 {
+  gcc_checking_assert (num_elements <= N);
+  m_num_elements = num_elements;
   m_precision = precision;
   m_max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
 	       / HOST_BITS_PER_WIDE_INT);
@@ -1484,15 +1496,19 @@ trailing_wide_ints <N>::operator [] (unsigned int index) const
 			  m_len[index].len, m_precision);
 }

-/* Return how many extra bytes need to be added to the end of the structure
-   in order to handle N wide_ints of precision PRECISION.  */
+/* Return how many extra bytes need to be added to the end of the
+   structure in order to handle N wide_ints of precision PRECISION.
+   NUM_ELEMENTS is the number of elements, or N if none specified.  */
 template <int N>
 inline size_t
-trailing_wide_ints <N>::extra_size (unsigned int precision)
+trailing_wide_ints <N>::extra_size (unsigned int precision,
+				    unsigned int num_elements)
 {
   unsigned int max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
 			  / HOST_BITS_PER_WIDE_INT);
-  return (N * max_len - 1) * sizeof (HOST_WIDE_INT);
+  if (num_elements > N)
+    num_elements = N;
+  return (num_elements * max_len - 1) * sizeof (HOST_WIDE_INT);
 }

 /* This macro is used in structures that end with a trailing_wide_ints field
-- 
2.36.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-06-29  9:21 [RFC] trailing_wide_ints with runtime variable lengths Aldy Hernandez
@ 2022-07-01 14:12 ` Aldy Hernandez
  2022-07-01 14:58   ` Jakub Jelinek
  2022-07-01 18:26 ` Richard Sandiford
  1 sibling, 1 reply; 10+ messages in thread
From: Aldy Hernandez @ 2022-07-01 14:12 UTC (permalink / raw)
  To: GCC patches; +Cc: Andrew MacLeod, Richard Sandiford, Jakub Jelinek

FYI...if no one has anything to say, I'd like to formally post this for
review.

So.... OK for trunk?
Aldy

On Wed, Jun 29, 2022, 11:22 Aldy Hernandez <aldyh@redhat.com> wrote:

> Currently global ranges are stored in SSA_NAME_RANGE_INFO as a pair of
> wide_int-like objects along with the nonzero bits.  We frequently lose
> precision when streaming out our higher resolution iranges.  The plan
> was always to store the full irange between passes.  However, as was
> originally discussed eons ago:
>
>         https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475139.html
>
> ...we need a memory efficient way of saving iranges, preferably using
> the trailing_wide_ints idiom.
>
> The problem with doing so is that trailing_wide_ints assume a
> compile-time specified number of elements.  For irange, we need to
> determine the size at run-time.
>
> One solution is to adapt trailing_wide_ints such that N is the maximum
> number of elements allowed, and allow setting the actual number at
> run-time (defaulting to N).  The attached patch does this, while
> requiring no changes to existing users.
>
> It uses a byte to store the number of elements in the
> trailing_wide_ints control word.  The control word is currently a
> 16-bit precision, an 8-bit max-length, and the rest is used for
> m_len[N].  On a 64-bit architecture, this allows for 5 elements in
> m_len without having to use an extra word.  With this patch, m_len[]
> would be smaller by one byte (4) before consuming the padding.  This
> shouldn't be a problem as the only users of trailing_wide_ints use N=2
> for NUM_POLY_INT_COEFFS in aarch64, and N=3 for range_info_def.
>
> For irange, my plan is to use one more word to fit a maximum of 12
> elements (the above 4 plus 8 more).  This would allow for 6 pairs of
> sub-ranges which would be more than adequate for our needs.  In
> previous tests we found that 99% of ranges fit within 3-4 pairs.  More
> precisely, this would allow for 5 pairs, plus the nonzero bits, plus a
> spare wide-int for future development.
>
> Ultimately this means that streaming an irange would consume one more
> word than what we currently do for range_info_def.  IMO this is a nice
> trade-off considering we started storing a slew of wide-ints directly
> ;-).
>
> I'm not above rolling an altogether different approach, but would
> prefer to use the existing trailing infrastructure since it's mostly
> what we need.
>
> Thoughts?
>
> p.s. Tested and benchmarked on x86-64 Linux.  There was no discernible
> performance change in our benchmark suite.
>
> gcc/ChangeLog:
>
>         * wide-int.h (struct trailing_wide_ints): Add m_num_elements.
>         (trailing_wide_ints::set_precision): Add num_elements argument.
>         (trailing_wide_ints::extra_size): Same.
> ---
>  gcc/wide-int.h | 42 +++++++++++++++++++++++++++++-------------
>  1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
> index 8041b6104f9..f68ccf0a0c5 100644
> --- a/gcc/wide-int.h
> +++ b/gcc/wide-int.h
> @@ -1373,10 +1373,13 @@ namespace wi
>      : public int_traits <wide_int_storage> {};
>  }
>
> -/* An array of N wide_int-like objects that can be put at the end of
> -   a variable-sized structure.  Use extra_size to calculate how many
> -   bytes beyond the sizeof need to be allocated.  Use set_precision
> -   to initialize the structure.  */
> +/* A variable-lengthed array of wide_int-like objects that can be put
> +   at the end of a variable-sized structure.  The number of objects is
> +   at most N and can be set at runtime by using set_precision().
> +
> +   Use extra_size to calculate how many bytes beyond the
> +   sizeof need to be allocated.  Use set_precision to initialize the
> +   structure.  */
>  template <int N>
>  struct GTY((user)) trailing_wide_ints
>  {
> @@ -1387,6 +1390,9 @@ private:
>    /* The shared maximum length of each number.  */
>    unsigned char m_max_len;
>
> +  /* The number of elements.  */
> +  unsigned char m_num_elements;
> +
>    /* The current length of each number.
>       Avoid char array so the whole structure is not a typeless storage
>       that will, in turn, turn off TBAA on gimple, trees and RTL.  */
> @@ -1399,12 +1405,15 @@ private:
>  public:
>    typedef WIDE_INT_REF_FOR (trailing_wide_int_storage) const_reference;
>
> -  void set_precision (unsigned int);
> +  void set_precision (unsigned int precision, unsigned int num_elements =
> N);
>    unsigned int get_precision () const { return m_precision; }
> +  unsigned int num_elements () const { return m_num_elements; }
>    trailing_wide_int operator [] (unsigned int);
>    const_reference operator [] (unsigned int) const;
> -  static size_t extra_size (unsigned int);
> -  size_t extra_size () const { return extra_size (m_precision); }
> +  static size_t extra_size (unsigned int precision,
> +                           unsigned int num_elements = N);
> +  size_t extra_size () const { return extra_size (m_precision,
> +                                                 m_num_elements); }
>  };
>
>  inline trailing_wide_int_storage::
> @@ -1457,11 +1466,14 @@ trailing_wide_int_storage::operator = (const T &x)
>  }
>
>  /* Initialize the structure and record that all elements have precision
> -   PRECISION.  */
> +   PRECISION.  NUM_ELEMENTS can be no more than N.  */
>  template <int N>
>  inline void
> -trailing_wide_ints <N>::set_precision (unsigned int precision)
> +trailing_wide_ints <N>::set_precision (unsigned int precision,
> +                                      unsigned int num_elements)
>  {
> +  gcc_checking_assert (num_elements <= N);
> +  m_num_elements = num_elements;
>    m_precision = precision;
>    m_max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
>                / HOST_BITS_PER_WIDE_INT);
> @@ -1484,15 +1496,19 @@ trailing_wide_ints <N>::operator [] (unsigned int
> index) const
>                           m_len[index].len, m_precision);
>  }
>
> -/* Return how many extra bytes need to be added to the end of the
> structure
> -   in order to handle N wide_ints of precision PRECISION.  */
> +/* Return how many extra bytes need to be added to the end of the
> +   structure in order to handle N wide_ints of precision PRECISION.
> +   NUM_ELEMENTS is the number of elements, or N if none specified.  */
>  template <int N>
>  inline size_t
> -trailing_wide_ints <N>::extra_size (unsigned int precision)
> +trailing_wide_ints <N>::extra_size (unsigned int precision,
> +                                   unsigned int num_elements)
>  {
>    unsigned int max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
>                           / HOST_BITS_PER_WIDE_INT);
> -  return (N * max_len - 1) * sizeof (HOST_WIDE_INT);
> +  if (num_elements > N)
> +    num_elements = N;
> +  return (num_elements * max_len - 1) * sizeof (HOST_WIDE_INT);
>  }
>
>  /* This macro is used in structures that end with a trailing_wide_ints
> field
> --
> 2.36.1
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 14:12 ` Aldy Hernandez
@ 2022-07-01 14:58   ` Jakub Jelinek
  2022-07-01 16:47     ` Aldy Hernandez
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-07-01 14:58 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 01, 2022 at 04:12:55PM +0200, Aldy Hernandez wrote:
> > --- a/gcc/wide-int.h
> > +++ b/gcc/wide-int.h
> > @@ -1373,10 +1373,13 @@ namespace wi
> >      : public int_traits <wide_int_storage> {};
> >  }
> >
> > -/* An array of N wide_int-like objects that can be put at the end of
> > -   a variable-sized structure.  Use extra_size to calculate how many
> > -   bytes beyond the sizeof need to be allocated.  Use set_precision
> > -   to initialize the structure.  */
> > +/* A variable-lengthed array of wide_int-like objects that can be put
> > +   at the end of a variable-sized structure.  The number of objects is
> > +   at most N and can be set at runtime by using set_precision().
> > +
> > +   Use extra_size to calculate how many bytes beyond the
> > +   sizeof need to be allocated.  Use set_precision to initialize the
> > +   structure.  */
> >  template <int N>
> >  struct GTY((user)) trailing_wide_ints
> >  {
> > @@ -1387,6 +1390,9 @@ private:
> >    /* The shared maximum length of each number.  */
> >    unsigned char m_max_len;
> >
> > +  /* The number of elements.  */
> > +  unsigned char m_num_elements;

IMNSHO you certainly don't want to change like this existing
trailing_wide_ints, you don't want to grow unnecessarily existing
trailing_wide_ints users (e.g. const_poly_int_def).

My brief understanding of wide-int.h is that in some cases stuff like this
is implied from template parameters or exact class instantiation and in
other cases it is present explicitly and class inheritence is used to hide
that stuff nicely.

So, you are looking for something like trailing_wide_ints<N> but where that
N is actually a runtime value?  Then e.g. the
  struct {unsigned char len;} m_len[N];
member can't work properly either, because it isn't constant size.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 14:58   ` Jakub Jelinek
@ 2022-07-01 16:47     ` Aldy Hernandez
  2022-07-01 16:58       ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Aldy Hernandez @ 2022-07-01 16:47 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 1, 2022 at 4:58 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Jul 01, 2022 at 04:12:55PM +0200, Aldy Hernandez wrote:
> > > --- a/gcc/wide-int.h
> > > +++ b/gcc/wide-int.h
> > > @@ -1373,10 +1373,13 @@ namespace wi
> > >      : public int_traits <wide_int_storage> {};
> > >  }
> > >
> > > -/* An array of N wide_int-like objects that can be put at the end of
> > > -   a variable-sized structure.  Use extra_size to calculate how many
> > > -   bytes beyond the sizeof need to be allocated.  Use set_precision
> > > -   to initialize the structure.  */
> > > +/* A variable-lengthed array of wide_int-like objects that can be put
> > > +   at the end of a variable-sized structure.  The number of objects is
> > > +   at most N and can be set at runtime by using set_precision().
> > > +
> > > +   Use extra_size to calculate how many bytes beyond the
> > > +   sizeof need to be allocated.  Use set_precision to initialize the
> > > +   structure.  */
> > >  template <int N>
> > >  struct GTY((user)) trailing_wide_ints
> > >  {
> > > @@ -1387,6 +1390,9 @@ private:
> > >    /* The shared maximum length of each number.  */
> > >    unsigned char m_max_len;
> > >
> > > +  /* The number of elements.  */
> > > +  unsigned char m_num_elements;
>
> IMNSHO you certainly don't want to change like this existing
> trailing_wide_ints, you don't want to grow unnecessarily existing
> trailing_wide_ints users (e.g. const_poly_int_def).

That's precisely what I avoided...touching existing trailing_wide_ints
users.  As I explained, there is no cost to either const_poly_int_def
or range_info_def (though I'm about to nuke the latter).  There is
some padding that is currently used by m_len[N], and I just took a
byte out to represent the run-time length.  That would affect
trailing_wide_int users that have N > 4, but none are.  The use in
const_poly_int_def uses 2 (and range_info_def uses 3):

#define NUM_POLY_INT_COEFFS 2

struct GTY((variable_size)) const_poly_int_def {
  trailing_wide_ints<NUM_POLY_INT_COEFFS> coeffs;
};

>
> My brief understanding of wide-int.h is that in some cases stuff like this
> is implied from template parameters or exact class instantiation and in
> other cases it is present explicitly and class inheritence is used to hide
> that stuff nicely.

Yeah, it took me a while to decipher it, but I did read it :).

>
> So, you are looking for something like trailing_wide_ints<N> but where that
> N is actually a runtime value?  Then e.g. the
>   struct {unsigned char len;} m_len[N];
> member can't work properly either, because it isn't constant size.

What my patch does is store the run-time length in the aforementioned
byte, while defaulting to N/MAX.  There is no size difference (or code
changes) for existing users.  With my change, set_precision() and
extra_size() now take a runtime parameter, but it defaults to N and is
inlined, so there is no penalty for existing users.  I benchmarked to
make sure :).

I could hack up a variable_length_wide_int for what I want, but I'd
end up duplicating a lot of the trailing_wide_int_storage, etc.
Another option would be to stream out the HOST_WIDE_INTs in the
tree_int_cst and reconstruct things myself, but that smells of
reinventing the wheel.

Is there another way of allocating an n-bit wide-int at run-time?  I'm
happy to entertain other alternatives...

Aldy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 16:47     ` Aldy Hernandez
@ 2022-07-01 16:58       ` Jakub Jelinek
  2022-07-01 17:43         ` Aldy Hernandez
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-07-01 16:58 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 01, 2022 at 06:47:48PM +0200, Aldy Hernandez wrote:
> > So, you are looking for something like trailing_wide_ints<N> but where that
> > N is actually a runtime value?  Then e.g. the
> >   struct {unsigned char len;} m_len[N];
> > member can't work properly either, because it isn't constant size.
> 
> What my patch does is store the run-time length in the aforementioned
> byte, while defaulting to N/MAX.  There is no size difference (or code
> changes) for existing users.  With my change, set_precision() and
> extra_size() now take a runtime parameter, but it defaults to N and is
> inlined, so there is no penalty for existing users.  I benchmarked to
> make sure :).

So, you still use N = 3 but can sometimes store say 255 wide_ints in there?
In that case, m_len[N] provides lengths just for the first 3, no?

Anyway, you really want feedback from Richard Sandiford IMHO...

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 16:58       ` Jakub Jelinek
@ 2022-07-01 17:43         ` Aldy Hernandez
  2022-07-01 18:53           ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Aldy Hernandez @ 2022-07-01 17:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 1, 2022, 18:58 Jakub Jelinek <jakub@redhat.com> wrote:

> On Fri, Jul 01, 2022 at 06:47:48PM +0200, Aldy Hernandez wrote:
> > > So, you are looking for something like trailing_wide_ints<N> but where
> that
> > > N is actually a runtime value?  Then e.g. the
> > >   struct {unsigned char len;} m_len[N];
> > > member can't work properly either, because it isn't constant size.
> >
> > What my patch does is store the run-time length in the aforementioned
> > byte, while defaulting to N/MAX.  There is no size difference (or code
> > changes) for existing users.  With my change, set_precision() and
> > extra_size() now take a runtime parameter, but it defaults to N and is
> > inlined, so there is no penalty for existing users.  I benchmarked to
> > make sure :).
>
> So, you still use N = 3 but can sometimes store say 255 wide_ints in there?
> In that case, m_len[N] provides lengths just for the first 3, no?
>

You can still say N=255 and things continue to work as they do now, since
m_len[] is still statically determined. The only difference is that before,
the size of the structure would be 2+1+255+sizeof(int) whereas now it would
be 1 more because of the byte I'm using for num_elements.

The only time this would be an issue would be for say N=253 because the
size of everything except the ints is currently 256 (2+1+253), which is 64
bit aligned, whereas with my patch it would be 257 which takes an
additional word.

This is all theoretical because there are no users of trailing wide ints
greater than 3.

I guess what I'm really doing is changing the semantics of
trailing_wide_ints<N> to trailing_wide_ints<MAX> and what used to be the
number of elements is determined at run time....but the default is MAX so
everything continues to work the same for the current users.

Does that make sense?

Aldy

> Anyway, you really want feedback from Richard Sandiford IMHO...
>
>         Jakub
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 17:43         ` Aldy Hernandez
@ 2022-07-01 18:53           ` Jakub Jelinek
  2022-07-01 20:31             ` Aldy Hernandez
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2022-07-01 18:53 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 01, 2022 at 07:43:28PM +0200, Aldy Hernandez wrote:
> You can still say N=255 and things continue to work as they do now, since
> m_len[] is still statically determined. The only difference is that before,
> the size of the structure would be 2+1+255+sizeof(int) whereas now it would
> be 1 more because of the byte I'm using for num_elements.

So, what N do you want to use for SSA_NAME_RANGE_INFO?
N=255 wouldn't be very space efficient especially if the common case is a
single range or two.
For such cases making e.g. m_len not an embedded array, but pointer to
somewhere after the HOST_WIDE_INT array in the same allocation would be
better.

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 18:53           ` Jakub Jelinek
@ 2022-07-01 20:31             ` Aldy Hernandez
  2022-07-01 20:40               ` Aldy Hernandez
  0 siblings, 1 reply; 10+ messages in thread
From: Aldy Hernandez @ 2022-07-01 20:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

On Fri, Jul 1, 2022 at 8:53 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Jul 01, 2022 at 07:43:28PM +0200, Aldy Hernandez wrote:
> > You can still say N=255 and things continue to work as they do now, since
> > m_len[] is still statically determined. The only difference is that before,
> > the size of the structure would be 2+1+255+sizeof(int) whereas now it would
> > be 1 more because of the byte I'm using for num_elements.
>
> So, what N do you want to use for SSA_NAME_RANGE_INFO?
> N=255 wouldn't be very space efficient especially if the common case is a
> single range or two.
> For such cases making e.g. m_len not an embedded array, but pointer to
> somewhere after the HOST_WIDE_INT array in the same allocation would be
> better.

As I mentioned in my original post, 12.  This means that I'm taking
the 4 bytes that are left over from the current padding plus 8
(64-bits).  My trailing wide int structure for SSA_NAME_RANGE_INFO
will be one word larger than what is currently there.  But we'll be
able to store up to 5 pairs plus one for the nonzero bits plus one for
future development (5*2 + 1 + 1 = 12), all without going over the 64
bit alignment.

This is a theoretical max, in reality as I mentioned, 99% of ranges
calculated in infinite precision by the ranger fit into 3-4 pairs.

Aldy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-07-01 20:31             ` Aldy Hernandez
@ 2022-07-01 20:40               ` Aldy Hernandez
  0 siblings, 0 replies; 10+ messages in thread
From: Aldy Hernandez @ 2022-07-01 20:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC patches, Andrew MacLeod, Richard Sandiford

BTW, I don't know if it got lost in all my patches, but we already
have an irange allocator that given an irange, returns a chunk of
memory holding a clone of that irange squished into its minimum
representable pairs (see vrange_allocator and friends).  So we won't
ever be storing 255 or something equally absurd like I had proposed
years ago :).  We'll be storing the smallest representable range
inside a trailing_wide_int.

Aldy

On Fri, Jul 1, 2022 at 10:31 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Fri, Jul 1, 2022 at 8:53 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Fri, Jul 01, 2022 at 07:43:28PM +0200, Aldy Hernandez wrote:
> > > You can still say N=255 and things continue to work as they do now, since
> > > m_len[] is still statically determined. The only difference is that before,
> > > the size of the structure would be 2+1+255+sizeof(int) whereas now it would
> > > be 1 more because of the byte I'm using for num_elements.
> >
> > So, what N do you want to use for SSA_NAME_RANGE_INFO?
> > N=255 wouldn't be very space efficient especially if the common case is a
> > single range or two.
> > For such cases making e.g. m_len not an embedded array, but pointer to
> > somewhere after the HOST_WIDE_INT array in the same allocation would be
> > better.
>
> As I mentioned in my original post, 12.  This means that I'm taking
> the 4 bytes that are left over from the current padding plus 8
> (64-bits).  My trailing wide int structure for SSA_NAME_RANGE_INFO
> will be one word larger than what is currently there.  But we'll be
> able to store up to 5 pairs plus one for the nonzero bits plus one for
> future development (5*2 + 1 + 1 = 12), all without going over the 64
> bit alignment.
>
> This is a theoretical max, in reality as I mentioned, 99% of ranges
> calculated in infinite precision by the ranger fit into 3-4 pairs.
>
> Aldy


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] trailing_wide_ints with runtime variable lengths
  2022-06-29  9:21 [RFC] trailing_wide_ints with runtime variable lengths Aldy Hernandez
  2022-07-01 14:12 ` Aldy Hernandez
@ 2022-07-01 18:26 ` Richard Sandiford
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2022-07-01 18:26 UTC (permalink / raw)
  To: Aldy Hernandez via Gcc-patches; +Cc: Aldy Hernandez, Jakub Jelinek

Aldy Hernandez via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Currently global ranges are stored in SSA_NAME_RANGE_INFO as a pair of
> wide_int-like objects along with the nonzero bits.  We frequently lose
> precision when streaming out our higher resolution iranges.  The plan
> was always to store the full irange between passes.  However, as was
> originally discussed eons ago:
>
> 	https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475139.html
>
> ...we need a memory efficient way of saving iranges, preferably using
> the trailing_wide_ints idiom.
>
> The problem with doing so is that trailing_wide_ints assume a
> compile-time specified number of elements.  For irange, we need to
> determine the size at run-time.
>
> One solution is to adapt trailing_wide_ints such that N is the maximum
> number of elements allowed, and allow setting the actual number at
> run-time (defaulting to N).  The attached patch does this, while
> requiring no changes to existing users.
>
> It uses a byte to store the number of elements in the
> trailing_wide_ints control word.  The control word is currently a
> 16-bit precision, an 8-bit max-length, and the rest is used for
> m_len[N].  On a 64-bit architecture, this allows for 5 elements in
> m_len without having to use an extra word.  With this patch, m_len[]
> would be smaller by one byte (4) before consuming the padding.  This
> shouldn't be a problem as the only users of trailing_wide_ints use N=2
> for NUM_POLY_INT_COEFFS in aarch64, and N=3 for range_info_def.
>
> For irange, my plan is to use one more word to fit a maximum of 12
> elements (the above 4 plus 8 more).  This would allow for 6 pairs of
> sub-ranges which would be more than adequate for our needs.  In
> previous tests we found that 99% of ranges fit within 3-4 pairs.  More
> precisely, this would allow for 5 pairs, plus the nonzero bits, plus a
> spare wide-int for future development.
>
> Ultimately this means that streaming an irange would consume one more
> word than what we currently do for range_info_def.  IMO this is a nice
> trade-off considering we started storing a slew of wide-ints directly
> ;-).
>
> I'm not above rolling an altogether different approach, but would
> prefer to use the existing trailing infrastructure since it's mostly
> what we need.
>
> Thoughts?
>
> p.s. Tested and benchmarked on x86-64 Linux.  There was no discernible
> performance change in our benchmark suite.

Thanks for the discussion downthread.  I had some of the same questions
as Jakub, but you've answered them :-)

> gcc/ChangeLog:
>
> 	* wide-int.h (struct trailing_wide_ints): Add m_num_elements.
> 	(trailing_wide_ints::set_precision): Add num_elements argument.
> 	(trailing_wide_ints::extra_size): Same.
> ---
>  gcc/wide-int.h | 42 +++++++++++++++++++++++++++++-------------
>  1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
> index 8041b6104f9..f68ccf0a0c5 100644
> --- a/gcc/wide-int.h
> +++ b/gcc/wide-int.h
> @@ -1373,10 +1373,13 @@ namespace wi
>      : public int_traits <wide_int_storage> {};
>  }
>  
> -/* An array of N wide_int-like objects that can be put at the end of
> -   a variable-sized structure.  Use extra_size to calculate how many
> -   bytes beyond the sizeof need to be allocated.  Use set_precision
> -   to initialize the structure.  */
> +/* A variable-lengthed array of wide_int-like objects that can be put

s/lengthed/length/

> +   at the end of a variable-sized structure.  The number of objects is
> +   at most N and can be set at runtime by using set_precision().
> +
> +   Use extra_size to calculate how many bytes beyond the
> +   sizeof need to be allocated.  Use set_precision to initialize the
> +   structure.  */
>  template <int N>
>  struct GTY((user)) trailing_wide_ints
>  {
> @@ -1387,6 +1390,9 @@ private:
>    /* The shared maximum length of each number.  */
>    unsigned char m_max_len;
>  
> +  /* The number of elements.  */
> +  unsigned char m_num_elements;
> +
>    /* The current length of each number.
>       Avoid char array so the whole structure is not a typeless storage
>       that will, in turn, turn off TBAA on gimple, trees and RTL.  */
> @@ -1399,12 +1405,15 @@ private:
>  public:
>    typedef WIDE_INT_REF_FOR (trailing_wide_int_storage) const_reference;
>  
> -  void set_precision (unsigned int);
> +  void set_precision (unsigned int precision, unsigned int num_elements = N);
>    unsigned int get_precision () const { return m_precision; }
> +  unsigned int num_elements () const { return m_num_elements; }
>    trailing_wide_int operator [] (unsigned int);
>    const_reference operator [] (unsigned int) const;
> -  static size_t extra_size (unsigned int);
> -  size_t extra_size () const { return extra_size (m_precision); }
> +  static size_t extra_size (unsigned int precision,
> +			    unsigned int num_elements = N);
> +  size_t extra_size () const { return extra_size (m_precision,
> +						  m_num_elements); }
>  };
>  
>  inline trailing_wide_int_storage::
> @@ -1457,11 +1466,14 @@ trailing_wide_int_storage::operator = (const T &x)
>  }
>  
>  /* Initialize the structure and record that all elements have precision
> -   PRECISION.  */
> +   PRECISION.  NUM_ELEMENTS can be no more than N.  */
>  template <int N>
>  inline void
> -trailing_wide_ints <N>::set_precision (unsigned int precision)
> +trailing_wide_ints <N>::set_precision (unsigned int precision,
> +				       unsigned int num_elements)
>  {
> +  gcc_checking_assert (num_elements <= N);
> +  m_num_elements = num_elements;
>    m_precision = precision;
>    m_max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
>  	       / HOST_BITS_PER_WIDE_INT);
> @@ -1484,15 +1496,19 @@ trailing_wide_ints <N>::operator [] (unsigned int index) const
>  			  m_len[index].len, m_precision);
>  }
>  
> -/* Return how many extra bytes need to be added to the end of the structure
> -   in order to handle N wide_ints of precision PRECISION.  */
> +/* Return how many extra bytes need to be added to the end of the
> +   structure in order to handle N wide_ints of precision PRECISION.

s/N/NUM_ELEMENTS/

> +   NUM_ELEMENTS is the number of elements, or N if none specified.  */

and here maybe just "NUM_ELEMENTS defaults to N."

>  template <int N>
>  inline size_t
> -trailing_wide_ints <N>::extra_size (unsigned int precision)
> +trailing_wide_ints <N>::extra_size (unsigned int precision,
> +				    unsigned int num_elements)
>  {
>    unsigned int max_len = ((precision + HOST_BITS_PER_WIDE_INT - 1)
>  			  / HOST_BITS_PER_WIDE_INT);
> -  return (N * max_len - 1) * sizeof (HOST_WIDE_INT);
> +  if (num_elements > N)
> +    num_elements = N;

Can we gcc_checking_assert that num_elements <= N instead, like above?
Otherwise we'd get GIGO.

OK with those changes, thanks.

Richard

> +  return (num_elements * max_len - 1) * sizeof (HOST_WIDE_INT);
>  }
>  
>  /* This macro is used in structures that end with a trailing_wide_ints field

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-01 20:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29  9:21 [RFC] trailing_wide_ints with runtime variable lengths Aldy Hernandez
2022-07-01 14:12 ` Aldy Hernandez
2022-07-01 14:58   ` Jakub Jelinek
2022-07-01 16:47     ` Aldy Hernandez
2022-07-01 16:58       ` Jakub Jelinek
2022-07-01 17:43         ` Aldy Hernandez
2022-07-01 18:53           ` Jakub Jelinek
2022-07-01 20:31             ` Aldy Hernandez
2022-07-01 20:40               ` Aldy Hernandez
2022-07-01 18:26 ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).