public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
@ 2024-07-29 10:13 Prathamesh Kulkarni
  2024-07-29 11:29 ` Richard Biener
  2024-07-30 12:37 ` Tobias Burnus
  0 siblings, 2 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-07-29 10:13 UTC (permalink / raw)
  To: rguenther, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

Hi Richard,
Thanks for your suggestions on RFC email, the attached patch adds support for streaming of poly_int when it's degree <= accel's NUM_POLY_INT_COEFFS.
The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int:

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
LTO bootstrap+test on x86_64-linux-gnu in progress.

I am not quite sure how to test it for offloading since currently it's (entirely) broken for aarch64->nvptx.
I can give a try with x86_64->nvptx offloading if required (altho I guess LTO bootstrap should test streaming changes ?)

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh

[-- Attachment #2: p-163-4.txt --]
[-- Type: text/plain, Size: 7660 bytes --]

Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when it's degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

gcc/ChangeLog:

	* data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
	degree and call poly_int_read_common. 
	(streamer_read_poly_int64): Likewise.
	* data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
	degree.
	(streamer_write_poly_int64): Likewise.
	* data-streamer.h (bp_pack_poly_value): Likewise.
	(poly_int_read_common): New function template.
	(bp_unpack_poly_value): Stream in poly_int degree and call
	poly_int_read_common.
	* poly-int.h (C>::degree): New method.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
	POLY_INT_CST degree, issue a fatal_error if degree is greater than
	NUM_POLY_INT_COEFFS, and zero out remaining coeffs. 
	* tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
	stream out POLY_INT_CST degree.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT work)
 void
 streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 void
 streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..b154c439b8c 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp,
 		    const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val,
 		    unsigned nbits)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = val.degree ();
+  bp_pack_value (bp, degree, BITS_PER_UNIT * sizeof (unsigned HOST_WIDE_INT));
+  for (unsigned i = 0; i < degree; ++i)
     bp_pack_value (bp, val.coeffs[i], nbits);
 }
 
@@ -194,15 +196,33 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+template<unsigned N, typename C>
+inline poly_int<N, C>
+poly_int_read_common (poly_int<N, C> x, unsigned degree)
+{
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+		 "%<poly_int%> degree (%u) exceeds value of "
+		 "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+		 NUM_POLY_INT_COEFFS);
+  for (unsigned i = degree; i < NUM_POLY_INT_COEFFS; i++)
+    x.coeffs[i] = 0;
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
+  unsigned degree
+    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned HOST_WIDE_INT));
+
   poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned i = 0; i < degree; i++)
     x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+
+  return poly_int_read_common (x, degree);
 }
 
 
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..0df70d392f5 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -422,6 +422,8 @@ public:
   poly_int<N, HOST_WIDE_INT> force_shwi () const;
   poly_int<N, unsigned HOST_WIDE_INT> force_uhwi () const;
 
+  unsigned degree (void) const;
+
 #if POLY_INT_CONVERSION
   operator C () const;
 #endif
@@ -678,6 +680,17 @@ poly_int<N, C>::force_uhwi () const
   return r;
 }
 
+template<unsigned N, typename C>
+inline unsigned poly_int<N, C>::degree () const
+{
+  unsigned i;
+  /* Find leading non-zero coeff. In case all coeffs are zero,
+     treat it as degree-1 poly_int.  */
+  for (i = NUM_POLY_INT_COEFFS - 1; i > 0 && !this->coeffs[i]; i--)
+    ;
+  return i + 1;
+}
+
 #if POLY_INT_CONVERSION
 /* Provide a conversion operator to constants.  */
 
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..2394ac209f5 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,20 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = streamer_read_uhwi (ib);
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+		 "%<poly_int%> degree (%u) exceeds value of "
+		 "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+		 NUM_POLY_INT_COEFFS);
+
+  unsigned i;
+  for (i = 0; i < degree; ++i)
     POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+  tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+  for (; i < NUM_POLY_INT_COEFFS; ++i)
+    POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
 }
 
 
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..e28616b9a7a 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -576,7 +576,14 @@ write_ts_vector_tree_pointers (struct output_block *ob, tree expr)
 static void
 write_ts_poly_tree_pointers (struct output_block *ob, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned i;
+  for (i = NUM_POLY_INT_COEFFS - 1;
+       i > 0 && integer_zerop (POLY_INT_CST_COEFF (expr, i));
+       i--)
+    ;
+  unsigned degree = i + 1;
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (i = 0; i < degree; ++i)
     stream_write_tree_ref (ob, POLY_INT_CST_COEFF (expr, i));
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-29 10:13 Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS Prathamesh Kulkarni
@ 2024-07-29 11:29 ` Richard Biener
  2024-07-29 16:13   ` Richard Sandiford
  2024-07-30 12:37 ` Tobias Burnus
  1 sibling, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-07-29 11:29 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: gcc-patches, richard.sandiford

On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:

> Hi Richard,
> Thanks for your suggestions on RFC email, the attached patch adds support for streaming of poly_int when it's degree <= accel's NUM_POLY_INT_COEFFS.
> The patch changes streaming of poly_int as follows:
> 
> Streaming out poly_int:
> 
> degree = poly_int.degree();
> stream out degree;
> for (i = 0; i < degree; i++)
>   stream out poly_int.coeffs[i];
> 
> Streaming in poly_int:
> 
> stream in degree;
> if (degree > NUM_POLY_INT_COEFFS)
>   fatal_error();
> stream in coeffs;
> // Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
> for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>   poly_int.coeffs[i] = 0;
> 
> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
> LTO bootstrap+test on x86_64-linux-gnu in progress.
> 
> I am not quite sure how to test it for offloading since currently it's (entirely) broken for aarch64->nvptx.
> I can give a try with x86_64->nvptx offloading if required (altho I guess LTO bootstrap should test streaming changes ?)

+  unsigned degree
+    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
HOST_WIDE_INT));

The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
I'd say we should constrain it to a reasonable upper bound,
like 2?  Maybe even have MAX_NUM_POLY_INT_COEFFS or 
NUM_POLY_INT_COEFFS_BITS in poly-int.h and constrain NUM_POLY_INT_COEFFS.

The patch looks reasonable over all, but Richard S. should have a say
about the abstraction you chose and the poly-int adjustment.

Thanks,
Richard.


> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> 
> Thanks,
> Prathamesh
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-29 11:29 ` Richard Biener
@ 2024-07-29 16:13   ` Richard Sandiford
  2024-07-30  6:21     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2024-07-29 16:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> Thanks for your suggestions on RFC email, the attached patch adds support for streaming of poly_int when it's degree <= accel's NUM_POLY_INT_COEFFS.
>> The patch changes streaming of poly_int as follows:
>> 
>> Streaming out poly_int:
>> 
>> degree = poly_int.degree();
>> stream out degree;
>> for (i = 0; i < degree; i++)
>>   stream out poly_int.coeffs[i];
>> 
>> Streaming in poly_int:
>> 
>> stream in degree;
>> if (degree > NUM_POLY_INT_COEFFS)
>>   fatal_error();
>> stream in coeffs;
>> // Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
>> for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>>   poly_int.coeffs[i] = 0;
>> 
>> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
>> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> 
>> I am not quite sure how to test it for offloading since currently it's (entirely) broken for aarch64->nvptx.
>> I can give a try with x86_64->nvptx offloading if required (altho I guess LTO bootstrap should test streaming changes ?)
>
> +  unsigned degree
> +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> HOST_WIDE_INT));
>
> The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> I'd say we should constrain it to a reasonable upper bound,
> like 2?  Maybe even have MAX_NUM_POLY_INT_COEFFS or 
> NUM_POLY_INT_COEFFS_BITS in poly-int.h and constrain NUM_POLY_INT_COEFFS.
>
> The patch looks reasonable over all, but Richard S. should have a say
> about the abstraction you chose and the poly-int adjustment.

Sorry if this has been discussed already, but could we instead stream
NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
It's a target invariant, and poly_int has wormed its way into lots
of things by now :)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-29 16:13   ` Richard Sandiford
@ 2024-07-30  6:21     ` Prathamesh Kulkarni
  2024-07-30  7:33       ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-07-30  6:21 UTC (permalink / raw)
  To: Richard Sandiford, Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3688 bytes --]



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, July 29, 2024 9:43 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >
> >> Hi Richard,
> >> Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> >> The patch changes streaming of poly_int as follows:
> >>
> >> Streaming out poly_int:
> >>
> >> degree = poly_int.degree();
> >> stream out degree;
> >> for (i = 0; i < degree; i++)
> >>   stream out poly_int.coeffs[i];
> >>
> >> Streaming in poly_int:
> >>
> >> stream in degree;
> >> if (degree > NUM_POLY_INT_COEFFS)
> >>   fatal_error();
> >> stream in coeffs;
> >> // Set remaining coeffs to zero in case degree < accel's
> >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >>   poly_int.coeffs[i] = 0;
> >>
> >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> linux-gnu.
> >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >>
> >> I am not quite sure how to test it for offloading since currently
> it's (entirely) broken for aarch64->nvptx.
> >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> guess LTO bootstrap should test streaming changes ?)
> >
> > +  unsigned degree
> > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> > HOST_WIDE_INT));
> >
> > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> > I'd say we should constrain it to a reasonable upper bound, like 2?
> > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> in
> > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >
> > The patch looks reasonable over all, but Richard S. should have a
> say
> > about the abstraction you chose and the poly-int adjustment.
> 
> Sorry if this has been discussed already, but could we instead stream
> NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> It's a target invariant, and poly_int has wormed its way into lots of
> things by now :)
Hi Richard,
The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
(as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
(similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html

The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by any target,
and uses that for packing/unpacking degree of poly_int to/from bitstream, which should make it independent of the type used for representing NUM_POLY_INT_COEFFS by
the target.

Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu.
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh
> 
> Thanks,
> Richard

[-- Attachment #2: p-163-5.txt --]
[-- Type: text/plain, Size: 8522 bytes --]

Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when it's degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

The patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent
number of bits needed to represent value of max NUM_POLY_INT_COEFFS for any
target, which should make packing/unpacking degree of poly_int to/from
bitstream independent of the type used to represent NUM_POLY_INT_COEFF by the
target.

gcc/ChangeLog:

	* data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
	degree and call poly_int_read_common. 
	(streamer_read_poly_int64): Likewise.
	* data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
	degree.
	(streamer_write_poly_int64): Likewise.
	* data-streamer.h (bp_pack_poly_value): Likewise.
	(poly_int_read_common): New function template.
	(bp_unpack_poly_value): Stream in poly_int degree and call
	poly_int_read_common.
	* poly-int.h (poly_int::degree): New method.
	(MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
	POLY_INT_CST degree, issue a fatal_error if degree is greater than
	NUM_POLY_INT_COEFFS, and zero out remaining coeffs. 
	* tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
	stream out POLY_INT_CST degree.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT work)
 void
 streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 void
 streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..ad676cc9287 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp,
 		    const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val,
 		    unsigned nbits)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = val.degree ();
+  bp_pack_value (bp, degree, MAX_NUM_POLY_INT_COEFFS_BITS);
+  for (unsigned i = 0; i < degree; ++i)
     bp_pack_value (bp, val.coeffs[i], nbits);
 }
 
@@ -194,15 +196,33 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+template<unsigned N, typename C>
+inline poly_int<N, C>
+poly_int_read_common (poly_int<N, C> x, unsigned degree)
+{
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+		 "%<poly_int%> degree (%u) exceeds value of "
+		 "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+		 NUM_POLY_INT_COEFFS);
+  for (unsigned i = degree; i < NUM_POLY_INT_COEFFS; i++)
+    x.coeffs[i] = 0;
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
+  unsigned degree
+    = bp_unpack_value (bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned i = 0; i < degree; i++)
     x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+
+  return poly_int_read_common (x, degree);
 }
 
 
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..5f2272313ed 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS (2)
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
@@ -422,6 +426,8 @@ public:
   poly_int<N, HOST_WIDE_INT> force_shwi () const;
   poly_int<N, unsigned HOST_WIDE_INT> force_uhwi () const;
 
+  unsigned degree (void) const;
+
 #if POLY_INT_CONVERSION
   operator C () const;
 #endif
@@ -678,6 +684,18 @@ poly_int<N, C>::force_uhwi () const
   return r;
 }
 
+/* Find leading non-zero coeff. In case all coeffs are zero,
+   treat it as degree-1 poly_int.  */
+
+template<unsigned N, typename C>
+inline unsigned poly_int<N, C>::degree () const
+{
+  unsigned i;
+  for (i = NUM_POLY_INT_COEFFS - 1; i > 0 && !this->coeffs[i]; i--)
+    ;
+  return i + 1;
+}
+
 #if POLY_INT_CONVERSION
 /* Provide a conversion operator to constants.  */
 
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..2394ac209f5 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,20 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = streamer_read_uhwi (ib);
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+		 "%<poly_int%> degree (%u) exceeds value of "
+		 "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+		 NUM_POLY_INT_COEFFS);
+
+  unsigned i;
+  for (i = 0; i < degree; ++i)
     POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+  tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+  for (; i < NUM_POLY_INT_COEFFS; ++i)
+    POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
 }
 
 
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..e28616b9a7a 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -576,7 +576,14 @@ write_ts_vector_tree_pointers (struct output_block *ob, tree expr)
 static void
 write_ts_poly_tree_pointers (struct output_block *ob, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned i;
+  for (i = NUM_POLY_INT_COEFFS - 1;
+       i > 0 && integer_zerop (POLY_INT_CST_COEFF (expr, i));
+       i--)
+    ;
+  unsigned degree = i + 1;
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (i = 0; i < degree; ++i)
     stream_write_tree_ref (ob, POLY_INT_CST_COEFF (expr, i));
 }
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  6:21     ` Prathamesh Kulkarni
@ 2024-07-30  7:33       ` Richard Biener
  2024-07-30  9:01         ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-07-30  7:33 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Sandiford, gcc-patches

On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Monday, July 29, 2024 9:43 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> > patches@gcc.gnu.org
> > Subject: Re: Support streaming of poly_int for offloading when it's
> > degree <= accel's NUM_POLY_INT_COEFFS
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > Richard Biener <rguenther@suse.de> writes:
> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> > >
> > >> Hi Richard,
> > >> Thanks for your suggestions on RFC email, the attached patch adds
> > support for streaming of poly_int when it's degree <= accel's
> > NUM_POLY_INT_COEFFS.
> > >> The patch changes streaming of poly_int as follows:
> > >>
> > >> Streaming out poly_int:
> > >>
> > >> degree = poly_int.degree();
> > >> stream out degree;
> > >> for (i = 0; i < degree; i++)
> > >>   stream out poly_int.coeffs[i];
> > >>
> > >> Streaming in poly_int:
> > >>
> > >> stream in degree;
> > >> if (degree > NUM_POLY_INT_COEFFS)
> > >>   fatal_error();
> > >> stream in coeffs;
> > >> // Set remaining coeffs to zero in case degree < accel's
> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> > >>   poly_int.coeffs[i] = 0;
> > >>
> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> > linux-gnu.
> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> > >>
> > >> I am not quite sure how to test it for offloading since currently
> > it's (entirely) broken for aarch64->nvptx.
> > >> I can give a try with x86_64->nvptx offloading if required (altho I
> > >> guess LTO bootstrap should test streaming changes ?)
> > >
> > > +  unsigned degree
> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> > > HOST_WIDE_INT));
> > >
> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> > > I'd say we should constrain it to a reasonable upper bound, like 2?
> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> > in
> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> > >
> > > The patch looks reasonable over all, but Richard S. should have a
> > say
> > > about the abstraction you chose and the poly-int adjustment.
> > 
> > Sorry if this has been discussed already, but could we instead stream
> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> > It's a target invariant, and poly_int has wormed its way into lots of
> > things by now :)
> Hi Richard,
> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
> During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
> (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
> 
> The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by any target,
> and uses that for packing/unpacking degree of poly_int to/from bitstream, which should make it independent of the type used for representing NUM_POLY_INT_COEFFS by
> the target.

Just as additional comment - maybe we can avoid the POLY_INT_CST tree
side if we'd consistently "canonicalize" a POLY_INT_CST with zero
second coeff as INTEGER_CST instead?  This of course doesn't
generalize to NUM_POLY_INT_COEFFS == 3 vs NUM_POLY_INT_COEFFS == 2.

We still need the poly_int<> streaming support of course where I
would guess that 99% of the cases have a zero second coeff.

Richard.

> Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu.
> Does the patch look OK ?
> 
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> 
> Thanks,
> Prathamesh
> > 
> > Thanks,
> > Richard
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  7:33       ` Richard Biener
@ 2024-07-30  9:01         ` Richard Sandiford
  2024-07-30  9:07           ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2024-07-30  9:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
>
>> 
>> 
>> > -----Original Message-----
>> > From: Richard Sandiford <richard.sandiford@arm.com>
>> > Sent: Monday, July 29, 2024 9:43 PM
>> > To: Richard Biener <rguenther@suse.de>
>> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
>> > patches@gcc.gnu.org
>> > Subject: Re: Support streaming of poly_int for offloading when it's
>> > degree <= accel's NUM_POLY_INT_COEFFS
>> > 
>> > External email: Use caution opening links or attachments
>> > 
>> > 
>> > Richard Biener <rguenther@suse.de> writes:
>> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>> > >
>> > >> Hi Richard,
>> > >> Thanks for your suggestions on RFC email, the attached patch adds
>> > support for streaming of poly_int when it's degree <= accel's
>> > NUM_POLY_INT_COEFFS.
>> > >> The patch changes streaming of poly_int as follows:
>> > >>
>> > >> Streaming out poly_int:
>> > >>
>> > >> degree = poly_int.degree();
>> > >> stream out degree;
>> > >> for (i = 0; i < degree; i++)
>> > >>   stream out poly_int.coeffs[i];
>> > >>
>> > >> Streaming in poly_int:
>> > >>
>> > >> stream in degree;
>> > >> if (degree > NUM_POLY_INT_COEFFS)
>> > >>   fatal_error();
>> > >> stream in coeffs;
>> > >> // Set remaining coeffs to zero in case degree < accel's
>> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>> > >>   poly_int.coeffs[i] = 0;
>> > >>
>> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
>> > linux-gnu.
>> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> > >>
>> > >> I am not quite sure how to test it for offloading since currently
>> > it's (entirely) broken for aarch64->nvptx.
>> > >> I can give a try with x86_64->nvptx offloading if required (altho I
>> > >> guess LTO bootstrap should test streaming changes ?)
>> > >
>> > > +  unsigned degree
>> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
>> > > HOST_WIDE_INT));
>> > >
>> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
>> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
>> > > I'd say we should constrain it to a reasonable upper bound, like 2?
>> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
>> > in
>> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
>> > >
>> > > The patch looks reasonable over all, but Richard S. should have a
>> > say
>> > > about the abstraction you chose and the poly-int adjustment.
>> > 
>> > Sorry if this has been discussed already, but could we instead stream
>> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
>> > It's a target invariant, and poly_int has wormed its way into lots of
>> > things by now :)
>> Hi Richard,
>> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
>> During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
>> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
>> (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html

It's not clear to me what the plan is for VLA host + VLS offloading.
Is the streamed data guaranteed to be "clean" of any host-only
VLA stuff?  E.g. if code does:

  #include <arm_sve.h>

  svint32_t *ptr:
  void foo(svint32_t);

  #pragma GCC target "+nosve"

  ...offloading...

is there a guarantee that the offload target won't see the definition
of ptr and foo?

>> 
>> The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by any target,
>> and uses that for packing/unpacking degree of poly_int to/from bitstream, which should make it independent of the type used for representing NUM_POLY_INT_COEFFS by
>> the target.
>
> Just as additional comment - maybe we can avoid the POLY_INT_CST tree
> side if we'd consistently "canonicalize" a POLY_INT_CST with zero
> second coeff as INTEGER_CST instead?  This of course doesn't
> generalize to NUM_POLY_INT_COEFFS == 3 vs NUM_POLY_INT_COEFFS == 2.

That should already happen, via:

tree
wide_int_to_tree (tree type, const poly_wide_int_ref &value)
{
  if (value.is_constant ())
    return wide_int_to_tree_1 (type, value.coeffs[0]);
  return build_poly_int_cst (type, value);
}

etc.  So if we see POLY_INT_CSTs that could be INTEGER_CSTs, I think
that'd be a bug.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:01         ` Richard Sandiford
@ 2024-07-30  9:07           ` Richard Biener
  2024-07-30  9:17             ` Richard Sandiford
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-07-30  9:07 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Prathamesh Kulkarni, gcc-patches

On Tue, 30 Jul 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
> >
> >> 
> >> 
> >> > -----Original Message-----
> >> > From: Richard Sandiford <richard.sandiford@arm.com>
> >> > Sent: Monday, July 29, 2024 9:43 PM
> >> > To: Richard Biener <rguenther@suse.de>
> >> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> >> > patches@gcc.gnu.org
> >> > Subject: Re: Support streaming of poly_int for offloading when it's
> >> > degree <= accel's NUM_POLY_INT_COEFFS
> >> > 
> >> > External email: Use caution opening links or attachments
> >> > 
> >> > 
> >> > Richard Biener <rguenther@suse.de> writes:
> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >> > >
> >> > >> Hi Richard,
> >> > >> Thanks for your suggestions on RFC email, the attached patch adds
> >> > support for streaming of poly_int when it's degree <= accel's
> >> > NUM_POLY_INT_COEFFS.
> >> > >> The patch changes streaming of poly_int as follows:
> >> > >>
> >> > >> Streaming out poly_int:
> >> > >>
> >> > >> degree = poly_int.degree();
> >> > >> stream out degree;
> >> > >> for (i = 0; i < degree; i++)
> >> > >>   stream out poly_int.coeffs[i];
> >> > >>
> >> > >> Streaming in poly_int:
> >> > >>
> >> > >> stream in degree;
> >> > >> if (degree > NUM_POLY_INT_COEFFS)
> >> > >>   fatal_error();
> >> > >> stream in coeffs;
> >> > >> // Set remaining coeffs to zero in case degree < accel's
> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >> > >>   poly_int.coeffs[i] = 0;
> >> > >>
> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> >> > linux-gnu.
> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >> > >>
> >> > >> I am not quite sure how to test it for offloading since currently
> >> > it's (entirely) broken for aarch64->nvptx.
> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> > >> guess LTO bootstrap should test streaming changes ?)
> >> > >
> >> > > +  unsigned degree
> >> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> >> > > HOST_WIDE_INT));
> >> > >
> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> >> > > I'd say we should constrain it to a reasonable upper bound, like 2?
> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> >> > in
> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >> > >
> >> > > The patch looks reasonable over all, but Richard S. should have a
> >> > say
> >> > > about the abstraction you chose and the poly-int adjustment.
> >> > 
> >> > Sorry if this has been discussed already, but could we instead stream
> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> >> > It's a target invariant, and poly_int has wormed its way into lots of
> >> > things by now :)
> >> Hi Richard,
> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
> >> During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
> >> (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
> 
> It's not clear to me what the plan is for VLA host + VLS offloading.
> Is the streamed data guaranteed to be "clean" of any host-only
> VLA stuff?  E.g. if code does:
> 
>   #include <arm_sve.h>
> 
>   svint32_t *ptr:
>   void foo(svint32_t);
> 
>   #pragma GCC target "+nosve"
> 
>   ...offloading...
> 
> is there a guarantee that the offload target won't see the definition
> of ptr and foo?

No.  If it sees any unsupported poly-* the offload compilation will fail.

I think all current issues are because of poly-* leaking in for cases
where a non-poly would have worked fine, but I have not had a look
myself.

> >> 
> >> The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by any target,
> >> and uses that for packing/unpacking degree of poly_int to/from bitstream, which should make it independent of the type used for representing NUM_POLY_INT_COEFFS by
> >> the target.
> >
> > Just as additional comment - maybe we can avoid the POLY_INT_CST tree
> > side if we'd consistently "canonicalize" a POLY_INT_CST with zero
> > second coeff as INTEGER_CST instead?  This of course doesn't
> > generalize to NUM_POLY_INT_COEFFS == 3 vs NUM_POLY_INT_COEFFS == 2.
> 
> That should already happen, via:
> 
> tree
> wide_int_to_tree (tree type, const poly_wide_int_ref &value)
> {
>   if (value.is_constant ())
>     return wide_int_to_tree_1 (type, value.coeffs[0]);
>   return build_poly_int_cst (type, value);
> }
> 
> etc.  So if we see POLY_INT_CSTs that could be INTEGER_CSTs, I think
> that'd be a bug.

I see.  So we should be able to get rid of the POLY_INT_CST changes
in the patch (and track down failure to canonicalize to INTEGER_CSTs).

For streaming of data structures with poly_int<> we still need to
do something and IMO the approach proposed is fine?

Richard.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:07           ` Richard Biener
@ 2024-07-30  9:17             ` Richard Sandiford
  2024-07-30  9:25               ` Richard Biener
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Sandiford @ 2024-07-30  9:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Prathamesh Kulkarni, gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Tue, 30 Jul 2024, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
>> >
>> >> 
>> >> 
>> >> > -----Original Message-----
>> >> > From: Richard Sandiford <richard.sandiford@arm.com>
>> >> > Sent: Monday, July 29, 2024 9:43 PM
>> >> > To: Richard Biener <rguenther@suse.de>
>> >> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
>> >> > patches@gcc.gnu.org
>> >> > Subject: Re: Support streaming of poly_int for offloading when it's
>> >> > degree <= accel's NUM_POLY_INT_COEFFS
>> >> > 
>> >> > External email: Use caution opening links or attachments
>> >> > 
>> >> > 
>> >> > Richard Biener <rguenther@suse.de> writes:
>> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
>> >> > >
>> >> > >> Hi Richard,
>> >> > >> Thanks for your suggestions on RFC email, the attached patch adds
>> >> > support for streaming of poly_int when it's degree <= accel's
>> >> > NUM_POLY_INT_COEFFS.
>> >> > >> The patch changes streaming of poly_int as follows:
>> >> > >>
>> >> > >> Streaming out poly_int:
>> >> > >>
>> >> > >> degree = poly_int.degree();
>> >> > >> stream out degree;
>> >> > >> for (i = 0; i < degree; i++)
>> >> > >>   stream out poly_int.coeffs[i];
>> >> > >>
>> >> > >> Streaming in poly_int:
>> >> > >>
>> >> > >> stream in degree;
>> >> > >> if (degree > NUM_POLY_INT_COEFFS)
>> >> > >>   fatal_error();
>> >> > >> stream in coeffs;
>> >> > >> // Set remaining coeffs to zero in case degree < accel's
>> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
>> >> > >>   poly_int.coeffs[i] = 0;
>> >> > >>
>> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
>> >> > linux-gnu.
>> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
>> >> > >>
>> >> > >> I am not quite sure how to test it for offloading since currently
>> >> > it's (entirely) broken for aarch64->nvptx.
>> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I
>> >> > >> guess LTO bootstrap should test streaming changes ?)
>> >> > >
>> >> > > +  unsigned degree
>> >> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
>> >> > > HOST_WIDE_INT));
>> >> > >
>> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
>> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
>> >> > > I'd say we should constrain it to a reasonable upper bound, like 2?
>> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
>> >> > in
>> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
>> >> > >
>> >> > > The patch looks reasonable over all, but Richard S. should have a
>> >> > say
>> >> > > about the abstraction you chose and the poly-int adjustment.
>> >> > 
>> >> > Sorry if this has been discussed already, but could we instead stream
>> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
>> >> > It's a target invariant, and poly_int has wormed its way into lots of
>> >> > things by now :)
>> >> Hi Richard,
>> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
>> >> During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
>> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
>> >> (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
>> 
>> It's not clear to me what the plan is for VLA host + VLS offloading.
>> Is the streamed data guaranteed to be "clean" of any host-only
>> VLA stuff?  E.g. if code does:
>> 
>>   #include <arm_sve.h>
>> 
>>   svint32_t *ptr:
>>   void foo(svint32_t);
>> 
>>   #pragma GCC target "+nosve"
>> 
>>   ...offloading...
>> 
>> is there a guarantee that the offload target won't see the definition
>> of ptr and foo?
>
> No.  If it sees any unsupported poly-* the offload compilation will fail.

Could it fail even if the offloading code doesn't refer to ptr and foo
directly?  Or is only "relevant" stuff streamed?

> I think all current issues are because of poly-* leaking in for cases
> where a non-poly would have worked fine, but I have not had a look
> myself.

One of the cases that Prathamesh mentions is streaming the mode sizes.
Are those modes "offload target modes" or "host modes"?  It seems like
it shouldn't be an error for the host to have VLA modes per se.  It's
just that those modes can't be used in the host/offload interface.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:17             ` Richard Sandiford
@ 2024-07-30  9:25               ` Richard Biener
  2024-07-30  9:45                 ` Jakub Jelinek
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Biener @ 2024-07-30  9:25 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Prathamesh Kulkarni, gcc-patches

On Tue, 30 Jul 2024, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 30 Jul 2024, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
> >> >
> >> >> 
> >> >> 
> >> >> > -----Original Message-----
> >> >> > From: Richard Sandiford <richard.sandiford@arm.com>
> >> >> > Sent: Monday, July 29, 2024 9:43 PM
> >> >> > To: Richard Biener <rguenther@suse.de>
> >> >> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> >> >> > patches@gcc.gnu.org
> >> >> > Subject: Re: Support streaming of poly_int for offloading when it's
> >> >> > degree <= accel's NUM_POLY_INT_COEFFS
> >> >> > 
> >> >> > External email: Use caution opening links or attachments
> >> >> > 
> >> >> > 
> >> >> > Richard Biener <rguenther@suse.de> writes:
> >> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >> >> > >
> >> >> > >> Hi Richard,
> >> >> > >> Thanks for your suggestions on RFC email, the attached patch adds
> >> >> > support for streaming of poly_int when it's degree <= accel's
> >> >> > NUM_POLY_INT_COEFFS.
> >> >> > >> The patch changes streaming of poly_int as follows:
> >> >> > >>
> >> >> > >> Streaming out poly_int:
> >> >> > >>
> >> >> > >> degree = poly_int.degree();
> >> >> > >> stream out degree;
> >> >> > >> for (i = 0; i < degree; i++)
> >> >> > >>   stream out poly_int.coeffs[i];
> >> >> > >>
> >> >> > >> Streaming in poly_int:
> >> >> > >>
> >> >> > >> stream in degree;
> >> >> > >> if (degree > NUM_POLY_INT_COEFFS)
> >> >> > >>   fatal_error();
> >> >> > >> stream in coeffs;
> >> >> > >> // Set remaining coeffs to zero in case degree < accel's
> >> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >> >> > >>   poly_int.coeffs[i] = 0;
> >> >> > >>
> >> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> >> >> > linux-gnu.
> >> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >> >> > >>
> >> >> > >> I am not quite sure how to test it for offloading since currently
> >> >> > it's (entirely) broken for aarch64->nvptx.
> >> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> >> > >> guess LTO bootstrap should test streaming changes ?)
> >> >> > >
> >> >> > > +  unsigned degree
> >> >> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> >> >> > > HOST_WIDE_INT));
> >> >> > >
> >> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> >> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> >> >> > > I'd say we should constrain it to a reasonable upper bound, like 2?
> >> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> >> >> > in
> >> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >> >> > >
> >> >> > > The patch looks reasonable over all, but Richard S. should have a
> >> >> > say
> >> >> > > about the abstraction you chose and the poly-int adjustment.
> >> >> > 
> >> >> > Sorry if this has been discussed already, but could we instead stream
> >> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> >> >> > It's a target invariant, and poly_int has wormed its way into lots of
> >> >> > things by now :)
> >> >> Hi Richard,
> >> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
> >> >> During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
> >> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
> >> >> (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
> >> 
> >> It's not clear to me what the plan is for VLA host + VLS offloading.
> >> Is the streamed data guaranteed to be "clean" of any host-only
> >> VLA stuff?  E.g. if code does:
> >> 
> >>   #include <arm_sve.h>
> >> 
> >>   svint32_t *ptr:
> >>   void foo(svint32_t);
> >> 
> >>   #pragma GCC target "+nosve"
> >> 
> >>   ...offloading...
> >> 
> >> is there a guarantee that the offload target won't see the definition
> >> of ptr and foo?
> >
> > No.  If it sees any unsupported poly-* the offload compilation will fail.
> 
> Could it fail even if the offloading code doesn't refer to ptr and foo
> directly?  Or is only "relevant" stuff streamed?

Only "relevant" stuff should be streamed - the offload code and all
trees refered to.

> > I think all current issues are because of poly-* leaking in for cases
> > where a non-poly would have worked fine, but I have not had a look
> > myself.
> 
> One of the cases that Prathamesh mentions is streaming the mode sizes.
> Are those modes "offload target modes" or "host modes"?  It seems like
> it shouldn't be an error for the host to have VLA modes per se.  It's
> just that those modes can't be used in the host/offload interface.

There's a requirement that a mode mapping exists from the host to
target enum machine_mode.  I don't remember exactly how we compute
that mapping and whether streaming of some data (and thus poly-int)
are part of this.

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:25               ` Richard Biener
@ 2024-07-30  9:45                 ` Jakub Jelinek
  2024-07-30 11:07                   ` Richard Sandiford
  2024-07-30 11:14                   ` Prathamesh Kulkarni
  0 siblings, 2 replies; 23+ messages in thread
From: Jakub Jelinek @ 2024-07-30  9:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, Prathamesh Kulkarni, gcc-patches

On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> Only "relevant" stuff should be streamed - the offload code and all
> trees refered to.

Yeah.

> > > I think all current issues are because of poly-* leaking in for cases
> > > where a non-poly would have worked fine, but I have not had a look
> > > myself.
> > 
> > One of the cases that Prathamesh mentions is streaming the mode sizes.
> > Are those modes "offload target modes" or "host modes"?  It seems like
> > it shouldn't be an error for the host to have VLA modes per se.  It's
> > just that those modes can't be used in the host/offload interface.
> 
> There's a requirement that a mode mapping exists from the host to
> target enum machine_mode.  I don't remember exactly how we compute
> that mapping and whether streaming of some data (and thus poly-int)
> are part of this.

During streaming out, the code records what machine modes are being streamed
(in streamer_mode_table).
For those modes (and their inner modes) then lto_write_mode_table
should stream a table with mode details like class, bits, size, inner mode,
nunits, real mode format if any, etc.
That table is then streamed in in the offloading compiler and it attempts to
find corresponding modes (and emits fatal_error if there is no such mode;
consider say x86_64 long double with XFmode being used in offloading code
which doesn't have XFmode support).
Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int rather
than int, this has been changed to use bp_pack_poly_value; but that relies
on the same number of coefficients for poly_int, which is not the case when
e.g. offloading aarch64 to gcn or nvptx.

From what I can see, this mode table handling are the only uses of
bp_pack_poly_value.  So the options are either to stream at the start of the
mode table the NUM_POLY_INT_COEFFS value and in bp_unpack_poly_value pass to
it what we've read and fill in any remaining coeffs with zeros, or in each
bp_pack_poly_value stream the number of coefficients and then stream that
back in and fill in remaining ones (and diagnose if it would try to read
non-zero coefficient which isn't stored).
I think streaming NUM_POLY_INT_COEFFS once would be more compact (at least
for non-aarch64/riscv targets).

	Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:45                 ` Jakub Jelinek
@ 2024-07-30 11:07                   ` Richard Sandiford
  2024-07-30 11:14                   ` Prathamesh Kulkarni
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Sandiford @ 2024-07-30 11:07 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Prathamesh Kulkarni, gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
>> Only "relevant" stuff should be streamed - the offload code and all
>> trees refered to.
>
> Yeah.
>
>> > > I think all current issues are because of poly-* leaking in for cases
>> > > where a non-poly would have worked fine, but I have not had a look
>> > > myself.
>> > 
>> > One of the cases that Prathamesh mentions is streaming the mode sizes.
>> > Are those modes "offload target modes" or "host modes"?  It seems like
>> > it shouldn't be an error for the host to have VLA modes per se.  It's
>> > just that those modes can't be used in the host/offload interface.
>> 
>> There's a requirement that a mode mapping exists from the host to
>> target enum machine_mode.  I don't remember exactly how we compute
>> that mapping and whether streaming of some data (and thus poly-int)
>> are part of this.
>
> During streaming out, the code records what machine modes are being streamed
> (in streamer_mode_table).
> For those modes (and their inner modes) then lto_write_mode_table
> should stream a table with mode details like class, bits, size, inner mode,
> nunits, real mode format if any, etc.
> That table is then streamed in in the offloading compiler and it attempts to
> find corresponding modes (and emits fatal_error if there is no such mode;
> consider say x86_64 long double with XFmode being used in offloading code
> which doesn't have XFmode support).
> Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int rather
> than int, this has been changed to use bp_pack_poly_value; but that relies
> on the same number of coefficients for poly_int, which is not the case when
> e.g. offloading aarch64 to gcn or nvptx.
>
> From what I can see, this mode table handling are the only uses of
> bp_pack_poly_value.  So the options are either to stream at the start of the
> mode table the NUM_POLY_INT_COEFFS value and in bp_unpack_poly_value pass to
> it what we've read and fill in any remaining coeffs with zeros, or in each
> bp_pack_poly_value stream the number of coefficients and then stream that
> back in and fill in remaining ones (and diagnose if it would try to read
> non-zero coefficient which isn't stored).
> I think streaming NUM_POLY_INT_COEFFS once would be more compact (at least
> for non-aarch64/riscv targets).

Ah, ok, thanks for the explanation.  In that case, I agree that either
of those two would work (no personal preference for which).

Richard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30  9:45                 ` Jakub Jelinek
  2024-07-30 11:07                   ` Richard Sandiford
@ 2024-07-30 11:14                   ` Prathamesh Kulkarni
  2024-07-31 14:58                     ` Prathamesh Kulkarni
  1 sibling, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-07-30 11:14 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: Richard Sandiford, gcc-patches



> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Tuesday, July 30, 2024 3:16 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; Prathamesh Kulkarni
> <prathameshk@nvidia.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > Only "relevant" stuff should be streamed - the offload code and all
> > trees refered to.
> 
> Yeah.
> 
> > > > I think all current issues are because of poly-* leaking in for
> > > > cases where a non-poly would have worked fine, but I have not
> had
> > > > a look myself.
> > >
> > > One of the cases that Prathamesh mentions is streaming the mode
> sizes.
> > > Are those modes "offload target modes" or "host modes"?  It seems
> > > like it shouldn't be an error for the host to have VLA modes per
> se.
> > > It's just that those modes can't be used in the host/offload
> interface.
> >
> > There's a requirement that a mode mapping exists from the host to
> > target enum machine_mode.  I don't remember exactly how we compute
> > that mapping and whether streaming of some data (and thus poly-int)
> > are part of this.
> 
> During streaming out, the code records what machine modes are being
> streamed (in streamer_mode_table).
> For those modes (and their inner modes) then lto_write_mode_table
> should stream a table with mode details like class, bits, size, inner
> mode, nunits, real mode format if any, etc.
> That table is then streamed in in the offloading compiler and it
> attempts to find corresponding modes (and emits fatal_error if there
> is no such mode; consider say x86_64 long double with XFmode being
> used in offloading code which doesn't have XFmode support).
> Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> rather than int, this has been changed to use bp_pack_poly_value; but
> that relies on the same number of coefficients for poly_int, which is
> not the case when e.g. offloading aarch64 to gcn or nvptx.
Indeed, for the minimal test:
int main()
{
  int x;
  #pragma omp target map (to: x)
  {
    x = 0;
  }
  return x;
}

Streaming out mode_table from AArch64 shows:
mode = SI, mclass = 2, size = 4, prec = 32
mode = DI, mclass = 2, size = 8, prec = 64

While streaming-in for nvptx shows:
mclass = 2, size = 4, prec = 0

The discrepancy happens because of differing value of NUM_POLY_INT_COEFFS between AArch64 and nvptx.
From AArch64 it streams out size and prec as <4, 0> and <32, 0> respectively, where 0 comes from coeffs[1].
While streaming-in from nvptx, since NUM_POLY_INT_COEFFS is 1, it incorrectly reads size as 4, and prec as 0.
> 
> From what I can see, this mode table handling are the only uses of
> bp_pack_poly_value.  So the options are either to stream at the start
> of the mode table the NUM_POLY_INT_COEFFS value and in
> bp_unpack_poly_value pass to it what we've read and fill in any
> remaining coeffs with zeros, or in each bp_pack_poly_value stream the
> number of coefficients and then stream that back in and fill in
> remaining ones (and diagnose if it would try to read non-zero
> coefficient which isn't stored).
This is the approach taken in proposed patch (stream-out degree of poly_int followed by coeffs).

> I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> least for non-aarch64/riscv targets).
I will try implementing this, thanks.

Thanks,
Prathamesh
> 
>         Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-29 10:13 Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS Prathamesh Kulkarni
  2024-07-29 11:29 ` Richard Biener
@ 2024-07-30 12:37 ` Tobias Burnus
  2024-07-31 15:01   ` Prathamesh Kulkarni
  1 sibling, 1 reply; 23+ messages in thread
From: Tobias Burnus @ 2024-07-30 12:37 UTC (permalink / raw)
  To: Prathamesh Kulkarni, gcc-patches

Prathamesh Kulkarni wrote:
> Thanks for your suggestions on RFC email, the attached patch adds support for streaming of poly_int when it's degree <= accel's NUM_POLY_INT_COEFFS.

First, thanks a lot for your patch!

Secondly, it seems as if this patch is indented to fully or partially 
fix the following PRs.
If so, can you add the PR to the commit log such that both "git log"
will help finding the problem report and the commit will show up
in the issue?


https://gcc.gnu.org/PR111937
   PR ipa/111937
   offloading from x86_64-linux-gnu to riscv*-linux-gnu will have issues

https://gcc.gnu.org/PR96265
   PR ipa/96265
   offloading to nvptx-none from aarch64-linux-gnu (and 
riscv*-linux-gnu) does not work

And - marked as duplicate of the latter:

https://gcc.gnu.org/PR114174
   PR lto/114174
   [aarch64] Offloading to nvptx-none

Thanks,

Tobias

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30 11:14                   ` Prathamesh Kulkarni
@ 2024-07-31 14:58                     ` Prathamesh Kulkarni
  2024-07-31 15:15                       ` Jakub Jelinek
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-07-31 14:58 UTC (permalink / raw)
  To: Prathamesh Kulkarni, Jakub Jelinek, Richard Biener
  Cc: Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6078 bytes --]



> -----Original Message-----
> From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Sent: Tuesday, July 30, 2024 4:44 PM
> To: Jakub Jelinek <jakub@redhat.com>; Richard Biener
> <rguenther@suse.de>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; gcc-
> patches@gcc.gnu.org
> Subject: RE: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Tuesday, July 30, 2024 3:16 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Prathamesh
> Kulkarni
> > <prathameshk@nvidia.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: Support streaming of poly_int for offloading when it's
> > degree <= accel's NUM_POLY_INT_COEFFS
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > > Only "relevant" stuff should be streamed - the offload code and
> all
> > > trees refered to.
> >
> > Yeah.
> >
> > > > > I think all current issues are because of poly-* leaking in
> for
> > > > > cases where a non-poly would have worked fine, but I have not
> > had
> > > > > a look myself.
> > > >
> > > > One of the cases that Prathamesh mentions is streaming the mode
> > sizes.
> > > > Are those modes "offload target modes" or "host modes"?  It
> seems
> > > > like it shouldn't be an error for the host to have VLA modes per
> > se.
> > > > It's just that those modes can't be used in the host/offload
> > interface.
> > >
> > > There's a requirement that a mode mapping exists from the host to
> > > target enum machine_mode.  I don't remember exactly how we compute
> > > that mapping and whether streaming of some data (and thus poly-
> int)
> > > are part of this.
> >
> > During streaming out, the code records what machine modes are being
> > streamed (in streamer_mode_table).
> > For those modes (and their inner modes) then lto_write_mode_table
> > should stream a table with mode details like class, bits, size,
> inner
> > mode, nunits, real mode format if any, etc.
> > That table is then streamed in in the offloading compiler and it
> > attempts to find corresponding modes (and emits fatal_error if there
> > is no such mode; consider say x86_64 long double with XFmode being
> > used in offloading code which doesn't have XFmode support).
> > Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> > rather than int, this has been changed to use bp_pack_poly_value;
> but
> > that relies on the same number of coefficients for poly_int, which
> is
> > not the case when e.g. offloading aarch64 to gcn or nvptx.
> Indeed, for the minimal test:
> int main()
> {
>   int x;
>   #pragma omp target map (to: x)
>   {
>     x = 0;
>   }
>   return x;
> }
> 
> Streaming out mode_table from AArch64 shows:
> mode = SI, mclass = 2, size = 4, prec = 32 mode = DI, mclass = 2, size
> = 8, prec = 64
> 
> While streaming-in for nvptx shows:
> mclass = 2, size = 4, prec = 0
> 
> The discrepancy happens because of differing value of
> NUM_POLY_INT_COEFFS between AArch64 and nvptx.
> From AArch64 it streams out size and prec as <4, 0> and <32, 0>
> respectively, where 0 comes from coeffs[1].
> While streaming-in from nvptx, since NUM_POLY_INT_COEFFS is 1, it
> incorrectly reads size as 4, and prec as 0.
> >
> > From what I can see, this mode table handling are the only uses of
> > bp_pack_poly_value.  So the options are either to stream at the
> start
> > of the mode table the NUM_POLY_INT_COEFFS value and in
> > bp_unpack_poly_value pass to it what we've read and fill in any
> > remaining coeffs with zeros, or in each bp_pack_poly_value stream
> the
> > number of coefficients and then stream that back in and fill in
> > remaining ones (and diagnose if it would try to read non-zero
> > coefficient which isn't stored).
> This is the approach taken in proposed patch (stream-out degree of
> poly_int followed by coeffs).
> 
> > I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> > least for non-aarch64/riscv targets).
> I will try implementing this, thanks.
Hi,
The attached patch streams-out NUM_POLY_INT_COEFFS only once at beginning of mode_table, which should make LTO bytecode more compact
for non VLA hosts. And changes streaming-in of poly_int as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Set remaining coeffs to zero (like zero-extension).  */
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
        error ();
    }
}

There are a couple of issues in the patch:
(1) The patch streams out NUM_POLY_INT_COEFFS at beginning of mode_table, which should work for bp_unpack_poly_value,
(since AFAIK, it's only called by lto_input_mode_table). However, I am not sure if we will always call lto_input_mode_table
before streaming in poly_int64 / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a different place in LTO bytecode ?

(2) The patch defines POLY_INT_READ_COMMON macro for factoring out common code to read poly_int, however, I am not sure
how to define a callback for different streaming functions like streamer_read_[u]hwi, bp_unpack value since they have different
signatures. The patch uses an (ugly) kludge streamer_read_coeff, which is essentially a call to streaming-in function.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh
> 
> Thanks,
> Prathamesh
> >
> >         Jakub


[-- Attachment #2: p-163-6.txt --]
[-- Type: text/plain, Size: 8273 bytes --]

Partially support streaming of poly_int for offloading.

The patch streams out host NUM_POLY_INT_COEFFS, and changes
streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */ 
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
	error ();
    }
}

gcc/ChangeLog:
	PR ipa/96265
	PR ipa/111937
	* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
	streaming, and call POLY_INT_READ_COMMON instead.
	(streamer_read_poly_int64): Likewise.
	* data-streamer.cc (host_num_poly_int_coeffs): New variable.
	* data-streamer.h (host_num_poly_int_coeffs): Declare.
	(POLY_INT_READ_COMMON): New macro.
	(bp_unpack_poly_value): Remove code for streaming and call
	POLY_INT_READ_COMMON instead.
	* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
	NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs.
	* lto-streamer-out.cc (lto_write_mode_table): Stream out
	NUM_POLY_INT_COEFFS.
	* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
	streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..e18c6462316 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -183,9 +183,7 @@ poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))
 }
 
 /* Read a poly_int64 from IB.  */
@@ -194,9 +192,7 @@ poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc
index 346b294c72a..d2e9634d62f 100644
--- a/gcc/data-streamer.cc
+++ b/gcc/data-streamer.cc
@@ -28,6 +28,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "data-streamer.h"
 
+/* While streaming-out, host NUM_POLY_INT_COEFFS is stored at beginning
+   of mode_table. While streaming-in, the value is read in
+   host_num_poly_int_coeffs.  */
+
+unsigned host_num_poly_int_coeffs;
+
 /* Pack WORK into BP in a variant of uleb format.  */
 
 void
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..3b26075c79f 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const REAL_VALUE_TYPE *);
 void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *);
 unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *);
 HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *);
+extern unsigned host_num_poly_int_coeffs;
 
 /* In data-streamer-out.cc  */
 void streamer_write_zero (struct output_block *);
@@ -194,15 +195,51 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+/* Common code for reading poly_int.
+   FIXME: streamer_read_coeff is an (ugly) kludge, it relies on the caller
+   passing a "function call" like bp_unpack_value (bp, nbits) or
+   streamer_read_uhwi (ib) which will read the next coeff from respective stream.
+   I am not sure if we could use a callback because streaming functions
+   streamer_read_[u]hwi, bp_unpack_value have different signatures.  */
+
+#define POLY_INT_READ_COMMON(x, streamer_read_coeff)			\
+{									\
+  unsigned i;								\
+									\
+  if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)			\
+    {									\
+      for (i = 0; i < host_num_poly_int_coeffs; i++)			\
+	x.coeffs[i] = streamer_read_coeff;				\
+      for (; i < NUM_POLY_INT_COEFFS; i++)				\
+	x.coeffs[i] = 0;						\
+    }									\
+  else									\
+    {									\
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)				\
+	x.coeffs[i] = streamer_read_coeff;				\
+									\
+      /* Ensure remaining coeffs are zero.  */				\
+      for (; i < host_num_poly_int_coeffs; i++)				\
+	{								\
+	  __typeof(x.coeffs[0]) val = streamer_read_coeff;		\
+	  if (val != 0)							\
+	    fatal_error (input_location,				\
+			 "Degree of %<poly_int%> exceeds "		\
+			 "%<NUM_POLY_INT_COEFFS%> (%d)",		\
+			 NUM_POLY_INT_COEFFS);				\
+	}								\
+    }									\
+									\
+  return x;								\
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
   poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+  POLY_INT_READ_COMMON(x, bp_unpack_value (bp, nbits))
 }
 
 
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..3e2c786fc36 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -2013,6 +2013,9 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
 				header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack (&ib);
 
+  host_num_poly_int_coeffs
+    = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   unsigned mode_bits = bp_unpack_value (&bp, 5);
   unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
 
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index c329ac8af95..091e4126965 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3192,6 +3192,8 @@ lto_write_mode_table (void)
   ob = create_output_block (LTO_section_mode_table);
   bitpack_d bp = bitpack_create (ob->main_stream);
 
+  bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   /* Ensure that for GET_MODE_INNER (m) != m we have
      also the inner mode marked.  */
   for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..8d3e6098f0b 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS	(2)
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..c41803aa21e 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,29 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+  unsigned i;
+  if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < host_num_poly_int_coeffs; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+      tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+      for (; i < host_num_poly_int_coeffs; i++)
+	{
+	  tree val = stream_read_tree_ref (ib, data_in);
+	  if (!integer_zerop (val))
+	    fatal_error (input_location,
+			 "Degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%>");
+	}
+    }
 }
 
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-30 12:37 ` Tobias Burnus
@ 2024-07-31 15:01   ` Prathamesh Kulkarni
  0 siblings, 0 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-07-31 15:01 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches



> -----Original Message-----
> From: Tobias Burnus <tburnus@baylibre.com>
> Sent: Tuesday, July 30, 2024 6:08 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> Prathamesh Kulkarni wrote:
> > Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> 
> First, thanks a lot for your patch!
> 
> Secondly, it seems as if this patch is indented to fully or partially
> fix the following PRs.
> If so, can you add the PR to the commit log such that both "git log"
> will help finding the problem report and the commit will show up in
> the issue?
Hi Tobias,
Thanks for the pointers to relevant Bugzilla PRs! I have included them in my latest patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658866.html

Thanks,
Prathamesh
> 
> 
> https://gcc.gnu.org/PR111937
>    PR ipa/111937
>    offloading from x86_64-linux-gnu to riscv*-linux-gnu will have
> issues
> 
> https://gcc.gnu.org/PR96265
>    PR ipa/96265
>    offloading to nvptx-none from aarch64-linux-gnu (and
> riscv*-linux-gnu) does not work
> 
> And - marked as duplicate of the latter:
> 
> https://gcc.gnu.org/PR114174
>    PR lto/114174
>    [aarch64] Offloading to nvptx-none
> 
> Thanks,
> 
> Tobias

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-31 14:58                     ` Prathamesh Kulkarni
@ 2024-07-31 15:15                       ` Jakub Jelinek
  2024-08-02 11:58                         ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Jelinek @ 2024-07-31 15:15 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Wed, Jul 31, 2024 at 02:58:34PM +0000, Prathamesh Kulkarni wrote:
> There are a couple of issues in the patch:
> (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of mode_table, which should work for bp_unpack_poly_value,
> (since AFAIK, it's only called by lto_input_mode_table). However, I am not sure if we will always call lto_input_mode_table
> before streaming in poly_int64 / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a different place in LTO bytecode ?

The poly_ints unpacked in lto_input_mode_table obviously are done after
that.
If you use it for streaming in from other sections, you need to check if
they can't be read before the mode table.
And, you don't really need to stream out/in the number for non-offloading
LTO, that should use just NUM_POLY_INT_COEFFS.

> --- a/gcc/data-streamer-in.cc
> +++ b/gcc/data-streamer-in.cc
> @@ -183,9 +183,7 @@ poly_uint64
>  streamer_read_poly_uint64 (class lto_input_block *ib)
>  {
>    poly_uint64 res;
> -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> -    res.coeffs[i] = streamer_read_uhwi (ib);
> -  return res;
> +  POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))

Why is this macro and not an inline function or inline function template
oor inline function calling a lambda?
Even if it has to be a macro (I don't see why), it should be defined such
that you need to add ; at the end, ideally not include the return res;
in there because it is just too weird if used like that (or make it return
what will be returned and use return POLY_INT_READ_COMMON...)
and there needs to be a space in between COMMON and (.

> @@ -194,9 +192,7 @@ poly_int64
>  streamer_read_poly_int64 (class lto_input_block *ib)
>  {
>    poly_int64 res;
> -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> -    res.coeffs[i] = streamer_read_hwi (ib);
> -  return res;
> +  POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
>  }

Ditto.
> +	  __typeof(x.coeffs[0]) val = streamer_read_coeff;		\

You certainly can't use a GCC extension like __typeof here.
Plus missing space.

> +	  if (val != 0)							\
> +	    fatal_error (input_location,				\
> +			 "Degree of %<poly_int%> exceeds "		\

Diagnostics shouldn't start with uppercase letter.

> +			 "%<NUM_POLY_INT_COEFFS%> (%d)",		\
> +			 NUM_POLY_INT_COEFFS);				\
> +	}								\
> +    }									\
> +									\
> +  return x;								\
> +}
> +
> --- a/gcc/poly-int.h
> +++ b/gcc/poly-int.h
> @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
>     ? (void) ((RES).coeffs[I] = VALUE) \
>     : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>  
> +/* Number of bits needed to represent maximum value of
> +   NUM_POLY_INT_COEFFS defined by any target.  */
> +#define MAX_NUM_POLY_INT_COEFFS_BITS	(2)

Why (2) and not just 2?
There should be some static_assert to make sure it is a maximum for any
target.

> +	  if (!integer_zerop (val))
> +	    fatal_error (input_location,
> +			 "Degree of %<poly_int%> exceeds "

Again.
> +			 "%<NUM_POLY_INT_COEFFS%>");
> +	}
> +    }
>  }

	Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-07-31 15:15                       ` Jakub Jelinek
@ 2024-08-02 11:58                         ` Prathamesh Kulkarni
  2024-08-02 12:13                           ` Jakub Jelinek
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-02 11:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4672 bytes --]



> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Wednesday, July 31, 2024 8:46 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Biener <rguenther@suse.de>; Richard Sandiford
> <richard.sandiford@arm.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Jul 31, 2024 at 02:58:34PM +0000, Prathamesh Kulkarni wrote:
> > There are a couple of issues in the patch:
> > (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of
> > mode_table, which should work for bp_unpack_poly_value, (since AFAIK,
> > it's only called by lto_input_mode_table). However, I am not sure if
> we will always call lto_input_mode_table before streaming in poly_int64
> / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a
> different place in LTO bytecode ?
> 
> The poly_ints unpacked in lto_input_mode_table obviously are done after
> that.
> If you use it for streaming in from other sections, you need to check if
> they can't be read before the mode table.
> And, you don't really need to stream out/in the number for non-
> offloading LTO, that should use just NUM_POLY_INT_COEFFS.
> 
> > --- a/gcc/data-streamer-in.cc
> > +++ b/gcc/data-streamer-in.cc
> > @@ -183,9 +183,7 @@ poly_uint64
> >  streamer_read_poly_uint64 (class lto_input_block *ib)  {
> >    poly_uint64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -    res.coeffs[i] = streamer_read_uhwi (ib);
> > -  return res;
> > +  POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))
> 
> Why is this macro and not an inline function or inline function template
> oor inline function calling a lambda?
> Even if it has to be a macro (I don't see why), it should be defined
> such that you need to add ; at the end, ideally not include the return
> res; in there because it is just too weird if used like that (or make it
> return what will be returned and use return POLY_INT_READ_COMMON...) and
> there needs to be a space in between COMMON and (.
> 
> > @@ -194,9 +192,7 @@ poly_int64
> >  streamer_read_poly_int64 (class lto_input_block *ib)  {
> >    poly_int64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -    res.coeffs[i] = streamer_read_hwi (ib);
> > -  return res;
> > +  POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
> >  }
> 
> Ditto.
> > +       __typeof(x.coeffs[0]) val = streamer_read_coeff;
> \
> 
> You certainly can't use a GCC extension like __typeof here.
> Plus missing space.
> 
> > +       if (val != 0)
> \
> > +         fatal_error (input_location,
> \
> > +                      "Degree of %<poly_int%> exceeds "
> \
> 
> Diagnostics shouldn't start with uppercase letter.
> 
> > +                      "%<NUM_POLY_INT_COEFFS%> (%d)",
> \
> > +                      NUM_POLY_INT_COEFFS);
> \
> > +     }
> \
> > +    }
> \
> > +
> \
> > +  return x;
> \
> > +}
> > +
> > --- a/gcc/poly-int.h
> > +++ b/gcc/poly-int.h
> > @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
> >     ? (void) ((RES).coeffs[I] = VALUE) \
> >     : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C
> > (VALUE)))
> >
> > +/* Number of bits needed to represent maximum value of
> > +   NUM_POLY_INT_COEFFS defined by any target.  */ #define
> > +MAX_NUM_POLY_INT_COEFFS_BITS (2)
> 
> Why (2) and not just 2?
> There should be some static_assert to make sure it is a maximum for any
> target.
> 
> > +       if (!integer_zerop (val))
> > +         fatal_error (input_location,
> > +                      "Degree of %<poly_int%> exceeds "
> 
> Again.
Hi Jakub,
Thanks for the suggestions. The attached patch streams out NUM_POLY_INT_COEFFS only if offloading is enabled,
defines poly_int_read_common to be variadic template, and asserts that host_num_poly_int_coeffs is streamed-in
before reading poly_int (AFAIU, the only user of streamer_read_poly_int64 currently is ipa-modref pass).

Patch passes LTO bootstrap+test on aarch64-linux-gnu and in-progress on x86_64-linux-gnu. To test this patch with
offloading enabled, I have so far just been testing libgomp (make check-target-libgomp). Should I be testing any
additional parts of the testsuite for offloading ?

Does the attached patch look in the right direction ?

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Thanks,
Prathamesh
> > +                      "%<NUM_POLY_INT_COEFFS%>");
> > +     }
> > +    }
> >  }
> 
>         Jakub


[-- Attachment #2: p-163-7.txt --]
[-- Type: text/plain, Size: 8541 bytes --]

Partially support streaming of poly_int for offloading.

The patch streams out host NUM_POLY_INT_COEFFS, and changes
streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */ 
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
	error ();
    }
}

gcc/ChangeLog:
	PR ipa/96265
	PR ipa/111937
	* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
	streaming, and call poly_int_read_common instead. 
	(streamer_read_poly_int64): Likewise.
	* data-streamer.cc (host_num_poly_int_coeffs): New variable.
	* data-streamer.h (host_num_poly_int_coeffs): Declare.
	(poly_int_read_common): New function template.
	(bp_unpack_poly_value): Remove code for streaming and call
	poly_int_read_common instead.
	* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
	NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs.
	* lto-streamer-out.cc (lto_write_mode_table): Stream out
	NUM_POLY_INT_COEFFS if offloading is enabled.
	* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
	streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..7b9d8cc0129 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
-  poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type>
+	 (streamer_read_uhwi, ib);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +191,8 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
-  poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common<poly_int_traits<poly_int64>::coeff_type>
+	 (streamer_read_hwi, ib);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc
index 346b294c72a..5bf2f086d2a 100644
--- a/gcc/data-streamer.cc
+++ b/gcc/data-streamer.cc
@@ -28,6 +28,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "data-streamer.h"
 
+/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
+   stored at beginning of mode_table. While streaming-in, the value is read in
+   host_num_poly_int_coeffs.  */
+
+unsigned host_num_poly_int_coeffs = 0;
+
 /* Pack WORK into BP in a variant of uleb format.  */
 
 void
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..c77eacb74ca 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const REAL_VALUE_TYPE *);
 void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *);
 unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *);
 HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *);
+extern unsigned host_num_poly_int_coeffs;
 
 /* In data-streamer-out.cc  */
 void streamer_write_zero (struct output_block *);
@@ -194,15 +195,53 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+/* Common code for reading poly_int.  */
+
+template<typename C, typename F, typename ...Args>
+poly_int<NUM_POLY_INT_COEFFS, C>
+poly_int_read_common (F read_coeff, Args ...args)
+{
+  poly_int<NUM_POLY_INT_COEFFS, C> x;
+  unsigned i;
+
+#ifndef ACCEL_COMPILER
+  host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  gcc_assert (host_num_poly_int_coeffs > 0);
+  if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < host_num_poly_int_coeffs; i++)
+	x.coeffs[i] = read_coeff (args...);
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = 0;
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = read_coeff (args...);
+
+      /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
+      for (; i < host_num_poly_int_coeffs; i++)
+	{
+	  C val = read_coeff (args...);
+	  if (val != 0)
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%> (%d)",
+			 NUM_POLY_INT_COEFFS);
+	}
+    }
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
-  poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+  return poly_int_read_common<bitpack_word_t> (bp_unpack_value, bp, nbits);
 }
 
 
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..3e2c786fc36 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -2013,6 +2013,9 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
 				header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack (&ib);
 
+  host_num_poly_int_coeffs
+    = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   unsigned mode_bits = bp_unpack_value (&bp, 5);
   unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
 
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index c329ac8af95..523d6dad221 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3192,6 +3192,9 @@ lto_write_mode_table (void)
   ob = create_output_block (LTO_section_mode_table);
   bitpack_d bp = bitpack_create (ob->main_stream);
 
+  if (lto_stream_offload_p)
+    bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   /* Ensure that for GET_MODE_INNER (m) != m we have
      also the inner mode marked.  */
   for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..94708165961 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS	2
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..62200ef2d9d 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,35 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+#ifndef ACCEL_COMPILER
+  host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  gcc_assert (host_num_poly_int_coeffs > 0);
+  unsigned i;
+  if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < host_num_poly_int_coeffs; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+      tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+      for (; i < host_num_poly_int_coeffs; i++)
+	{
+	  tree val = stream_read_tree_ref (ib, data_in);
+	  if (!integer_zerop (val))
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%>");
+	}
+    }
 }
 
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-02 11:58                         ` Prathamesh Kulkarni
@ 2024-08-02 12:13                           ` Jakub Jelinek
  2024-08-05 14:24                             ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Jelinek @ 2024-08-02 12:13 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Fri, Aug 02, 2024 at 11:58:19AM +0000, Prathamesh Kulkarni wrote:
> diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
> index 7dce2928ef0..7b9d8cc0129 100644
> --- a/gcc/data-streamer-in.cc
> +++ b/gcc/data-streamer-in.cc
> @@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
>  poly_uint64
>  streamer_read_poly_uint64 (class lto_input_block *ib)
>  {
> -  poly_uint64 res;
> -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> -    res.coeffs[i] = streamer_read_uhwi (ib);
> -  return res;
> +  return poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type>
> +	 (streamer_read_uhwi, ib);

Can't you use
  using coeff_type = poly_int_traits <poly_uint64>::coeff_type;
  return poly_int_read_common <coeff_type> (streamer_read_uhwi, ib);
?
The call arguments on different line from the actual function name are
ugly.

> --- a/gcc/data-streamer.cc
> +++ b/gcc/data-streamer.cc
> @@ -28,6 +28,12 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cgraph.h"
>  #include "data-streamer.h"
>  
> +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
> +   stored at beginning of mode_table. While streaming-in, the value is read in
> +   host_num_poly_int_coeffs.  */
> +
> +unsigned host_num_poly_int_coeffs = 0;

I think it would be better to guard this with
#ifdef ACCEL_COMPILER.

> +template<typename C, typename F, typename ...Args>
> +poly_int<NUM_POLY_INT_COEFFS, C>
> +poly_int_read_common (F read_coeff, Args ...args)
> +{
> +  poly_int<NUM_POLY_INT_COEFFS, C> x;
> +  unsigned i;
> +
> +#ifndef ACCEL_COMPILER
> +  host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
> +#endif

And instead of modifying a global var again and again do
#ifdef ACCEL_COMPILER
  const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
  gcc_assert (num_poly_int_coeffs > 0);
#else
  const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
#endif
and use num_poly_int_coeffs in the functions.

	Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-02 12:13                           ` Jakub Jelinek
@ 2024-08-05 14:24                             ` Prathamesh Kulkarni
  2024-08-05 14:30                               ` Jakub Jelinek
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-05 14:24 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2718 bytes --]



> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Friday, August 2, 2024 5:43 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Biener <rguenther@suse.de>; Richard Sandiford
> <richard.sandiford@arm.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Aug 02, 2024 at 11:58:19AM +0000, Prathamesh Kulkarni wrote:
> > diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc index
> > 7dce2928ef0..7b9d8cc0129 100644
> > --- a/gcc/data-streamer-in.cc
> > +++ b/gcc/data-streamer-in.cc
> > @@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
> >  poly_uint64
> >  streamer_read_poly_uint64 (class lto_input_block *ib)  {
> > -  poly_uint64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -    res.coeffs[i] = streamer_read_uhwi (ib);
> > -  return res;
> > +  return
> poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type>
> > +      (streamer_read_uhwi, ib);
> 
> Can't you use
>   using coeff_type = poly_int_traits <poly_uint64>::coeff_type;
>   return poly_int_read_common <coeff_type> (streamer_read_uhwi, ib); ?
> The call arguments on different line from the actual function name are
> ugly.
> 
> > --- a/gcc/data-streamer.cc
> > +++ b/gcc/data-streamer.cc
> > @@ -28,6 +28,12 @@ along with GCC; see the file COPYING3.  If not see
> > #include "cgraph.h"
> >  #include "data-streamer.h"
> >
> > +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
> > +   stored at beginning of mode_table. While streaming-in, the value
> is read in
> > +   host_num_poly_int_coeffs.  */
> > +
> > +unsigned host_num_poly_int_coeffs = 0;
> 
> I think it would be better to guard this with #ifdef ACCEL_COMPILER.
> 
> > +template<typename C, typename F, typename ...Args>
> > +poly_int<NUM_POLY_INT_COEFFS, C> poly_int_read_common (F read_coeff,
> > +Args ...args) {
> > +  poly_int<NUM_POLY_INT_COEFFS, C> x;
> > +  unsigned i;
> > +
> > +#ifndef ACCEL_COMPILER
> > +  host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS; #endif
> 
> And instead of modifying a global var again and again do #ifdef
> ACCEL_COMPILER
>   const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
>   gcc_assert (num_poly_int_coeffs > 0);
> #else
>   const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS; #endif and
> use num_poly_int_coeffs in the functions.
Hi Jakub,
I have done the suggested changes in the attached patch.
Does it look OK ?

Thanks,
Prathamesh
> 
>         Jakub


[-- Attachment #2: p-163-8.txt --]
[-- Type: text/plain, Size: 8875 bytes --]

Partially support streaming of poly_int for offloading.

The patch streams out host NUM_POLY_INT_COEFFS, and changes
streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */ 
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
	error ();
    }
}

gcc/ChangeLog:
	PR ipa/96265
	PR ipa/111937
	* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
	streaming, and call poly_int_read_common instead. 
	(streamer_read_poly_int64): Likewise.
	* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
	new variable if ACCEL_COMPILER is defined.
	* data-streamer.h (host_num_poly_int_coeffs): Declare.
	(poly_int_read_common): New function template.
	(bp_unpack_poly_value): Remove code for streaming and call
	poly_int_read_common instead.
	* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
	NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
	is defined.
	* lto-streamer-out.cc (lto_write_mode_table): Stream out
	NUM_POLY_INT_COEFFS if offloading is enabled.
	* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
	streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..57955a20091 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
-  poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  using coeff_type = poly_int_traits<poly_uint64>::coeff_type;
+  return poly_int_read_common<coeff_type> (streamer_read_hwi, ib);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +191,8 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
-  poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  using coeff_type = poly_int_traits<poly_int64>::coeff_type;
+  return poly_int_read_common<coeff_type> (streamer_read_hwi, ib);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc
index 346b294c72a..1e4141901bc 100644
--- a/gcc/data-streamer.cc
+++ b/gcc/data-streamer.cc
@@ -28,6 +28,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "data-streamer.h"
 
+/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
+   stored at beginning of mode_table. While streaming-in, the value is read in
+   host_num_poly_int_coeffs.  */
+
+#ifdef ACCEL_COMPILER
+unsigned host_num_poly_int_coeffs = 0;
+#endif
+
 /* Pack WORK into BP in a variant of uleb format.  */
 
 void
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..b3dc4b98476 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const REAL_VALUE_TYPE *);
 void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *);
 unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *);
 HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *);
+extern unsigned host_num_poly_int_coeffs;
 
 /* In data-streamer-out.cc  */
 void streamer_write_zero (struct output_block *);
@@ -194,15 +195,55 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+/* Common code for reading poly_int.  */
+
+template<typename C, typename F, typename ...Args>
+poly_int<NUM_POLY_INT_COEFFS, C>
+poly_int_read_common (F read_coeff, Args ...args)
+{
+  poly_int<NUM_POLY_INT_COEFFS, C> x;
+  unsigned i;
+
+#ifdef ACCEL_COMPILER
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
+  gcc_assert (host_num_poly_int_coeffs > 0);
+#else
+  const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  if (num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < num_poly_int_coeffs; i++)
+	x.coeffs[i] = read_coeff (args...);
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = 0;
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = read_coeff (args...);
+
+      /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
+      for (; i < num_poly_int_coeffs; i++)
+	{
+	  C val = read_coeff (args...);
+	  if (val != 0)
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%> (%d)",
+			 NUM_POLY_INT_COEFFS);
+	}
+    }
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
-  poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+  return poly_int_read_common<bitpack_word_t> (bp_unpack_value, bp, nbits);
 }
 
 
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..cbf6041fd68 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -2013,6 +2013,11 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
 				header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack (&ib);
 
+#ifdef ACCEL_COMPILER
+  host_num_poly_int_coeffs
+    = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+#endif
+
   unsigned mode_bits = bp_unpack_value (&bp, 5);
   unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
 
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index c329ac8af95..523d6dad221 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3192,6 +3192,9 @@ lto_write_mode_table (void)
   ob = create_output_block (LTO_section_mode_table);
   bitpack_d bp = bitpack_create (ob->main_stream);
 
+  if (lto_stream_offload_p)
+    bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   /* Ensure that for GET_MODE_INNER (m) != m we have
      also the inner mode marked.  */
   for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..94708165961 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS	2
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..40029437199 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,37 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+#ifdef ACCEL_COMPILER
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
+  gcc_assert (num_poly_int_coeffs > 0);
+#else
+  const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  unsigned i;
+  if (num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < num_poly_int_coeffs; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+      tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+      for (; i < num_poly_int_coeffs; i++)
+	{
+	  tree val = stream_read_tree_ref (ib, data_in);
+	  if (!integer_zerop (val))
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%>");
+	}
+    }
 }
 
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-05 14:24                             ` Prathamesh Kulkarni
@ 2024-08-05 14:30                               ` Jakub Jelinek
  2024-08-07 17:53                                 ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Jelinek @ 2024-08-05 14:30 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Mon, Aug 05, 2024 at 02:24:00PM +0000, Prathamesh Kulkarni wrote:
> gcc/ChangeLog:
> 	PR ipa/96265
> 	PR ipa/111937
> 	* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
> 	streaming, and call poly_int_read_common instead. 
> 	(streamer_read_poly_int64): Likewise.
> 	* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
> 	new variable if ACCEL_COMPILER is defined.
> 	* data-streamer.h (host_num_poly_int_coeffs): Declare.
> 	(poly_int_read_common): New function template.
> 	(bp_unpack_poly_value): Remove code for streaming and call
> 	poly_int_read_common instead.
> 	* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
> 	NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
> 	is defined.
> 	* lto-streamer-out.cc (lto_write_mode_table): Stream out
> 	NUM_POLY_INT_COEFFS if offloading is enabled.
> 	* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
> 	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
> 	streaming-in of poly_int.
> 
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

> --- a/gcc/data-streamer.cc
> +++ b/gcc/data-streamer.cc
> @@ -28,6 +28,14 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cgraph.h"
>  #include "data-streamer.h"
>  
> +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
> +   stored at beginning of mode_table. While streaming-in, the value is read in

Two spaces after . rather than just one, and because of that move in on the
next line.

> +   host_num_poly_int_coeffs.  */

Otherwise LGTM.

	Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-05 14:30                               ` Jakub Jelinek
@ 2024-08-07 17:53                                 ` Prathamesh Kulkarni
  2024-08-07 17:56                                   ` Jakub Jelinek
  0 siblings, 1 reply; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-07 17:53 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]



> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Monday, August 5, 2024 8:01 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Biener <rguenther@suse.de>; Richard Sandiford
> <richard.sandiford@arm.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, Aug 05, 2024 at 02:24:00PM +0000, Prathamesh Kulkarni wrote:
> > gcc/ChangeLog:
> >       PR ipa/96265
> >       PR ipa/111937
> >       * data-streamer-in.cc (streamer_read_poly_uint64): Remove code
> for
> >       streaming, and call poly_int_read_common instead.
> >       (streamer_read_poly_int64): Likewise.
> >       * data-streamer.cc (host_num_poly_int_coeffs): Conditionally
> define
> >       new variable if ACCEL_COMPILER is defined.
> >       * data-streamer.h (host_num_poly_int_coeffs): Declare.
> >       (poly_int_read_common): New function template.
> >       (bp_unpack_poly_value): Remove code for streaming and call
> >       poly_int_read_common instead.
> >       * lto-streamer-in.cc (lto_input_mode_table): Stream-in host
> >       NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if
> ACCEL_COMPILER
> >       is defined.
> >       * lto-streamer-out.cc (lto_write_mode_table): Stream out
> >       NUM_POLY_INT_COEFFS if offloading is enabled.
> >       * poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
> >       * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers):
> Adjust
> >       streaming-in of poly_int.
> >
> > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> 
> > --- a/gcc/data-streamer.cc
> > +++ b/gcc/data-streamer.cc
> > @@ -28,6 +28,14 @@ along with GCC; see the file COPYING3.  If not
> see
> > #include "cgraph.h"
> >  #include "data-streamer.h"
> >
> > +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS
> is
> > +   stored at beginning of mode_table. While streaming-in, the value
> > +is read in
> 
> Two spaces after . rather than just one, and because of that move in
> on the next line.
> 
> > +   host_num_poly_int_coeffs.  */
> 
> Otherwise LGTM.
Thanks, I have adjusted the formatting of the comment and a typo in streamer_read_poly_uint64.
Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu, LTO bootstrap+test on x86_64-linux-gnu.
And doesn't seem to regress libgomp testing for x86_64 -> nvptx offloading (altho there were a few occurrences of flaky tests in results).
Is the patch OK to commit ?

Thanks,
Prathamesh
> 
>         Jakub


[-- Attachment #2: p-163-10.txt --]
[-- Type: text/plain, Size: 8905 bytes --]

Partially support streaming of poly_int for offloading.

When offloading is enabled, the patch streams out host
NUM_POLY_INT_COEFFS, and changes streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */ 
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
	error ();
    }
}

gcc/ChangeLog:
	PR ipa/96265
	PR ipa/111937
	* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
	streaming, and call poly_int_read_common instead. 
	(streamer_read_poly_int64): Likewise.
	* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
	new variable if ACCEL_COMPILER is defined.
	* data-streamer.h (host_num_poly_int_coeffs): Declare.
	(poly_int_read_common): New function template.
	(bp_unpack_poly_value): Remove code for streaming and call
	poly_int_read_common instead.
	* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
	NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
	is defined.
	* lto-streamer-out.cc (lto_write_mode_table): Stream out
	NUM_POLY_INT_COEFFS if offloading is enabled.
	* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
	* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
	streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..07dbc5e2bc3 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
-  poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  using coeff_type = poly_int_traits<poly_uint64>::coeff_type;
+  return poly_int_read_common<coeff_type> (streamer_read_uhwi, ib);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +191,8 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
-  poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  using coeff_type = poly_int_traits<poly_int64>::coeff_type;
+  return poly_int_read_common<coeff_type> (streamer_read_hwi, ib);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer.cc b/gcc/data-streamer.cc
index 346b294c72a..896413e8d2b 100644
--- a/gcc/data-streamer.cc
+++ b/gcc/data-streamer.cc
@@ -28,6 +28,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "data-streamer.h"
 
+/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
+   stored at beginning of mode_table.  While streaming-in, the value is read
+   in host_num_poly_int_coeffs.  */
+
+#ifdef ACCEL_COMPILER
+unsigned host_num_poly_int_coeffs = 0;
+#endif
+
 /* Pack WORK into BP in a variant of uleb format.  */
 
 void
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..b3dc4b98476 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -50,6 +50,7 @@ void bp_pack_real_value (struct bitpack_d *, const REAL_VALUE_TYPE *);
 void bp_unpack_real_value (struct bitpack_d *, REAL_VALUE_TYPE *);
 unsigned HOST_WIDE_INT bp_unpack_var_len_unsigned (struct bitpack_d *);
 HOST_WIDE_INT bp_unpack_var_len_int (struct bitpack_d *);
+extern unsigned host_num_poly_int_coeffs;
 
 /* In data-streamer-out.cc  */
 void streamer_write_zero (struct output_block *);
@@ -194,15 +195,55 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+/* Common code for reading poly_int.  */
+
+template<typename C, typename F, typename ...Args>
+poly_int<NUM_POLY_INT_COEFFS, C>
+poly_int_read_common (F read_coeff, Args ...args)
+{
+  poly_int<NUM_POLY_INT_COEFFS, C> x;
+  unsigned i;
+
+#ifdef ACCEL_COMPILER
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
+  gcc_assert (host_num_poly_int_coeffs > 0);
+#else
+  const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  if (num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < num_poly_int_coeffs; i++)
+	x.coeffs[i] = read_coeff (args...);
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = 0;
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	x.coeffs[i] = read_coeff (args...);
+
+      /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
+      for (; i < num_poly_int_coeffs; i++)
+	{
+	  C val = read_coeff (args...);
+	  if (val != 0)
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%> (%d)",
+			 NUM_POLY_INT_COEFFS);
+	}
+    }
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
-  poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+  return poly_int_read_common<bitpack_word_t> (bp_unpack_value, bp, nbits);
 }
 
 
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..cbf6041fd68 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -2013,6 +2013,11 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
 				header->string_size, vNULL);
   bitpack_d bp = streamer_read_bitpack (&ib);
 
+#ifdef ACCEL_COMPILER
+  host_num_poly_int_coeffs
+    = bp_unpack_value (&bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+#endif
+
   unsigned mode_bits = bp_unpack_value (&bp, 5);
   unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
 
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index c329ac8af95..523d6dad221 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -3192,6 +3192,9 @@ lto_write_mode_table (void)
   ob = create_output_block (LTO_section_mode_table);
   bitpack_d bp = bitpack_create (ob->main_stream);
 
+  if (lto_stream_offload_p)
+    bp_pack_value (&bp, NUM_POLY_INT_COEFFS, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   /* Ensure that for GET_MODE_INNER (m) != m we have
      also the inner mode marked.  */
   for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..94708165961 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS	2
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..40029437199 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,37 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
 				 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-    POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+#ifdef ACCEL_COMPILER
+  /* Ensure that we have streamed-in host_num_poly_int_coeffs.  */
+  const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
+  gcc_assert (num_poly_int_coeffs > 0);
+#else
+  const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
+#endif
+
+  unsigned i;
+  if (num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
+    {
+      for (i = 0; i < num_poly_int_coeffs; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+      tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+      for (; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
+    }
+  else
+    {
+      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+      for (; i < num_poly_int_coeffs; i++)
+	{
+	  tree val = stream_read_tree_ref (ib, data_in);
+	  if (!integer_zerop (val))
+	    fatal_error (input_location,
+			 "degree of %<poly_int%> exceeds "
+			 "%<NUM_POLY_INT_COEFFS%>");
+	}
+    }
 }
 
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-07 17:53                                 ` Prathamesh Kulkarni
@ 2024-08-07 17:56                                   ` Jakub Jelinek
  2024-08-07 18:26                                     ` Prathamesh Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Jelinek @ 2024-08-07 17:56 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Wed, Aug 07, 2024 at 05:53:00PM +0000, Prathamesh Kulkarni wrote:
> > Two spaces after . rather than just one, and because of that move in
> > on the next line.
> > 
> > > +   host_num_poly_int_coeffs.  */
> > 
> > Otherwise LGTM.
> Thanks, I have adjusted the formatting of the comment and a typo in streamer_read_poly_uint64.
> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu, LTO bootstrap+test on x86_64-linux-gnu.
> And doesn't seem to regress libgomp testing for x86_64 -> nvptx offloading (altho there were a few occurrences of flaky tests in results).
> Is the patch OK to commit ?

The "Otherwise LGTM." was already covering that.
So yes.

	Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS
  2024-08-07 17:56                                   ` Jakub Jelinek
@ 2024-08-07 18:26                                     ` Prathamesh Kulkarni
  0 siblings, 0 replies; 23+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-07 18:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Richard Sandiford, gcc-patches



> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Wednesday, August 7, 2024 11:27 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Biener <rguenther@suse.de>; Richard Sandiford
> <richard.sandiford@arm.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Aug 07, 2024 at 05:53:00PM +0000, Prathamesh Kulkarni wrote:
> > > Two spaces after . rather than just one, and because of that move
> in
> > > on the next line.
> > >
> > > > +   host_num_poly_int_coeffs.  */
> > >
> > > Otherwise LGTM.
> > Thanks, I have adjusted the formatting of the comment and a typo in
> streamer_read_poly_uint64.
> > Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-
> gnu, LTO bootstrap+test on x86_64-linux-gnu.
> > And doesn't seem to regress libgomp testing for x86_64 -> nvptx
> offloading (altho there were a few occurrences of flaky tests in
> results).
> > Is the patch OK to commit ?
> 
> The "Otherwise LGTM." was already covering that.
> So yes.
Thanks, pushed to trunk in:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=38900247f3880d6eca2e364a000e5898f8deae64

Thanks,
Prathamesh
> 
>         Jakub


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-08-07 18:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-29 10:13 Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS Prathamesh Kulkarni
2024-07-29 11:29 ` Richard Biener
2024-07-29 16:13   ` Richard Sandiford
2024-07-30  6:21     ` Prathamesh Kulkarni
2024-07-30  7:33       ` Richard Biener
2024-07-30  9:01         ` Richard Sandiford
2024-07-30  9:07           ` Richard Biener
2024-07-30  9:17             ` Richard Sandiford
2024-07-30  9:25               ` Richard Biener
2024-07-30  9:45                 ` Jakub Jelinek
2024-07-30 11:07                   ` Richard Sandiford
2024-07-30 11:14                   ` Prathamesh Kulkarni
2024-07-31 14:58                     ` Prathamesh Kulkarni
2024-07-31 15:15                       ` Jakub Jelinek
2024-08-02 11:58                         ` Prathamesh Kulkarni
2024-08-02 12:13                           ` Jakub Jelinek
2024-08-05 14:24                             ` Prathamesh Kulkarni
2024-08-05 14:30                               ` Jakub Jelinek
2024-08-07 17:53                                 ` Prathamesh Kulkarni
2024-08-07 17:56                                   ` Jakub Jelinek
2024-08-07 18:26                                     ` Prathamesh Kulkarni
2024-07-30 12:37 ` Tobias Burnus
2024-07-31 15:01   ` Prathamesh Kulkarni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).