* Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
@ 2024-08-19 8:22 Prathamesh Kulkarni
2024-08-19 13:29 ` Richard Biener
2024-08-19 18:55 ` Richard Sandiford
0 siblings, 2 replies; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-19 8:22 UTC (permalink / raw)
To: rguenther, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]
Hi Richard,
As mentioned in RFC email, for the following test:
int main()
{
long c[4];
#pragma omp target map(c)
c[0] = 0;
return 0;
}
Compiling for AArch64 host with -O2 -fopenmp -foffload=nvptx-none results in:
lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported (mode 'OI') compilation terminated.
nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status compilation terminated.
This happens because AArch64 uses OImode for ARRAY_TYPE whose size fits 256-bits, which is not supported on nvptx, and thus
emits the above diagnostic.
Following your suggestion, the attached patch streams out VOIDmode from host for TYPE_MODE and DECL_MODE for aggregate types
with offloading enabled, and while streaming-in on accel side, it recomputes TYPE_MODE and DECL_MODE, which fixes the issue.
Patch survives AArch64->nvptx offload testing for libgomp and bootstrap+test on aarch64-linux-gnu.
Does the patch look in the right direction ?
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Thanks,
Prathamesh
[-- Attachment #2: p-166-3.txt --]
[-- Type: text/plain, Size: 8202 bytes --]
Recompute TYPE_MODE and DECL_MODE for aggregate type for acclerator.
The patch streams out VOIDmode for aggregate types with offloading enabled,
and recomputes appropriate TYPE_MODE and DECL_MODE while streaming-in on accel
side. The rationale for this change is to avoid streaming out host-specific
modes that may be used for aggregate types, which may not be representable on
the accelerator. For eg, AArch64 uses OImode for ARRAY_TYPE whose size is 256-bits,
and nvptx doesn't have OImode, and thus ends up emitting an error from
lto_input_mode_table.
gcc/ChangeLog:
* lto-streamer-in.cc: Include stor-layout.h.
(lto_read_tree_1): Call relayout_decl if
offloading is enabled.
* stor-layout.cc (layout_type): Move computation of mode for
ARRAY_TYPE from ...
(compute_array_mode): ... to here.
* stor-layout.h (compute_array_mode): Declare.
* tree-streamer-in.cc: Include stor-layout.h.
(unpack_ts_common_value_fields): Call compute_array_mode if offloading
is enabled.
* tree-streamer-out.cc (pack_ts_fixed_cst_value_fields): Stream out
VOIDmode if decl has aggregate type and offloading is enabled.
(pack_ts_type_common_value_fields): Stream out VOIDmode for aggregate
type if offloading is enabled.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index cbf6041fd68..0420183faf8 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see
#include "debug.h"
#include "alloc-pool.h"
#include "toplev.h"
+#include "stor-layout.h"
/* Allocator used to hold string slot entries for line map streaming. */
static struct object_allocator<struct string_slot> *string_slot_allocator;
@@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
with -g1, see for example PR113488. */
else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
+
+#ifdef ACCEL_COMPILER
+ /* For decl with aggregate type, host streams out VOIDmode.
+ Compute the correct DECL_MODE by calling relayout_decl. */
+ if ((VAR_P (expr)
+ || TREE_CODE (expr) == PARM_DECL
+ || TREE_CODE (expr) == FIELD_DECL)
+ && AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ && DECL_MODE (expr) == VOIDmode)
+ relayout_decl (expr);
+#endif
}
}
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 10c0809914c..0ff8bd1171e 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
layout_decl (TYPE_NAME (type), 0);
}
+/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
+
+void compute_array_mode (tree type)
+{
+ gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
+ SET_TYPE_MODE (type, BLKmode);
+ if (TYPE_SIZE (type) != 0
+ && ! targetm.member_type_forces_blk (type, VOIDmode)
+ /* BLKmode elements force BLKmode aggregate;
+ else extract/store fields may lose. */
+ && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
+ || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
+ {
+ SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+ TYPE_SIZE (type)));
+ if (TYPE_MODE (type) != BLKmode
+ && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
+ && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
+ {
+ TYPE_NO_FORCE_BLK (type) = 1;
+ SET_TYPE_MODE (type, BLKmode);
+ }
+ }
+}
+
/* Calculate the mode, size, and alignment for TYPE.
For an array type, calculate the element separation as well.
Record TYPE on the chain of permanent or temporary types
@@ -2709,24 +2735,7 @@ layout_type (tree type)
align = MAX (align, BITS_PER_UNIT);
#endif
SET_TYPE_ALIGN (type, align);
- SET_TYPE_MODE (type, BLKmode);
- if (TYPE_SIZE (type) != 0
- && ! targetm.member_type_forces_blk (type, VOIDmode)
- /* BLKmode elements force BLKmode aggregate;
- else extract/store fields may lose. */
- && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
- || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
- {
- SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
- TYPE_SIZE (type)));
- if (TYPE_MODE (type) != BLKmode
- && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
- && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
- {
- TYPE_NO_FORCE_BLK (type) = 1;
- SET_TYPE_MODE (type, BLKmode);
- }
- }
+ compute_array_mode (type);
if (AGGREGATE_TYPE_P (element))
TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE (element);
/* When the element size is constant, check that it is at least as
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 096ca811762..9d9b8c385f6 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
extern void normalize_rli (record_layout_info);
extern void place_field (record_layout_info, tree);
extern void compute_record_mode (tree);
+extern void compute_array_mode (tree);
extern void finish_bitfield_layout (tree);
extern void finish_record_layout (record_layout_info, int);
extern void finalize_size_functions (void);
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 40029437199..329d218e7d4 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
#include "attribs.h"
#include "asan.h"
#include "opts.h"
+#include "stor-layout.h"
/* Read a STRING_CST from the string table in DATA_IN using input
@@ -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
#ifdef ACCEL_COMPILER
if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
+
+ /* Host streams out VOIDmode for aggregate type. */
+ if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
+ {
+ if (TREE_CODE (expr) == ARRAY_TYPE)
+ compute_array_mode (expr);
+ else if (RECORD_OR_UNION_TYPE_P (expr))
+ compute_record_mode (expr);
+ else
+ gcc_unreachable ();
+ }
#endif
}
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..7de4447a1b5 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
static void
pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
{
- bp_pack_machine_mode (bp, DECL_MODE (expr));
+ /* Similar to TYPE_MODE, avoid streaming out host-specific DECL_MODE
+ for aggregate type with offloading enabled, and while streaming-in
+ recompute appropriate DECL_MODE for accelerator. */
+ if (lto_stream_offload_p
+ && (VAR_P (expr)
+ || TREE_CODE (expr) == PARM_DECL
+ || TREE_CODE (expr) == FIELD_DECL)
+ && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
+ bp_pack_machine_mode (bp, VOIDmode);
+ else
+ bp_pack_machine_mode (bp, DECL_MODE (expr));
bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -317,10 +327,18 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
static void
pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
{
+ /* For offloading, avoid streaming out TYPE_MODE for aggregate type since
+ it may be host-specific. For eg, aarch64 uses OImode for ARRAY_TYPE
+ whose size is 256-bits, which is not representable on accelerator.
+ Instead stream out VOIDmode, and while streaming-in, recompute
+ appropriate TYPE_MODE for accelerator. */
+ if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
+ bp_pack_machine_mode (bp, VOIDmode);
/* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
not necessary valid in a global context.
Use the raw value previously set by layout_type. */
- bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
+ else
+ bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
/* TYPE_NO_FORCE_BLK is private to stor-layout and need
no streaming. */
bp_pack_value (bp, TYPE_PACKED (expr), 1);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-19 8:22 Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator Prathamesh Kulkarni
@ 2024-08-19 13:29 ` Richard Biener
2024-08-19 18:55 ` Richard Sandiford
1 sibling, 0 replies; 15+ messages in thread
From: Richard Biener @ 2024-08-19 13:29 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Thomas Schwinge, gcc-patches
On Mon, 19 Aug 2024, Prathamesh Kulkarni wrote:
> Hi Richard,
> As mentioned in RFC email, for the following test:
>
> int main()
> {
> long c[4];
> #pragma omp target map(c)
> c[0] = 0;
> return 0;
> }
>
> Compiling for AArch64 host with -O2 -fopenmp -foffload=nvptx-none results in:
> lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported (mode 'OI') compilation terminated.
> nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status compilation terminated.
>
> This happens because AArch64 uses OImode for ARRAY_TYPE whose size fits 256-bits, which is not supported on nvptx, and thus
> emits the above diagnostic.
>
> Following your suggestion, the attached patch streams out VOIDmode from host for TYPE_MODE and DECL_MODE for aggregate types
> with offloading enabled, and while streaming-in on accel side, it recomputes TYPE_MODE and DECL_MODE, which fixes the issue.
> Patch survives AArch64->nvptx offload testing for libgomp and bootstrap+test on aarch64-linux-gnu.
>
> Does the patch look in the right direction ?
+/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
+
+void compute_array_mode (tree type)
+{
newline after 'void'
Otherwise LGTM, please leave time for others to comment though.
Thanks,
Richard.
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>
> Thanks,
> Prathamesh
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-19 8:22 Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator Prathamesh Kulkarni
2024-08-19 13:29 ` Richard Biener
@ 2024-08-19 18:55 ` Richard Sandiford
2024-08-20 5:06 ` Richard Biener
1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2024-08-19 18:55 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: rguenther, Thomas Schwinge, gcc-patches
Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index cbf6041fd68..0420183faf8 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see
> #include "debug.h"
> #include "alloc-pool.h"
> #include "toplev.h"
> +#include "stor-layout.h"
>
> /* Allocator used to hold string slot entries for line map streaming. */
> static struct object_allocator<struct string_slot> *string_slot_allocator;
> @@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
> with -g1, see for example PR113488. */
> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> +
> +#ifdef ACCEL_COMPILER
> + /* For decl with aggregate type, host streams out VOIDmode.
> + Compute the correct DECL_MODE by calling relayout_decl. */
> + if ((VAR_P (expr)
> + || TREE_CODE (expr) == PARM_DECL
> + || TREE_CODE (expr) == FIELD_DECL)
> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> + && DECL_MODE (expr) == VOIDmode)
> + relayout_decl (expr);
> +#endif
Genuine question, but: is relayout_decl safe in this context? It does
a lot more than just reset the mode. It also applies the target ABI's
preferences wrt alignment, padding, and so on, rather than preserving
those of the host's.
Thanks,
Richard
> }
> }
>
> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
> index 10c0809914c..0ff8bd1171e 100644
> --- a/gcc/stor-layout.cc
> +++ b/gcc/stor-layout.cc
> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
> layout_decl (TYPE_NAME (type), 0);
> }
>
> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> +
> +void compute_array_mode (tree type)
> +{
> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> +
> + SET_TYPE_MODE (type, BLKmode);
> + if (TYPE_SIZE (type) != 0
> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> + /* BLKmode elements force BLKmode aggregate;
> + else extract/store fields may lose. */
> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> + {
> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> + TYPE_SIZE (type)));
> + if (TYPE_MODE (type) != BLKmode
> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
> + {
> + TYPE_NO_FORCE_BLK (type) = 1;
> + SET_TYPE_MODE (type, BLKmode);
> + }
> + }
> +}
> +
> /* Calculate the mode, size, and alignment for TYPE.
> For an array type, calculate the element separation as well.
> Record TYPE on the chain of permanent or temporary types
> @@ -2709,24 +2735,7 @@ layout_type (tree type)
> align = MAX (align, BITS_PER_UNIT);
> #endif
> SET_TYPE_ALIGN (type, align);
> - SET_TYPE_MODE (type, BLKmode);
> - if (TYPE_SIZE (type) != 0
> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> - /* BLKmode elements force BLKmode aggregate;
> - else extract/store fields may lose. */
> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> - {
> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> - TYPE_SIZE (type)));
> - if (TYPE_MODE (type) != BLKmode
> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
> - {
> - TYPE_NO_FORCE_BLK (type) = 1;
> - SET_TYPE_MODE (type, BLKmode);
> - }
> - }
> + compute_array_mode (type);
> if (AGGREGATE_TYPE_P (element))
> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE (element);
> /* When the element size is constant, check that it is at least as
> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
> index 096ca811762..9d9b8c385f6 100644
> --- a/gcc/stor-layout.h
> +++ b/gcc/stor-layout.h
> @@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
> extern void normalize_rli (record_layout_info);
> extern void place_field (record_layout_info, tree);
> extern void compute_record_mode (tree);
> +extern void compute_array_mode (tree);
> extern void finish_bitfield_layout (tree);
> extern void finish_record_layout (record_layout_info, int);
> extern void finalize_size_functions (void);
> diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
> index 40029437199..329d218e7d4 100644
> --- a/gcc/tree-streamer-in.cc
> +++ b/gcc/tree-streamer-in.cc
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
> #include "attribs.h"
> #include "asan.h"
> #include "opts.h"
> +#include "stor-layout.h"
>
>
> /* Read a STRING_CST from the string table in DATA_IN using input
> @@ -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
> #ifdef ACCEL_COMPILER
> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
> +
> + /* Host streams out VOIDmode for aggregate type. */
> + if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> + {
> + if (TREE_CODE (expr) == ARRAY_TYPE)
> + compute_array_mode (expr);
> + else if (RECORD_OR_UNION_TYPE_P (expr))
> + compute_record_mode (expr);
> + else
> + gcc_unreachable ();
> + }
> #endif
> }
>
> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
> index b7205287ffb..7de4447a1b5 100644
> --- a/gcc/tree-streamer-out.cc
> +++ b/gcc/tree-streamer-out.cc
> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
> static void
> pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
> {
> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> + /* Similar to TYPE_MODE, avoid streaming out host-specific DECL_MODE
> + for aggregate type with offloading enabled, and while streaming-in
> + recompute appropriate DECL_MODE for accelerator. */
> + if (lto_stream_offload_p
> + && (VAR_P (expr)
> + || TREE_CODE (expr) == PARM_DECL
> + || TREE_CODE (expr) == FIELD_DECL)
> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> + bp_pack_machine_mode (bp, VOIDmode);
> + else
> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
> @@ -317,10 +327,18 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
> static void
> pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
> {
> + /* For offloading, avoid streaming out TYPE_MODE for aggregate type since
> + it may be host-specific. For eg, aarch64 uses OImode for ARRAY_TYPE
> + whose size is 256-bits, which is not representable on accelerator.
> + Instead stream out VOIDmode, and while streaming-in, recompute
> + appropriate TYPE_MODE for accelerator. */
> + if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> + bp_pack_machine_mode (bp, VOIDmode);
> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
> not necessary valid in a global context.
> Use the raw value previously set by layout_type. */
> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> + else
> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> no streaming. */
> bp_pack_value (bp, TYPE_PACKED (expr), 1);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-19 18:55 ` Richard Sandiford
@ 2024-08-20 5:06 ` Richard Biener
2024-08-21 11:04 ` Prathamesh Kulkarni
0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2024-08-20 5:06 UTC (permalink / raw)
To: Richard Sandiford; +Cc: Prathamesh Kulkarni, Thomas Schwinge, gcc-patches
> Am 19.08.2024 um 20:56 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>
> Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
>> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
>> index cbf6041fd68..0420183faf8 100644
>> --- a/gcc/lto-streamer-in.cc
>> +++ b/gcc/lto-streamer-in.cc
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "debug.h"
>> #include "alloc-pool.h"
>> #include "toplev.h"
>> +#include "stor-layout.h"
>>
>> /* Allocator used to hold string slot entries for line map streaming. */
>> static struct object_allocator<struct string_slot> *string_slot_allocator;
>> @@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
>> with -g1, see for example PR113488. */
>> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
>> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
>> +
>> +#ifdef ACCEL_COMPILER
>> + /* For decl with aggregate type, host streams out VOIDmode.
>> + Compute the correct DECL_MODE by calling relayout_decl. */
>> + if ((VAR_P (expr)
>> + || TREE_CODE (expr) == PARM_DECL
>> + || TREE_CODE (expr) == FIELD_DECL)
>> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
>> + && DECL_MODE (expr) == VOIDmode)
>> + relayout_decl (expr);
>> +#endif
>
> Genuine question, but: is relayout_decl safe in this context? It does
> a lot more than just reset the mode. It also applies the target ABI's
> preferences wrt alignment, padding, and so on, rather than preserving
> those of the host's.
It would be better to just recompute the mode here.
Richard
> Thanks,
> Richard
>
>
>> }
>> }
>>
>> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
>> index 10c0809914c..0ff8bd1171e 100644
>> --- a/gcc/stor-layout.cc
>> +++ b/gcc/stor-layout.cc
>> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
>> layout_decl (TYPE_NAME (type), 0);
>> }
>>
>> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
>> +
>> +void compute_array_mode (tree type)
>> +{
>> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
>> +
>> + SET_TYPE_MODE (type, BLKmode);
>> + if (TYPE_SIZE (type) != 0
>> + && ! targetm.member_type_forces_blk (type, VOIDmode)
>> + /* BLKmode elements force BLKmode aggregate;
>> + else extract/store fields may lose. */
>> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
>> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
>> + {
>> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
>> + TYPE_SIZE (type)));
>> + if (TYPE_MODE (type) != BLKmode
>> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
>> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
>> + {
>> + TYPE_NO_FORCE_BLK (type) = 1;
>> + SET_TYPE_MODE (type, BLKmode);
>> + }
>> + }
>> +}
>> +
>> /* Calculate the mode, size, and alignment for TYPE.
>> For an array type, calculate the element separation as well.
>> Record TYPE on the chain of permanent or temporary types
>> @@ -2709,24 +2735,7 @@ layout_type (tree type)
>> align = MAX (align, BITS_PER_UNIT);
>> #endif
>> SET_TYPE_ALIGN (type, align);
>> - SET_TYPE_MODE (type, BLKmode);
>> - if (TYPE_SIZE (type) != 0
>> - && ! targetm.member_type_forces_blk (type, VOIDmode)
>> - /* BLKmode elements force BLKmode aggregate;
>> - else extract/store fields may lose. */
>> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
>> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
>> - {
>> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
>> - TYPE_SIZE (type)));
>> - if (TYPE_MODE (type) != BLKmode
>> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
>> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
>> - {
>> - TYPE_NO_FORCE_BLK (type) = 1;
>> - SET_TYPE_MODE (type, BLKmode);
>> - }
>> - }
>> + compute_array_mode (type);
>> if (AGGREGATE_TYPE_P (element))
>> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE (element);
>> /* When the element size is constant, check that it is at least as
>> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
>> index 096ca811762..9d9b8c385f6 100644
>> --- a/gcc/stor-layout.h
>> +++ b/gcc/stor-layout.h
>> @@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
>> extern void normalize_rli (record_layout_info);
>> extern void place_field (record_layout_info, tree);
>> extern void compute_record_mode (tree);
>> +extern void compute_array_mode (tree);
>> extern void finish_bitfield_layout (tree);
>> extern void finish_record_layout (record_layout_info, int);
>> extern void finalize_size_functions (void);
>> diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
>> index 40029437199..329d218e7d4 100644
>> --- a/gcc/tree-streamer-in.cc
>> +++ b/gcc/tree-streamer-in.cc
>> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "attribs.h"
>> #include "asan.h"
>> #include "opts.h"
>> +#include "stor-layout.h"
>>
>>
>> /* Read a STRING_CST from the string table in DATA_IN using input
>> @@ -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
>> #ifdef ACCEL_COMPILER
>> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
>> SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
>> +
>> + /* Host streams out VOIDmode for aggregate type. */
>> + if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
>> + {
>> + if (TREE_CODE (expr) == ARRAY_TYPE)
>> + compute_array_mode (expr);
>> + else if (RECORD_OR_UNION_TYPE_P (expr))
>> + compute_record_mode (expr);
>> + else
>> + gcc_unreachable ();
>> + }
>> #endif
>> }
>>
>> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
>> index b7205287ffb..7de4447a1b5 100644
>> --- a/gcc/tree-streamer-out.cc
>> +++ b/gcc/tree-streamer-out.cc
>> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
>> static void
>> pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
>> {
>> - bp_pack_machine_mode (bp, DECL_MODE (expr));
>> + /* Similar to TYPE_MODE, avoid streaming out host-specific DECL_MODE
>> + for aggregate type with offloading enabled, and while streaming-in
>> + recompute appropriate DECL_MODE for accelerator. */
>> + if (lto_stream_offload_p
>> + && (VAR_P (expr)
>> + || TREE_CODE (expr) == PARM_DECL
>> + || TREE_CODE (expr) == FIELD_DECL)
>> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
>> + bp_pack_machine_mode (bp, VOIDmode);
>> + else
>> + bp_pack_machine_mode (bp, DECL_MODE (expr));
>> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
>> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
>> bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
>> @@ -317,10 +327,18 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
>> static void
>> pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
>> {
>> + /* For offloading, avoid streaming out TYPE_MODE for aggregate type since
>> + it may be host-specific. For eg, aarch64 uses OImode for ARRAY_TYPE
>> + whose size is 256-bits, which is not representable on accelerator.
>> + Instead stream out VOIDmode, and while streaming-in, recompute
>> + appropriate TYPE_MODE for accelerator. */
>> + if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
>> + bp_pack_machine_mode (bp, VOIDmode);
>> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
>> not necessary valid in a global context.
>> Use the raw value previously set by layout_type. */
>> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
>> + else
>> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
>> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
>> no streaming. */
>> bp_pack_value (bp, TYPE_PACKED (expr), 1);
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-20 5:06 ` Richard Biener
@ 2024-08-21 11:04 ` Prathamesh Kulkarni
2024-08-21 11:39 ` Richard Biener
0 siblings, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-21 11:04 UTC (permalink / raw)
To: Richard Biener, Richard Sandiford; +Cc: Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 9985 bytes --]
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, August 20, 2024 10:36 AM
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas Schwinge
> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> <richard.sandiford@arm.com>:
> >
> > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index
> >> cbf6041fd68..0420183faf8 100644
> >> --- a/gcc/lto-streamer-in.cc
> >> +++ b/gcc/lto-streamer-in.cc
> >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not
> see
> >> #include "debug.h"
> >> #include "alloc-pool.h"
> >> #include "toplev.h"
> >> +#include "stor-layout.h"
> >>
> >> /* Allocator used to hold string slot entries for line map
> streaming.
> >> */ static struct object_allocator<struct string_slot>
> >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1
> (class lto_input_block *ib, class data_in *data_in, tree expr)
> >> with -g1, see for example PR113488. */
> >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) ==
> expr)
> >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> >> +
> >> +#ifdef ACCEL_COMPILER
> >> + /* For decl with aggregate type, host streams out VOIDmode.
> >> + Compute the correct DECL_MODE by calling relayout_decl. */
> >> + if ((VAR_P (expr)
> >> + || TREE_CODE (expr) == PARM_DECL
> >> + || TREE_CODE (expr) == FIELD_DECL)
> >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> >> + && DECL_MODE (expr) == VOIDmode)
> >> + relayout_decl (expr);
> >> +#endif
> >
> > Genuine question, but: is relayout_decl safe in this context? It
> does
> > a lot more than just reset the mode. It also applies the target
> ABI's
> > preferences wrt alignment, padding, and so on, rather than
> preserving
> > those of the host's.
>
> It would be better to just recompute the mode here.
Hi,
The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
I checked layout_decl_type does the same thing for setting decl mode, except for bit fields. Since bit-fields cannot have
aggregate type, I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) would be OK in this case ?
Sorry if this sounds like a silly ques -- Why would it be unsafe to call relayout_decl for variables that are mapped to accelerator even
if it'd not preserve host's properties ? I assumed we want to assign accel's ABI properties for mapped decls (mode being one of them),
or am I misunderstanding ?
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Thanks,
Prathamesh
>
> Richard
>
> > Thanks,
> > Richard
> >
> >
> >> }
> >> }
> >>
> >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> >> 10c0809914c..0ff8bd1171e 100644
> >> --- a/gcc/stor-layout.cc
> >> +++ b/gcc/stor-layout.cc
> >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char
> *name, tree fields,
> >> layout_decl (TYPE_NAME (type), 0);
> >> }
> >>
> >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> >> +
> >> +void compute_array_mode (tree type)
> >> +{
> >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> >> +
> >> + SET_TYPE_MODE (type, BLKmode);
> >> + if (TYPE_SIZE (type) != 0
> >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> >> + /* BLKmode elements force BLKmode aggregate;
> >> + else extract/store fields may lose. */
> >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> >> + {
> >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> >> + TYPE_SIZE (type)));
> >> + if (TYPE_MODE (type) != BLKmode
> >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
> >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> (type)))
> >> + {
> >> + TYPE_NO_FORCE_BLK (type) = 1;
> >> + SET_TYPE_MODE (type, BLKmode);
> >> + }
> >> + }
> >> +}
> >> +
> >> /* Calculate the mode, size, and alignment for TYPE.
> >> For an array type, calculate the element separation as well.
> >> Record TYPE on the chain of permanent or temporary types @@
> >> -2709,24 +2735,7 @@ layout_type (tree type)
> >> align = MAX (align, BITS_PER_UNIT); #endif
> >> SET_TYPE_ALIGN (type, align);
> >> - SET_TYPE_MODE (type, BLKmode);
> >> - if (TYPE_SIZE (type) != 0
> >> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> >> - /* BLKmode elements force BLKmode aggregate;
> >> - else extract/store fields may lose. */
> >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> >> - {
> >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> >> - TYPE_SIZE (type)));
> >> - if (TYPE_MODE (type) != BLKmode
> >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> BIGGEST_ALIGNMENT
> >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> (type)))
> >> - {
> >> - TYPE_NO_FORCE_BLK (type) = 1;
> >> - SET_TYPE_MODE (type, BLKmode);
> >> - }
> >> - }
> >> + compute_array_mode (type);
> >> if (AGGREGATE_TYPE_P (element))
> >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> (element);
> >> /* When the element size is constant, check that it is at least
> as
> >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> >> 096ca811762..9d9b8c385f6 100644
> >> --- a/gcc/stor-layout.h
> >> +++ b/gcc/stor-layout.h
> >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
> >> extern void normalize_rli (record_layout_info); extern void
> >> place_field (record_layout_info, tree); extern void
> >> compute_record_mode (tree);
> >> +extern void compute_array_mode (tree);
> >> extern void finish_bitfield_layout (tree); extern void
> >> finish_record_layout (record_layout_info, int); extern void
> >> finalize_size_functions (void); diff --git a/gcc/tree-streamer-
> in.cc
> >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4 100644
> >> --- a/gcc/tree-streamer-in.cc
> >> +++ b/gcc/tree-streamer-in.cc
> >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not
> see
> >> #include "attribs.h"
> >> #include "asan.h"
> >> #include "opts.h"
> >> +#include "stor-layout.h"
> >>
> >>
> >> /* Read a STRING_CST from the string table in DATA_IN using input
> @@
> >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct
> >> bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> >> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> >> SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
> >> +
> >> + /* Host streams out VOIDmode for aggregate type. */ if
> >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> >> + {
> >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> >> + compute_array_mode (expr);
> >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> >> + compute_record_mode (expr);
> >> + else
> >> + gcc_unreachable ();
> >> + }
> >> #endif
> >> }
> >>
> >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
> >> index b7205287ffb..7de4447a1b5 100644
> >> --- a/gcc/tree-streamer-out.cc
> >> +++ b/gcc/tree-streamer-out.cc
> >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> bitpack_d
> >> *bp, tree expr) static void pack_ts_decl_common_value_fields
> (struct
> >> bitpack_d *bp, tree expr) {
> >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> >> + /* Similar to TYPE_MODE, avoid streaming out host-specific
> DECL_MODE
> >> + for aggregate type with offloading enabled, and while
> streaming-in
> >> + recompute appropriate DECL_MODE for accelerator. */ if
> >> + (lto_stream_offload_p
> >> + && (VAR_P (expr)
> >> + || TREE_CODE (expr) == PARM_DECL
> >> + || TREE_CODE (expr) == FIELD_DECL)
> >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> >> + bp_pack_machine_mode (bp, VOIDmode); else
> >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10 +327,18
> @@
> >> pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree
> expr)
> >> static void pack_ts_type_common_value_fields (struct bitpack_d *bp,
> >> tree expr) {
> >> + /* For offloading, avoid streaming out TYPE_MODE for aggregate
> type since
> >> + it may be host-specific. For eg, aarch64 uses OImode for
> ARRAY_TYPE
> >> + whose size is 256-bits, which is not representable on
> accelerator.
> >> + Instead stream out VOIDmode, and while streaming-in,
> recompute
> >> + appropriate TYPE_MODE for accelerator. */ if
> >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> >> + bp_pack_machine_mode (bp, VOIDmode);
> >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> target_flags
> >> not necessary valid in a global context.
> >> Use the raw value previously set by layout_type. */
> >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> >> + else
> >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> >> no streaming. */
> >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
[-- Attachment #2: p-166-4.txt --]
[-- Type: text/plain, Size: 7838 bytes --]
Recompute TYPE_MODE and DECL_MODE for aggregate type for acclerator.
The patch streams out VOIDmode for aggregate types with offloading enabled,
and recomputes appropriate TYPE_MODE and DECL_MODE while streaming-in on accel
side. The rationale for this change is to avoid streaming out host-specific
modes that may be used for aggregate types, which may not be representable on
the accelerator. For eg, AArch64 uses OImode for ARRAY_TYPE whose size is 256-bits,
and nvptx doesn't have OImode, and thus ends up emitting an error from
lto_input_mode_table.
gcc/ChangeLog:
* lto-streamer-in.cc: (lto_read_tree_1): Set DECL_MODE (expr) to
TREE_TYPE (TYPE_MODE (expr)) if TREE_TYPE (expr) is aggregate type and
offloading is enabled.
* stor-layout.cc (layout_type): Move computation of mode for
ARRAY_TYPE from ...
(compute_array_mode): ... to here.
* stor-layout.h (compute_array_mode): Declare.
* tree-streamer-in.cc: Include stor-layout.h.
(unpack_ts_common_value_fields): Call compute_array_mode if offloading
is enabled.
* tree-streamer-out.cc (pack_ts_fixed_cst_value_fields): Stream out
VOIDmode if decl has aggregate type and offloading is enabled.
(pack_ts_type_common_value_fields): Stream out VOIDmode for aggregate
type if offloading is enabled.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index cbf6041fd68..64f75807328 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1752,6 +1752,15 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
with -g1, see for example PR113488. */
else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
+
+#ifdef ACCEL_COMPILER
+ if ((VAR_P (expr)
+ || TREE_CODE (expr) == PARM_DECL
+ || TREE_CODE (expr) == FIELD_DECL)
+ && AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ && DECL_MODE (expr) == VOIDmode)
+ SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
+#endif
}
}
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 10c0809914c..0ff8bd1171e 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
layout_decl (TYPE_NAME (type), 0);
}
+/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
+
+void compute_array_mode (tree type)
+{
+ gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
+ SET_TYPE_MODE (type, BLKmode);
+ if (TYPE_SIZE (type) != 0
+ && ! targetm.member_type_forces_blk (type, VOIDmode)
+ /* BLKmode elements force BLKmode aggregate;
+ else extract/store fields may lose. */
+ && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
+ || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
+ {
+ SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+ TYPE_SIZE (type)));
+ if (TYPE_MODE (type) != BLKmode
+ && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
+ && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
+ {
+ TYPE_NO_FORCE_BLK (type) = 1;
+ SET_TYPE_MODE (type, BLKmode);
+ }
+ }
+}
+
/* Calculate the mode, size, and alignment for TYPE.
For an array type, calculate the element separation as well.
Record TYPE on the chain of permanent or temporary types
@@ -2709,24 +2735,7 @@ layout_type (tree type)
align = MAX (align, BITS_PER_UNIT);
#endif
SET_TYPE_ALIGN (type, align);
- SET_TYPE_MODE (type, BLKmode);
- if (TYPE_SIZE (type) != 0
- && ! targetm.member_type_forces_blk (type, VOIDmode)
- /* BLKmode elements force BLKmode aggregate;
- else extract/store fields may lose. */
- && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
- || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
- {
- SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
- TYPE_SIZE (type)));
- if (TYPE_MODE (type) != BLKmode
- && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
- && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
- {
- TYPE_NO_FORCE_BLK (type) = 1;
- SET_TYPE_MODE (type, BLKmode);
- }
- }
+ compute_array_mode (type);
if (AGGREGATE_TYPE_P (element))
TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE (element);
/* When the element size is constant, check that it is at least as
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 096ca811762..9d9b8c385f6 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
extern void normalize_rli (record_layout_info);
extern void place_field (record_layout_info, tree);
extern void compute_record_mode (tree);
+extern void compute_array_mode (tree);
extern void finish_bitfield_layout (tree);
extern void finish_record_layout (record_layout_info, int);
extern void finalize_size_functions (void);
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 40029437199..329d218e7d4 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
#include "attribs.h"
#include "asan.h"
#include "opts.h"
+#include "stor-layout.h"
/* Read a STRING_CST from the string table in DATA_IN using input
@@ -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
#ifdef ACCEL_COMPILER
if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
+
+ /* Host streams out VOIDmode for aggregate type. */
+ if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
+ {
+ if (TREE_CODE (expr) == ARRAY_TYPE)
+ compute_array_mode (expr);
+ else if (RECORD_OR_UNION_TYPE_P (expr))
+ compute_record_mode (expr);
+ else
+ gcc_unreachable ();
+ }
#endif
}
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..7de4447a1b5 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct bitpack_d *bp, tree expr)
static void
pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
{
- bp_pack_machine_mode (bp, DECL_MODE (expr));
+ /* Similar to TYPE_MODE, avoid streaming out host-specific DECL_MODE
+ for aggregate type with offloading enabled, and while streaming-in
+ recompute appropriate DECL_MODE for accelerator. */
+ if (lto_stream_offload_p
+ && (VAR_P (expr)
+ || TREE_CODE (expr) == PARM_DECL
+ || TREE_CODE (expr) == FIELD_DECL)
+ && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
+ bp_pack_machine_mode (bp, VOIDmode);
+ else
+ bp_pack_machine_mode (bp, DECL_MODE (expr));
bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
bp_pack_value (bp, DECL_IGNORED_P (expr), 1);
@@ -317,10 +327,18 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
static void
pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
{
+ /* For offloading, avoid streaming out TYPE_MODE for aggregate type since
+ it may be host-specific. For eg, aarch64 uses OImode for ARRAY_TYPE
+ whose size is 256-bits, which is not representable on accelerator.
+ Instead stream out VOIDmode, and while streaming-in, recompute
+ appropriate TYPE_MODE for accelerator. */
+ if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
+ bp_pack_machine_mode (bp, VOIDmode);
/* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
not necessary valid in a global context.
Use the raw value previously set by layout_type. */
- bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
+ else
+ bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
/* TYPE_NO_FORCE_BLK is private to stor-layout and need
no streaming. */
bp_pack_value (bp, TYPE_PACKED (expr), 1);
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-21 11:04 ` Prathamesh Kulkarni
@ 2024-08-21 11:39 ` Richard Biener
2024-08-22 14:11 ` Prathamesh Kulkarni
0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2024-08-21 11:39 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 10688 bytes --]
On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
>
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, August 20, 2024 10:36 AM
> > To: Richard Sandiford <richard.sandiford@arm.com>
> > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas Schwinge
> > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > <richard.sandiford@arm.com>:
> > >
> > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc index
> > >> cbf6041fd68..0420183faf8 100644
> > >> --- a/gcc/lto-streamer-in.cc
> > >> +++ b/gcc/lto-streamer-in.cc
> > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not
> > see
> > >> #include "debug.h"
> > >> #include "alloc-pool.h"
> > >> #include "toplev.h"
> > >> +#include "stor-layout.h"
> > >>
> > >> /* Allocator used to hold string slot entries for line map
> > streaming.
> > >> */ static struct object_allocator<struct string_slot>
> > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1
> > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > >> with -g1, see for example PR113488. */
> > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) ==
> > expr)
> > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > >> +
> > >> +#ifdef ACCEL_COMPILER
> > >> + /* For decl with aggregate type, host streams out VOIDmode.
> > >> + Compute the correct DECL_MODE by calling relayout_decl. */
> > >> + if ((VAR_P (expr)
> > >> + || TREE_CODE (expr) == PARM_DECL
> > >> + || TREE_CODE (expr) == FIELD_DECL)
> > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > >> + && DECL_MODE (expr) == VOIDmode)
> > >> + relayout_decl (expr);
> > >> +#endif
> > >
> > > Genuine question, but: is relayout_decl safe in this context? It
> > does
> > > a lot more than just reset the mode. It also applies the target
> > ABI's
> > > preferences wrt alignment, padding, and so on, rather than
> > preserving
> > > those of the host's.
> >
> > It would be better to just recompute the mode here.
> Hi,
> The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> I checked layout_decl_type does the same thing for setting decl mode, except for bit fields. Since bit-fields cannot have
> aggregate type, I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) would be OK in this case ?
Yep, that should work.
> Sorry if this sounds like a silly ques -- Why would it be unsafe to call relayout_decl for variables that are mapped to accelerator even
> if it'd not preserve host's properties ? I assumed we want to assign accel's ABI properties for mapped decls (mode being one of them),
> or am I misunderstanding ?
Structure layout need not be compatible but we are preserving that of
the host instead of re-layouting in target context. Likewise
type <-> mode mapping doesn't have to agree.
Richard.
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>
> Thanks,
> Prathamesh
> >
> > Richard
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > >> }
> > >> }
> > >>
> > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > >> 10c0809914c..0ff8bd1171e 100644
> > >> --- a/gcc/stor-layout.cc
> > >> +++ b/gcc/stor-layout.cc
> > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char
> > *name, tree fields,
> > >> layout_decl (TYPE_NAME (type), 0);
> > >> }
> > >>
> > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > >> +
> > >> +void compute_array_mode (tree type)
> > >> +{
> > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > >> +
> > >> + SET_TYPE_MODE (type, BLKmode);
> > >> + if (TYPE_SIZE (type) != 0
> > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > >> + /* BLKmode elements force BLKmode aggregate;
> > >> + else extract/store fields may lose. */
> > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > >> + {
> > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > >> + TYPE_SIZE (type)));
> > >> + if (TYPE_MODE (type) != BLKmode
> > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
> > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > (type)))
> > >> + {
> > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > >> + SET_TYPE_MODE (type, BLKmode);
> > >> + }
> > >> + }
> > >> +}
> > >> +
> > >> /* Calculate the mode, size, and alignment for TYPE.
> > >> For an array type, calculate the element separation as well.
> > >> Record TYPE on the chain of permanent or temporary types @@
> > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > >> align = MAX (align, BITS_PER_UNIT); #endif
> > >> SET_TYPE_ALIGN (type, align);
> > >> - SET_TYPE_MODE (type, BLKmode);
> > >> - if (TYPE_SIZE (type) != 0
> > >> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> > >> - /* BLKmode elements force BLKmode aggregate;
> > >> - else extract/store fields may lose. */
> > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > >> - {
> > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > >> - TYPE_SIZE (type)));
> > >> - if (TYPE_MODE (type) != BLKmode
> > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > BIGGEST_ALIGNMENT
> > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > (type)))
> > >> - {
> > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > >> - SET_TYPE_MODE (type, BLKmode);
> > >> - }
> > >> - }
> > >> + compute_array_mode (type);
> > >> if (AGGREGATE_TYPE_P (element))
> > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > (element);
> > >> /* When the element size is constant, check that it is at least
> > as
> > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > >> 096ca811762..9d9b8c385f6 100644
> > >> --- a/gcc/stor-layout.h
> > >> +++ b/gcc/stor-layout.h
> > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far (record_layout_info);
> > >> extern void normalize_rli (record_layout_info); extern void
> > >> place_field (record_layout_info, tree); extern void
> > >> compute_record_mode (tree);
> > >> +extern void compute_array_mode (tree);
> > >> extern void finish_bitfield_layout (tree); extern void
> > >> finish_record_layout (record_layout_info, int); extern void
> > >> finalize_size_functions (void); diff --git a/gcc/tree-streamer-
> > in.cc
> > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4 100644
> > >> --- a/gcc/tree-streamer-in.cc
> > >> +++ b/gcc/tree-streamer-in.cc
> > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not
> > see
> > >> #include "attribs.h"
> > >> #include "asan.h"
> > >> #include "opts.h"
> > >> +#include "stor-layout.h"
> > >>
> > >>
> > >> /* Read a STRING_CST from the string table in DATA_IN using input
> > @@
> > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct
> > >> bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > >> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> > >> SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
> > >> +
> > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > >> + {
> > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > >> + compute_array_mode (expr);
> > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > >> + compute_record_mode (expr);
> > >> + else
> > >> + gcc_unreachable ();
> > >> + }
> > >> #endif
> > >> }
> > >>
> > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
> > >> index b7205287ffb..7de4447a1b5 100644
> > >> --- a/gcc/tree-streamer-out.cc
> > >> +++ b/gcc/tree-streamer-out.cc
> > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > bitpack_d
> > >> *bp, tree expr) static void pack_ts_decl_common_value_fields
> > (struct
> > >> bitpack_d *bp, tree expr) {
> > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > >> + /* Similar to TYPE_MODE, avoid streaming out host-specific
> > DECL_MODE
> > >> + for aggregate type with offloading enabled, and while
> > streaming-in
> > >> + recompute appropriate DECL_MODE for accelerator. */ if
> > >> + (lto_stream_offload_p
> > >> + && (VAR_P (expr)
> > >> + || TREE_CODE (expr) == PARM_DECL
> > >> + || TREE_CODE (expr) == FIELD_DECL)
> > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10 +327,18
> > @@
> > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree
> > expr)
> > >> static void pack_ts_type_common_value_fields (struct bitpack_d *bp,
> > >> tree expr) {
> > >> + /* For offloading, avoid streaming out TYPE_MODE for aggregate
> > type since
> > >> + it may be host-specific. For eg, aarch64 uses OImode for
> > ARRAY_TYPE
> > >> + whose size is 256-bits, which is not representable on
> > accelerator.
> > >> + Instead stream out VOIDmode, and while streaming-in,
> > recompute
> > >> + appropriate TYPE_MODE for accelerator. */ if
> > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > >> + bp_pack_machine_mode (bp, VOIDmode);
> > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > target_flags
> > >> not necessary valid in a global context.
> > >> Use the raw value previously set by layout_type. */
> > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > >> + else
> > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > >> no streaming. */
> > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-21 11:39 ` Richard Biener
@ 2024-08-22 14:11 ` Prathamesh Kulkarni
2024-08-22 15:01 ` Richard Sandiford
2024-09-03 3:55 ` Prathamesh Kulkarni
0 siblings, 2 replies; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-08-22 14:11 UTC (permalink / raw)
To: Richard Biener; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, August 21, 2024 5:09 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas Schwinge
> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > <richard.sandiford@arm.com>:
> > > >
> > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> > > >> index
> > > >> cbf6041fd68..0420183faf8 100644
> > > >> --- a/gcc/lto-streamer-in.cc
> > > >> +++ b/gcc/lto-streamer-in.cc
> > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If
> not
> > > see
> > > >> #include "debug.h"
> > > >> #include "alloc-pool.h"
> > > >> #include "toplev.h"
> > > >> +#include "stor-layout.h"
> > > >>
> > > >> /* Allocator used to hold string slot entries for line map
> > > streaming.
> > > >> */ static struct object_allocator<struct string_slot>
> > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1
> > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > >> with -g1, see for example PR113488. */
> > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) ==
> > > expr)
> > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > >> +
> > > >> +#ifdef ACCEL_COMPILER
> > > >> + /* For decl with aggregate type, host streams out
> VOIDmode.
> > > >> + Compute the correct DECL_MODE by calling relayout_decl.
> */
> > > >> + if ((VAR_P (expr)
> > > >> + || TREE_CODE (expr) == PARM_DECL
> > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > >> + && DECL_MODE (expr) == VOIDmode)
> > > >> + relayout_decl (expr);
> > > >> +#endif
> > > >
> > > > Genuine question, but: is relayout_decl safe in this context?
> It
> > > does
> > > > a lot more than just reset the mode. It also applies the target
> > > ABI's
> > > > preferences wrt alignment, padding, and so on, rather than
> > > preserving
> > > > those of the host's.
> > >
> > > It would be better to just recompute the mode here.
> > Hi,
> > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> > I checked layout_decl_type does the same thing for setting decl
> mode,
> > except for bit fields. Since bit-fields cannot have aggregate type,
> I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr))
> would be OK in this case ?
>
> Yep, that should work.
Thanks, I have committed the patch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182871e105f47823b8e72
after verifying it passes bootstrap+test on aarch64-linux-gnu,
and libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
>
> > Sorry if this sounds like a silly ques -- Why would it be unsafe to
> > call relayout_decl for variables that are mapped to accelerator even
> > if it'd not preserve host's properties ? I assumed we want to assign
> accel's ABI properties for mapped decls (mode being one of them), or
> am I misunderstanding ?
>
> Structure layout need not be compatible but we are preserving that of
> the host instead of re-layouting in target context. Likewise type <->
> mode mapping doesn't have to agree.
Ah OK, thanks for clarifying. So IIUC, in future, we might need to change that if
(in theory), host's structure layout for a decl is incompatible with a particular accel's ABI
and will need to relayout in accel's context ?
Thanks,
Prathamesh
>
> Richard.
>
> > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard
> > >
> > > > Thanks,
> > > > Richard
> > > >
> > > >
> > > >> }
> > > >> }
> > > >>
> > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > > >> 10c0809914c..0ff8bd1171e 100644
> > > >> --- a/gcc/stor-layout.cc
> > > >> +++ b/gcc/stor-layout.cc
> > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const
> > > >> char
> > > *name, tree fields,
> > > >> layout_decl (TYPE_NAME (type), 0); }
> > > >>
> > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > > >> +
> > > >> +void compute_array_mode (tree type) {
> > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > >> +
> > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE (type) != 0
> > > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > >> + /* BLKmode elements force BLKmode aggregate;
> > > >> + else extract/store fields may lose. */
> > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > >> + {
> > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > > >> + TYPE_SIZE (type)));
> > > >> + if (TYPE_MODE (type) != BLKmode
> > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> BIGGEST_ALIGNMENT
> > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > (type)))
> > > >> + {
> > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > >> + SET_TYPE_MODE (type, BLKmode);
> > > >> + }
> > > >> + }
> > > >> +}
> > > >> +
> > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > >> For an array type, calculate the element separation as well.
> > > >> Record TYPE on the chain of permanent or temporary types @@
> > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > >> SET_TYPE_ALIGN (type, align);
> > > >> - SET_TYPE_MODE (type, BLKmode);
> > > >> - if (TYPE_SIZE (type) != 0
> > > >> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > >> - /* BLKmode elements force BLKmode aggregate;
> > > >> - else extract/store fields may lose. */
> > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > >> - {
> > > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > > >> - TYPE_SIZE (type)));
> > > >> - if (TYPE_MODE (type) != BLKmode
> > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > BIGGEST_ALIGNMENT
> > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > (type)))
> > > >> - {
> > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > >> - SET_TYPE_MODE (type, BLKmode);
> > > >> - }
> > > >> - }
> > > >> + compute_array_mode (type);
> > > >> if (AGGREGATE_TYPE_P (element))
> > > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > > (element);
> > > >> /* When the element size is constant, check that it is at
> > > >> least
> > > as
> > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > > >> 096ca811762..9d9b8c385f6 100644
> > > >> --- a/gcc/stor-layout.h
> > > >> +++ b/gcc/stor-layout.h
> > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > >> (record_layout_info); extern void normalize_rli
> > > >> (record_layout_info); extern void place_field
> > > >> (record_layout_info, tree); extern void compute_record_mode
> > > >> (tree);
> > > >> +extern void compute_array_mode (tree);
> > > >> extern void finish_bitfield_layout (tree); extern void
> > > >> finish_record_layout (record_layout_info, int); extern void
> > > >> finalize_size_functions (void); diff --git a/gcc/tree-streamer-
> > > in.cc
> > > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4 100644
> > > >> --- a/gcc/tree-streamer-in.cc
> > > >> +++ b/gcc/tree-streamer-in.cc
> > > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If
> not
> > > see
> > > >> #include "attribs.h"
> > > >> #include "asan.h"
> > > >> #include "opts.h"
> > > >> +#include "stor-layout.h"
> > > >>
> > > >>
> > > >> /* Read a STRING_CST from the string table in DATA_IN using
> input
> > > @@
> > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct
> > > >> bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > > >> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> > > >> SET_TYPE_ALIGN (expr, targetm.absolute_biggest_alignment);
> > > >> +
> > > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > >> + {
> > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > >> + compute_array_mode (expr);
> > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > >> + compute_record_mode (expr);
> > > >> + else
> > > >> + gcc_unreachable ();
> > > >> + }
> > > >> #endif
> > > >> }
> > > >>
> > > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-
> out.cc
> > > >> index b7205287ffb..7de4447a1b5 100644
> > > >> --- a/gcc/tree-streamer-out.cc
> > > >> +++ b/gcc/tree-streamer-out.cc
> > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > > bitpack_d
> > > >> *bp, tree expr) static void pack_ts_decl_common_value_fields
> > > (struct
> > > >> bitpack_d *bp, tree expr) {
> > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > >> + /* Similar to TYPE_MODE, avoid streaming out host-specific
> > > DECL_MODE
> > > >> + for aggregate type with offloading enabled, and while
> > > streaming-in
> > > >> + recompute appropriate DECL_MODE for accelerator. */ if
> > > >> + (lto_stream_offload_p
> > > >> + && (VAR_P (expr)
> > > >> + || TREE_CODE (expr) == PARM_DECL
> > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10
> > > >> +327,18
> > > @@
> > > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree
> > > expr)
> > > >> static void pack_ts_type_common_value_fields (struct bitpack_d
> > > >> *bp, tree expr) {
> > > >> + /* For offloading, avoid streaming out TYPE_MODE for
> aggregate
> > > type since
> > > >> + it may be host-specific. For eg, aarch64 uses OImode for
> > > ARRAY_TYPE
> > > >> + whose size is 256-bits, which is not representable on
> > > accelerator.
> > > >> + Instead stream out VOIDmode, and while streaming-in,
> > > recompute
> > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > > target_flags
> > > >> not necessary valid in a global context.
> > > >> Use the raw value previously set by layout_type. */
> > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > >> + else
> > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > > >> no streaming. */
> > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-22 14:11 ` Prathamesh Kulkarni
@ 2024-08-22 15:01 ` Richard Sandiford
2024-09-03 3:55 ` Prathamesh Kulkarni
1 sibling, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2024-08-22 15:01 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Biener, Thomas Schwinge, gcc-patches
Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
>> -----Original Message-----
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Wednesday, August 21, 2024 5:09 PM
>> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
>> Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
>> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
>> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
>> accelerator
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
>>
>> >
>> >
>> > > -----Original Message-----
>> > > From: Richard Biener <rguenther@suse.de>
>> > > Sent: Tuesday, August 20, 2024 10:36 AM
>> > > To: Richard Sandiford <richard.sandiford@arm.com>
>> > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas Schwinge
>> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
>> > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in
>> > > for accelerator
>> > >
>> > > External email: Use caution opening links or attachments
>> > >
>> > >
>> > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
>> > > <richard.sandiford@arm.com>:
>> > > >
>> > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
>> > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
>> > > >> index
>> > > >> cbf6041fd68..0420183faf8 100644
>> > > >> --- a/gcc/lto-streamer-in.cc
>> > > >> +++ b/gcc/lto-streamer-in.cc
>> > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If
>> not
>> > > see
>> > > >> #include "debug.h"
>> > > >> #include "alloc-pool.h"
>> > > >> #include "toplev.h"
>> > > >> +#include "stor-layout.h"
>> > > >>
>> > > >> /* Allocator used to hold string slot entries for line map
>> > > streaming.
>> > > >> */ static struct object_allocator<struct string_slot>
>> > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@ lto_read_tree_1
>> > > (class lto_input_block *ib, class data_in *data_in, tree expr)
>> > > >> with -g1, see for example PR113488. */
>> > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) ==
>> > > expr)
>> > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
>> > > >> +
>> > > >> +#ifdef ACCEL_COMPILER
>> > > >> + /* For decl with aggregate type, host streams out
>> VOIDmode.
>> > > >> + Compute the correct DECL_MODE by calling relayout_decl.
>> */
>> > > >> + if ((VAR_P (expr)
>> > > >> + || TREE_CODE (expr) == PARM_DECL
>> > > >> + || TREE_CODE (expr) == FIELD_DECL)
>> > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
>> > > >> + && DECL_MODE (expr) == VOIDmode)
>> > > >> + relayout_decl (expr);
>> > > >> +#endif
>> > > >
>> > > > Genuine question, but: is relayout_decl safe in this context?
>> It
>> > > does
>> > > > a lot more than just reset the mode. It also applies the target
>> > > ABI's
>> > > > preferences wrt alignment, padding, and so on, rather than
>> > > preserving
>> > > > those of the host's.
>> > >
>> > > It would be better to just recompute the mode here.
>> > Hi,
>> > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
>> (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
>> > I checked layout_decl_type does the same thing for setting decl
>> mode,
>> > except for bit fields. Since bit-fields cannot have aggregate type,
>> I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr))
>> would be OK in this case ?
>>
>> Yep, that should work.
> Thanks, I have committed the patch in:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182871e105f47823b8e72
>
> after verifying it passes bootstrap+test on aarch64-linux-gnu,
> and libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
>>
>> > Sorry if this sounds like a silly ques -- Why would it be unsafe to
>> > call relayout_decl for variables that are mapped to accelerator even
>> > if it'd not preserve host's properties ? I assumed we want to assign
>> accel's ABI properties for mapped decls (mode being one of them), or
>> am I misunderstanding ?
>>
>> Structure layout need not be compatible but we are preserving that of
>> the host instead of re-layouting in target context. Likewise type <->
>> mode mapping doesn't have to agree.
> Ah OK, thanks for clarifying. So IIUC, in future, we might need to change that if
> (in theory), host's structure layout for a decl is incompatible with a particular accel's ABI
> and will need to relayout in accel's context ?
If structures are ever used to communicate between the host and the
accelerator, they would need to be laid out as the host expects,
otherwise we'd get data corruption. But maybe structures are never
used that way (it's not my area!).
Richard
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-08-22 14:11 ` Prathamesh Kulkarni
2024-08-22 15:01 ` Richard Sandiford
@ 2024-09-03 3:55 ` Prathamesh Kulkarni
2024-09-09 13:54 ` Richard Biener
1 sibling, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-09-03 3:55 UTC (permalink / raw)
To: Prathamesh Kulkarni, Richard Biener
Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 15234 bytes --]
> -----Original Message-----
> From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Sent: Thursday, August 22, 2024 7:41 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, August 21, 2024 5:09 PM
> > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas
> Schwinge
> > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming
> in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > <richard.sandiford@arm.com>:
> > > > >
> > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> > > > >> index
> > > > >> cbf6041fd68..0420183faf8 100644
> > > > >> --- a/gcc/lto-streamer-in.cc
> > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If
> > not
> > > > see
> > > > >> #include "debug.h"
> > > > >> #include "alloc-pool.h"
> > > > >> #include "toplev.h"
> > > > >> +#include "stor-layout.h"
> > > > >>
> > > > >> /* Allocator used to hold string slot entries for line map
> > > > streaming.
> > > > >> */ static struct object_allocator<struct string_slot>
> > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> lto_read_tree_1
> > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > >> with -g1, see for example PR113488. */
> > > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> ==
> > > > expr)
> > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > >> +
> > > > >> +#ifdef ACCEL_COMPILER
> > > > >> + /* For decl with aggregate type, host streams out
> > VOIDmode.
> > > > >> + Compute the correct DECL_MODE by calling relayout_decl.
> > */
> > > > >> + if ((VAR_P (expr)
> > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > >> + relayout_decl (expr);
> > > > >> +#endif
> > > > >
> > > > > Genuine question, but: is relayout_decl safe in this context?
> > It
> > > > does
> > > > > a lot more than just reset the mode. It also applies the
> target
> > > > ABI's
> > > > > preferences wrt alignment, padding, and so on, rather than
> > > > preserving
> > > > > those of the host's.
> > > >
> > > > It would be better to just recompute the mode here.
> > > Hi,
> > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> > > I checked layout_decl_type does the same thing for setting decl
> > mode,
> > > except for bit fields. Since bit-fields cannot have aggregate
> type,
> > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> (expr))
> > would be OK in this case ?
> >
> > Yep, that should work.
> Thanks, I have committed the patch in:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b18287
> 1e105f47823b8e72
Hi,
This also results in same failure (using OImode) for vector of 256-bit type,
which was triggered for firstprivate-mappings-1.c.
Can be reproduced with following simple test-case:
typedef long v4di __attribute__((vector_size (sizeof (long) * 4)));
int main()
{
v4di x;
#pragma acc parallel copy(x)
x;
return 0;
}
Compiling with -fopenacc -foffload=nvptx-none:
lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported (mode ‘OI’)
compilation terminated.
nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
The attached patch fixes the test with same approach as for aggregate type -- streaming out
VOIDmode from host, and recomputing mode for vector_type during stream-in for accelerator.
LTO bootstrap+tested on aarch64-linux-gnu.
Does the patch look OK ?
If we go with this approach, would it be safe to remove the following hunk from lto_input_mode_table,
since vector modes would no longer be streamed out in LTO bytecode ?
case MODE_VECTOR_BOOL:
case MODE_VECTOR_INT:
case MODE_VECTOR_FLOAT:
case MODE_VECTOR_FRACT:
case MODE_VECTOR_UFRACT:
case MODE_VECTOR_ACCUM:
case MODE_VECTOR_UACCUM:
/* For unsupported vector modes just use BLKmode,
if the scalar mode is supported. */
if (table[(int) inner] != VOIDmode)
{
table[m] = BLKmode;
break;
}
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Thanks,
Prathamesh
>
> after verifying it passes bootstrap+test on aarch64-linux-gnu, and
> libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
> >
> > > Sorry if this sounds like a silly ques -- Why would it be unsafe
> to
> > > call relayout_decl for variables that are mapped to accelerator
> even
> > > if it'd not preserve host's properties ? I assumed we want to
> assign
> > accel's ABI properties for mapped decls (mode being one of them), or
> > am I misunderstanding ?
> >
> > Structure layout need not be compatible but we are preserving that
> of
> > the host instead of re-layouting in target context. Likewise type
> <->
> > mode mapping doesn't have to agree.
> Ah OK, thanks for clarifying. So IIUC, in future, we might need to
> change that if (in theory), host's structure layout for a decl is
> incompatible with a particular accel's ABI and will need to relayout
> in accel's context ?
>
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Richard
> > > >
> > > > > Thanks,
> > > > > Richard
> > > > >
> > > > >
> > > > >> }
> > > > >> }
> > > > >>
> > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > > > >> 10c0809914c..0ff8bd1171e 100644
> > > > >> --- a/gcc/stor-layout.cc
> > > > >> +++ b/gcc/stor-layout.cc
> > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type,
> const
> > > > >> char
> > > > *name, tree fields,
> > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > >>
> > > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > > > >> +
> > > > >> +void compute_array_mode (tree type) {
> > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > >> +
> > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE (type) != 0
> > > > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > >> + else extract/store fields may lose. */
> > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > >> + {
> > > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > > > >> + TYPE_SIZE (type)));
> > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > BIGGEST_ALIGNMENT
> > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > (type)))
> > > > >> + {
> > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > >> + }
> > > > >> + }
> > > > >> +}
> > > > >> +
> > > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > > >> For an array type, calculate the element separation as
> well.
> > > > >> Record TYPE on the chain of permanent or temporary types
> @@
> > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > >> SET_TYPE_ALIGN (type, align);
> > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > >> - if (TYPE_SIZE (type) != 0
> > > > >> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > >> - /* BLKmode elements force BLKmode aggregate;
> > > > >> - else extract/store fields may lose. */
> > > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > >> - {
> > > > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> (type),
> > > > >> - TYPE_SIZE (type)));
> > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > BIGGEST_ALIGNMENT
> > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > (type)))
> > > > >> - {
> > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > >> - }
> > > > >> - }
> > > > >> + compute_array_mode (type);
> > > > >> if (AGGREGATE_TYPE_P (element))
> > > > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > > > (element);
> > > > >> /* When the element size is constant, check that it is at
> > > > >> least
> > > > as
> > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > > > >> 096ca811762..9d9b8c385f6 100644
> > > > >> --- a/gcc/stor-layout.h
> > > > >> +++ b/gcc/stor-layout.h
> > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > >> (record_layout_info); extern void normalize_rli
> > > > >> (record_layout_info); extern void place_field
> > > > >> (record_layout_info, tree); extern void compute_record_mode
> > > > >> (tree);
> > > > >> +extern void compute_array_mode (tree);
> > > > >> extern void finish_bitfield_layout (tree); extern void
> > > > >> finish_record_layout (record_layout_info, int); extern void
> > > > >> finalize_size_functions (void); diff --git a/gcc/tree-
> streamer-
> > > > in.cc
> > > > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4
> 100644
> > > > >> --- a/gcc/tree-streamer-in.cc
> > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If
> > not
> > > > see
> > > > >> #include "attribs.h"
> > > > >> #include "asan.h"
> > > > >> #include "opts.h"
> > > > >> +#include "stor-layout.h"
> > > > >>
> > > > >>
> > > > >> /* Read a STRING_CST from the string table in DATA_IN using
> > input
> > > > @@
> > > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct
> > > > >> bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > > > >> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> > > > >> SET_TYPE_ALIGN (expr,
> targetm.absolute_biggest_alignment);
> > > > >> +
> > > > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > > > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > >> + {
> > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > >> + compute_array_mode (expr);
> > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > >> + compute_record_mode (expr);
> > > > >> + else
> > > > >> + gcc_unreachable ();
> > > > >> + }
> > > > >> #endif
> > > > >> }
> > > > >>
> > > > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-
> > out.cc
> > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > >> --- a/gcc/tree-streamer-out.cc
> > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > > > bitpack_d
> > > > >> *bp, tree expr) static void pack_ts_decl_common_value_fields
> > > > (struct
> > > > >> bitpack_d *bp, tree expr) {
> > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > >> + /* Similar to TYPE_MODE, avoid streaming out host-specific
> > > > DECL_MODE
> > > > >> + for aggregate type with offloading enabled, and while
> > > > streaming-in
> > > > >> + recompute appropriate DECL_MODE for accelerator. */
> if
> > > > >> + (lto_stream_offload_p
> > > > >> + && (VAR_P (expr)
> > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10
> > > > >> +327,18
> > > > @@
> > > > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp,
> tree
> > > > expr)
> > > > >> static void pack_ts_type_common_value_fields (struct
> bitpack_d
> > > > >> *bp, tree expr) {
> > > > >> + /* For offloading, avoid streaming out TYPE_MODE for
> > aggregate
> > > > type since
> > > > >> + it may be host-specific. For eg, aarch64 uses OImode
> for
> > > > ARRAY_TYPE
> > > > >> + whose size is 256-bits, which is not representable on
> > > > accelerator.
> > > > >> + Instead stream out VOIDmode, and while streaming-in,
> > > > recompute
> > > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > > > target_flags
> > > > >> not necessary valid in a global context.
> > > > >> Use the raw value previously set by layout_type. */
> > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > >> + else
> > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > > > >> no streaming. */
> > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
[-- Attachment #2: p-166-5.txt --]
[-- Type: text/plain, Size: 2597 bytes --]
Recompute TYPE_MODE and DECL_MODE for vector_type for accelerator.
gcc/ChangeLog:
* lto-streamer-in.cc (lto_read_tree_1): Set TYPE_MODE and DECL_MODE
for vector_type if offloading is enabled.
* tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out
VOIDmode for vector_type if offloading is enabled.
(pack_ts_decl_common_value_fields): Likewise.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 64f75807328..45938b2ca6f 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
if ((VAR_P (expr)
|| TREE_CODE (expr) == PARM_DECL
|| TREE_CODE (expr) == FIELD_DECL)
- && AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) || VECTOR_TYPE_P (TREE_TYPE (expr)))
&& DECL_MODE (expr) == VOIDmode)
SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
#endif
}
+
+#ifdef ACCEL_COMPILER
+ if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
+ {
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
+ tree innertype = TREE_TYPE (expr);
+ machine_mode vmode
+ = mode_for_vector (SCALAR_TYPE_MODE (innertype), nunits).else_blk ();
+ SET_TYPE_MODE (expr, vmode);
+ }
+#endif
}
/* Read the physical representation of a tree node with tag TAG from
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index 7de4447a1b5..81f5aeb30a6 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -194,7 +194,8 @@ pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
&& (VAR_P (expr)
|| TREE_CODE (expr) == PARM_DECL
|| TREE_CODE (expr) == FIELD_DECL)
- && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
+ && (AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ || VECTOR_TYPE_P (TREE_TYPE (expr))))
bp_pack_machine_mode (bp, VOIDmode);
else
bp_pack_machine_mode (bp, DECL_MODE (expr));
@@ -332,7 +333,8 @@ pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
whose size is 256-bits, which is not representable on accelerator.
Instead stream out VOIDmode, and while streaming-in, recompute
appropriate TYPE_MODE for accelerator. */
- if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
+ if (lto_stream_offload_p
+ && (AGGREGATE_TYPE_P (expr) || VECTOR_TYPE_P (expr)))
bp_pack_machine_mode (bp, VOIDmode);
/* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
not necessary valid in a global context.
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-09-03 3:55 ` Prathamesh Kulkarni
@ 2024-09-09 13:54 ` Richard Biener
2024-09-24 6:02 ` Prathamesh Kulkarni
0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2024-09-09 13:54 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 17009 bytes --]
On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > Sent: Thursday, August 22, 2024 7:41 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > for
> > > accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas
> > Schwinge
> > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming
> > in
> > > > > for accelerator
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > <richard.sandiford@arm.com>:
> > > > > >
> > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> > > > > >> index
> > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If
> > > not
> > > > > see
> > > > > >> #include "debug.h"
> > > > > >> #include "alloc-pool.h"
> > > > > >> #include "toplev.h"
> > > > > >> +#include "stor-layout.h"
> > > > > >>
> > > > > >> /* Allocator used to hold string slot entries for line map
> > > > > streaming.
> > > > > >> */ static struct object_allocator<struct string_slot>
> > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > lto_read_tree_1
> > > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > > >> with -g1, see for example PR113488. */
> > > > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> > ==
> > > > > expr)
> > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > >> +
> > > > > >> +#ifdef ACCEL_COMPILER
> > > > > >> + /* For decl with aggregate type, host streams out
> > > VOIDmode.
> > > > > >> + Compute the correct DECL_MODE by calling relayout_decl.
> > > */
> > > > > >> + if ((VAR_P (expr)
> > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > >> + relayout_decl (expr);
> > > > > >> +#endif
> > > > > >
> > > > > > Genuine question, but: is relayout_decl safe in this context?
> > > It
> > > > > does
> > > > > > a lot more than just reset the mode. It also applies the
> > target
> > > > > ABI's
> > > > > > preferences wrt alignment, padding, and so on, rather than
> > > > > preserving
> > > > > > those of the host's.
> > > > >
> > > > > It would be better to just recompute the mode here.
> > > > Hi,
> > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> > > > I checked layout_decl_type does the same thing for setting decl
> > > mode,
> > > > except for bit fields. Since bit-fields cannot have aggregate
> > type,
> > > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > (expr))
> > > would be OK in this case ?
> > >
> > > Yep, that should work.
> > Thanks, I have committed the patch in:
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b18287
> > 1e105f47823b8e72
> Hi,
> This also results in same failure (using OImode) for vector of 256-bit type,
> which was triggered for firstprivate-mappings-1.c.
> Can be reproduced with following simple test-case:
>
> typedef long v4di __attribute__((vector_size (sizeof (long) * 4)));
> int main()
> {
> v4di x;
> #pragma acc parallel copy(x)
> x;
> return 0;
> }
>
> Compiling with -fopenacc -foffload=nvptx-none:
> lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported (mode ‘OI’)
> compilation terminated.
> nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> compilation terminated.
>
> The attached patch fixes the test with same approach as for aggregate type -- streaming out
> VOIDmode from host, and recomputing mode for vector_type during stream-in for accelerator.
> LTO bootstrap+tested on aarch64-linux-gnu.
> Does the patch look OK ?
@@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block *ib, class
data_in *data_in, tree expr)
if ((VAR_P (expr)
|| TREE_CODE (expr) == PARM_DECL
|| TREE_CODE (expr) == FIELD_DECL)
- && AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) || VECTOR_TYPE_P
(TREE_TYPE (expr)))
long line, please wrap.
&& DECL_MODE (expr) == VOIDmode)
SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
#endif
}
I'm not sure you can call TYPE_MODE aka vector_type_mode safely during
LTO streaming. Instead you possibly want to use TYPE_MODE_RAW here?
+#ifdef ACCEL_COMPILER
+ if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
+ {
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
+ tree innertype = TREE_TYPE (expr);
+ machine_mode vmode
+ = mode_for_vector (SCALAR_TYPE_MODE (innertype),
nunits).else_blk ();
+ SET_TYPE_MODE (expr, vmode);
I'm not sure this unambiguously specifies the mode, does it? (x2 modes,
etc.).
Richard?
> If we go with this approach, would it be safe to remove the following hunk from lto_input_mode_table,
> since vector modes would no longer be streamed out in LTO bytecode ?
I would guess you want to put an assert on the query side then?
> case MODE_VECTOR_BOOL:
> case MODE_VECTOR_INT:
> case MODE_VECTOR_FLOAT:
> case MODE_VECTOR_FRACT:
> case MODE_VECTOR_UFRACT:
> case MODE_VECTOR_ACCUM:
> case MODE_VECTOR_UACCUM:
> /* For unsupported vector modes just use BLKmode,
> if the scalar mode is supported. */
> if (table[(int) inner] != VOIDmode)
> {
> table[m] = BLKmode;
> break;
> }
>
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>
> Thanks,
> Prathamesh
> >
> > after verifying it passes bootstrap+test on aarch64-linux-gnu, and
> > libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
> > >
> > > > Sorry if this sounds like a silly ques -- Why would it be unsafe
> > to
> > > > call relayout_decl for variables that are mapped to accelerator
> > even
> > > > if it'd not preserve host's properties ? I assumed we want to
> > assign
> > > accel's ABI properties for mapped decls (mode being one of them), or
> > > am I misunderstanding ?
> > >
> > > Structure layout need not be compatible but we are preserving that
> > of
> > > the host instead of re-layouting in target context. Likewise type
> > <->
> > > mode mapping doesn't have to agree.
> > Ah OK, thanks for clarifying. So IIUC, in future, we might need to
> > change that if (in theory), host's structure layout for a decl is
> > incompatible with a particular accel's ABI and will need to relayout
> > in accel's context ?
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Richard
> > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > > > >
> > > > > >
> > > > > >> }
> > > > > >> }
> > > > > >>
> > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > > > > >> 10c0809914c..0ff8bd1171e 100644
> > > > > >> --- a/gcc/stor-layout.cc
> > > > > >> +++ b/gcc/stor-layout.cc
> > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type,
> > const
> > > > > >> char
> > > > > *name, tree fields,
> > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > >>
> > > > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > > > > >> +
> > > > > >> +void compute_array_mode (tree type) {
> > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > >> +
> > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE (type) != 0
> > > > > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > >> + else extract/store fields may lose. */
> > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > >> + {
> > > > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> > > > > >> + TYPE_SIZE (type)));
> > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > BIGGEST_ALIGNMENT
> > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > > (type)))
> > > > > >> + {
> > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > >> + }
> > > > > >> + }
> > > > > >> +}
> > > > > >> +
> > > > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > > > >> For an array type, calculate the element separation as
> > well.
> > > > > >> Record TYPE on the chain of permanent or temporary types
> > @@
> > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > >> - && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > > >> - /* BLKmode elements force BLKmode aggregate;
> > > > > >> - else extract/store fields may lose. */
> > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > >> - {
> > > > > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> > (type),
> > > > > >> - TYPE_SIZE (type)));
> > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > BIGGEST_ALIGNMENT
> > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > > (type)))
> > > > > >> - {
> > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > >> - }
> > > > > >> - }
> > > > > >> + compute_array_mode (type);
> > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > > > > (element);
> > > > > >> /* When the element size is constant, check that it is at
> > > > > >> least
> > > > > as
> > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > >> --- a/gcc/stor-layout.h
> > > > > >> +++ b/gcc/stor-layout.h
> > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > >> (record_layout_info); extern void normalize_rli
> > > > > >> (record_layout_info); extern void place_field
> > > > > >> (record_layout_info, tree); extern void compute_record_mode
> > > > > >> (tree);
> > > > > >> +extern void compute_array_mode (tree);
> > > > > >> extern void finish_bitfield_layout (tree); extern void
> > > > > >> finish_record_layout (record_layout_info, int); extern void
> > > > > >> finalize_size_functions (void); diff --git a/gcc/tree-
> > streamer-
> > > > > in.cc
> > > > > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4
> > 100644
> > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If
> > > not
> > > > > see
> > > > > >> #include "attribs.h"
> > > > > >> #include "asan.h"
> > > > > >> #include "opts.h"
> > > > > >> +#include "stor-layout.h"
> > > > > >>
> > > > > >>
> > > > > >> /* Read a STRING_CST from the string table in DATA_IN using
> > > input
> > > > > @@
> > > > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields (struct
> > > > > >> bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > > > > >> if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
> > > > > >> SET_TYPE_ALIGN (expr,
> > targetm.absolute_biggest_alignment);
> > > > > >> +
> > > > > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > > > > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > > >> + {
> > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > >> + compute_array_mode (expr);
> > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > >> + compute_record_mode (expr);
> > > > > >> + else
> > > > > >> + gcc_unreachable ();
> > > > > >> + }
> > > > > >> #endif
> > > > > >> }
> > > > > >>
> > > > > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-
> > > out.cc
> > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > > > > bitpack_d
> > > > > >> *bp, tree expr) static void pack_ts_decl_common_value_fields
> > > > > (struct
> > > > > >> bitpack_d *bp, tree expr) {
> > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > >> + /* Similar to TYPE_MODE, avoid streaming out host-specific
> > > > > DECL_MODE
> > > > > >> + for aggregate type with offloading enabled, and while
> > > > > streaming-in
> > > > > >> + recompute appropriate DECL_MODE for accelerator. */
> > if
> > > > > >> + (lto_stream_offload_p
> > > > > >> + && (VAR_P (expr)
> > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10
> > > > > >> +327,18
> > > > > @@
> > > > > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp,
> > tree
> > > > > expr)
> > > > > >> static void pack_ts_type_common_value_fields (struct
> > bitpack_d
> > > > > >> *bp, tree expr) {
> > > > > >> + /* For offloading, avoid streaming out TYPE_MODE for
> > > aggregate
> > > > > type since
> > > > > >> + it may be host-specific. For eg, aarch64 uses OImode
> > for
> > > > > ARRAY_TYPE
> > > > > >> + whose size is 256-bits, which is not representable on
> > > > > accelerator.
> > > > > >> + Instead stream out VOIDmode, and while streaming-in,
> > > > > recompute
> > > > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > > > > target_flags
> > > > > >> not necessary valid in a global context.
> > > > > >> Use the raw value previously set by layout_type. */
> > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > >> + else
> > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > > > > >> no streaming. */
> > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-09-09 13:54 ` Richard Biener
@ 2024-09-24 6:02 ` Prathamesh Kulkarni
2024-09-24 6:59 ` Richard Biener
0 siblings, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-09-24 6:02 UTC (permalink / raw)
To: Richard Biener; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 20081 bytes --]
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, September 9, 2024 7:24 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > Sent: Thursday, August 22, 2024 7:41 PM
> > > To: Richard Biener <rguenther@suse.de>
> > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for
> > > > accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas
> > > Schwinge
> > > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > streaming
> > > in
> > > > > > for accelerator
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > <richard.sandiford@arm.com>:
> > > > > > >
> > > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
> > > > > > >> If
> > > > not
> > > > > > see
> > > > > > >> #include "debug.h"
> > > > > > >> #include "alloc-pool.h"
> > > > > > >> #include "toplev.h"
> > > > > > >> +#include "stor-layout.h"
> > > > > > >>
> > > > > > >> /* Allocator used to hold string slot entries for line map
> > > > > > streaming.
> > > > > > >> */ static struct object_allocator<struct string_slot>
> > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > lto_read_tree_1
> > > > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > > > >> with -g1, see for example PR113488. */
> > > > > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> > > ==
> > > > > > expr)
> > > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > >> +
> > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > >> + /* For decl with aggregate type, host streams out
> > > > VOIDmode.
> > > > > > >> + Compute the correct DECL_MODE by calling
> relayout_decl.
> > > > */
> > > > > > >> + if ((VAR_P (expr)
> > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > > >> + relayout_decl (expr);
> > > > > > >> +#endif
> > > > > > >
> > > > > > > Genuine question, but: is relayout_decl safe in this
> context?
> > > > It
> > > > > > does
> > > > > > > a lot more than just reset the mode. It also applies the
> > > target
> > > > > > ABI's
> > > > > > > preferences wrt alignment, padding, and so on, rather than
> > > > > > preserving
> > > > > > > those of the host's.
> > > > > >
> > > > > > It would be better to just recompute the mode here.
> > > > > Hi,
> > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > > (expr)) in lto_read_tree_1 instead of calling relayout_decl
> (expr).
> > > > > I checked layout_decl_type does the same thing for setting decl
> > > > mode,
> > > > > except for bit fields. Since bit-fields cannot have aggregate
> > > type,
> > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > (expr))
> > > > would be OK in this case ?
> > > >
> > > > Yep, that should work.
> > > Thanks, I have committed the patch in:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182
> > > 87
> > > 1e105f47823b8e72
> > Hi,
> > This also results in same failure (using OImode) for vector of 256-bit
> > type, which was triggered for firstprivate-mappings-1.c.
> > Can be reproduced with following simple test-case:
> >
> > typedef long v4di __attribute__((vector_size (sizeof (long) * 4)));
> > int main() {
> > v4di x;
> > #pragma acc parallel copy(x)
> > x;
> > return 0;
> > }
> >
> > Compiling with -fopenacc -foffload=nvptx-none:
> > lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported
> > (mode ‘OI’) compilation terminated.
> > nvptx mkoffload: fatal error:
> > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned
> 1 exit status compilation terminated.
> >
> > The attached patch fixes the test with same approach as for aggregate
> > type -- streaming out VOIDmode from host, and recomputing mode for
> vector_type during stream-in for accelerator.
> > LTO bootstrap+tested on aarch64-linux-gnu.
> > Does the patch look OK ?
>
> @@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block *ib,
> class data_in *data_in, tree expr)
> if ((VAR_P (expr)
> || TREE_CODE (expr) == PARM_DECL
> || TREE_CODE (expr) == FIELD_DECL)
> - && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> + && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) || VECTOR_TYPE_P
> (TREE_TYPE (expr)))
>
> long line, please wrap.
>
> && DECL_MODE (expr) == VOIDmode)
> SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr))); #endif
> }
>
> I'm not sure you can call TYPE_MODE aka vector_type_mode safely during
> LTO streaming. Instead you possibly want to use TYPE_MODE_RAW here?
>
> +#ifdef ACCEL_COMPILER
> + if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> + {
> + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
> + tree innertype = TREE_TYPE (expr);
> + machine_mode vmode
> + = mode_for_vector (SCALAR_TYPE_MODE (innertype),
> nunits).else_blk ();
> + SET_TYPE_MODE (expr, vmode);
>
> I'm not sure this unambiguously specifies the mode, does it? (x2 modes,
> etc.).
>
> Richard?
>
>
> > If we go with this approach, would it be safe to remove the following
> > hunk from lto_input_mode_table, since vector modes would no longer be
> streamed out in LTO bytecode ?
>
> I would guess you want to put an assert on the query side then?
Hi Richard,
Thanks for the review and sorry for late reply.
The attached patch uses TYPE_MODE_RAW for vector_type,
and removes vector handling in lto_input_mode_table.
Should I also need to add an assert for !VECTOR_MODE_P
in bp_unpack_machine_mode (if we're in accel) or the check in lto_input_mode_table
should be sufficient ?
The patch moves the following hunk in lto_read_tree_1:
#ifdef ACCEL_COMPILER
if ((VAR_P (expr)
|| TREE_CODE (expr) == PARM_DECL
|| TREE_CODE (expr) == FIELD_DECL)
&& AGGREGATE_TYPE_P (TREE_TYPE (expr))
&& DECL_MODE (expr) == VOIDmode)
SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
#endif
outside the following condition:
if ((DECL_P (expr)
&& TREE_CODE (expr) != FIELD_DECL
&& TREE_CODE (expr) != DEBUG_EXPR_DECL
&& TREE_CODE (expr) != TYPE_DECL)
since the condition doesn't allow FIELD_DECL and thus would not set
mode for FIELD_DECL.
I am not sure how to infer vector mode from scalar_type and length, if we can't use
mode_for_vector here. Could you please suggest how to proceed ?
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Thanks
Prathamesh
>
> > case MODE_VECTOR_BOOL:
> > case MODE_VECTOR_INT:
> > case MODE_VECTOR_FLOAT:
> > case MODE_VECTOR_FRACT:
> > case MODE_VECTOR_UFRACT:
> > case MODE_VECTOR_ACCUM:
> > case MODE_VECTOR_UACCUM:
> > /* For unsupported vector modes just use BLKmode,
> > if the scalar mode is supported. */
> > if (table[(int) inner] != VOIDmode)
> > {
> > table[m] = BLKmode;
> > break;
> > }
> >
> > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> >
> > Thanks,
> > Prathamesh
> > >
> > > after verifying it passes bootstrap+test on aarch64-linux-gnu, and
> > > libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
> > > >
> > > > > Sorry if this sounds like a silly ques -- Why would it be unsafe
> > > to
> > > > > call relayout_decl for variables that are mapped to accelerator
> > > even
> > > > > if it'd not preserve host's properties ? I assumed we want to
> > > assign
> > > > accel's ABI properties for mapped decls (mode being one of them),
> > > > or am I misunderstanding ?
> > > >
> > > > Structure layout need not be compatible but we are preserving that
> > > of
> > > > the host instead of re-layouting in target context. Likewise type
> > > <->
> > > > mode mapping doesn't have to agree.
> > > Ah OK, thanks for clarifying. So IIUC, in future, we might need to
> > > change that if (in theory), host's structure layout for a decl is
> > > incompatible with a particular accel's ABI and will need to relayout
> > > in accel's context ?
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Richard.
> > > >
> > > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Richard
> > > > > >
> > > > > > > Thanks,
> > > > > > > Richard
> > > > > > >
> > > > > > >
> > > > > > >> }
> > > > > > >> }
> > > > > > >>
> > > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > > > > > >> 10c0809914c..0ff8bd1171e 100644
> > > > > > >> --- a/gcc/stor-layout.cc
> > > > > > >> +++ b/gcc/stor-layout.cc
> > > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type,
> > > const
> > > > > > >> char
> > > > > > *name, tree fields,
> > > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > > >>
> > > > > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > > > > > >> +
> > > > > > >> +void compute_array_mode (tree type) {
> > > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > > >> +
> > > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE (type) !=
> 0
> > > > > > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > > >> + else extract/store fields may lose. */
> > > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > >> + {
> > > > > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> (type),
> > > > > > >> + TYPE_SIZE (type)));
> > > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > BIGGEST_ALIGNMENT
> > > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > > > (type)))
> > > > > > >> + {
> > > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > > >> + }
> > > > > > >> + }
> > > > > > >> +}
> > > > > > >> +
> > > > > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > > > > >> For an array type, calculate the element separation as
> > > well.
> > > > > > >> Record TYPE on the chain of permanent or temporary types
> > > @@
> > > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > > >> - && ! targetm.member_type_forces_blk (type,
> VOIDmode)
> > > > > > >> - /* BLKmode elements force BLKmode aggregate;
> > > > > > >> - else extract/store fields may lose. */
> > > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > >> - {
> > > > > > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> > > (type),
> > > > > > >> - TYPE_SIZE (type)));
> > > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > BIGGEST_ALIGNMENT
> > > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> (TYPE_MODE
> > > > > > (type)))
> > > > > > >> - {
> > > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > >> - }
> > > > > > >> - }
> > > > > > >> + compute_array_mode (type);
> > > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > > > > > (element);
> > > > > > >> /* When the element size is constant, check that it is
> > > > > > >> at least
> > > > > > as
> > > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > > >> --- a/gcc/stor-layout.h
> > > > > > >> +++ b/gcc/stor-layout.h
> > > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > > >> (record_layout_info); extern void normalize_rli
> > > > > > >> (record_layout_info); extern void place_field
> > > > > > >> (record_layout_info, tree); extern void compute_record_mode
> > > > > > >> (tree);
> > > > > > >> +extern void compute_array_mode (tree);
> > > > > > >> extern void finish_bitfield_layout (tree); extern void
> > > > > > >> finish_record_layout (record_layout_info, int); extern void
> > > > > > >> finalize_size_functions (void); diff --git a/gcc/tree-
> > > streamer-
> > > > > > in.cc
> > > > > > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4
> > > 100644
> > > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
> > > > > > >> If
> > > > not
> > > > > > see
> > > > > > >> #include "attribs.h"
> > > > > > >> #include "asan.h"
> > > > > > >> #include "opts.h"
> > > > > > >> +#include "stor-layout.h"
> > > > > > >>
> > > > > > >>
> > > > > > >> /* Read a STRING_CST from the string table in DATA_IN using
> > > > input
> > > > > > @@
> > > > > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields
> > > > > > >> (struct bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > > > > > >> if (TYPE_ALIGN (expr) >
> targetm.absolute_biggest_alignment)
> > > > > > >> SET_TYPE_ALIGN (expr,
> > > targetm.absolute_biggest_alignment);
> > > > > > >> +
> > > > > > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > > > > > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > > > >> + {
> > > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > > >> + compute_array_mode (expr);
> > > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > > >> + compute_record_mode (expr);
> > > > > > >> + else
> > > > > > >> + gcc_unreachable ();
> > > > > > >> + }
> > > > > > >> #endif
> > > > > > >> }
> > > > > > >>
> > > > > > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-
> > > > out.cc
> > > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > > > > > bitpack_d
> > > > > > >> *bp, tree expr) static void
> > > > > > >> pack_ts_decl_common_value_fields
> > > > > > (struct
> > > > > > >> bitpack_d *bp, tree expr) {
> > > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > >> + /* Similar to TYPE_MODE, avoid streaming out
> > > > > > >> + host-specific
> > > > > > DECL_MODE
> > > > > > >> + for aggregate type with offloading enabled, and while
> > > > > > streaming-in
> > > > > > >> + recompute appropriate DECL_MODE for accelerator. */
> > > if
> > > > > > >> + (lto_stream_offload_p
> > > > > > >> + && (VAR_P (expr)
> > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10
> > > > > > >> +327,18
> > > > > > @@
> > > > > > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp,
> > > tree
> > > > > > expr)
> > > > > > >> static void pack_ts_type_common_value_fields (struct
> > > bitpack_d
> > > > > > >> *bp, tree expr) {
> > > > > > >> + /* For offloading, avoid streaming out TYPE_MODE for
> > > > aggregate
> > > > > > type since
> > > > > > >> + it may be host-specific. For eg, aarch64 uses OImode
> > > for
> > > > > > ARRAY_TYPE
> > > > > > >> + whose size is 256-bits, which is not representable on
> > > > > > accelerator.
> > > > > > >> + Instead stream out VOIDmode, and while streaming-in,
> > > > > > recompute
> > > > > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > > > > > target_flags
> > > > > > >> not necessary valid in a global context.
> > > > > > >> Use the raw value previously set by layout_type. */
> > > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > >> + else
> > > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > > > > > >> no streaming. */
> > > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany
> > > > GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
[-- Attachment #2: p-166-7.txt --]
[-- Type: text/plain, Size: 3710 bytes --]
Recompute TYPE_MODE and DECL_MODE for vector_type for accelerator.
gcc/ChangeLog:
* lto-streamer-in.cc (lto_read_tree_1): Set TYPE_MODE and DECL_MODE
for vector_type if offloading is enabled.
(lto_input_mode_table): Remove handling of vector modes.
* tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out
VOIDmode for vector_type if offloading is enabled.
(pack_ts_decl_common_value_fields): Likewise.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 9d0ec5d589c..15181c3f574 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1753,16 +1753,30 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
with -g1, see for example PR113488. */
else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
+ }
#ifdef ACCEL_COMPILER
- if ((VAR_P (expr)
- || TREE_CODE (expr) == PARM_DECL
- || TREE_CODE (expr) == FIELD_DECL)
- && AGGREGATE_TYPE_P (TREE_TYPE (expr))
- && DECL_MODE (expr) == VOIDmode)
- SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
-#endif
+ if ((VAR_P (expr)
+ || TREE_CODE (expr) == PARM_DECL
+ || TREE_CODE (expr) == FIELD_DECL)
+ && DECL_MODE (expr) == VOIDmode)
+ {
+ tree type = TREE_TYPE (expr);
+ if (AGGREGATE_TYPE_P (type))
+ SET_DECL_MODE (expr, TYPE_MODE (type));
+ else if (VECTOR_TYPE_P (type))
+ SET_DECL_MODE (expr, TYPE_MODE_RAW (type));
}
+
+ if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
+ {
+ poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
+ tree innertype = TREE_TYPE (expr);
+ machine_mode vmode
+ = mode_for_vector (SCALAR_TYPE_MODE (innertype), nunits).else_blk ();
+ SET_TYPE_MODE (expr, vmode);
+ }
+#endif
}
/* Read the physical representation of a tree node with tag TAG from
@@ -2106,13 +2120,9 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
case MODE_VECTOR_UFRACT:
case MODE_VECTOR_ACCUM:
case MODE_VECTOR_UACCUM:
- /* For unsupported vector modes just use BLKmode,
- if the scalar mode is supported. */
- if (table[(int) inner] != VOIDmode)
- {
- table[m] = BLKmode;
- break;
- }
+ /* Vector modes are recomputed on accel side and shouldn't have
+ been streamed-out from host. */
+ gcc_unreachable ();
/* FALLTHRU */
default:
/* This is only used for offloading-target compilations and
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index 7de4447a1b5..81f5aeb30a6 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -194,7 +194,8 @@ pack_ts_decl_common_value_fields (struct bitpack_d *bp, tree expr)
&& (VAR_P (expr)
|| TREE_CODE (expr) == PARM_DECL
|| TREE_CODE (expr) == FIELD_DECL)
- && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
+ && (AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ || VECTOR_TYPE_P (TREE_TYPE (expr))))
bp_pack_machine_mode (bp, VOIDmode);
else
bp_pack_machine_mode (bp, DECL_MODE (expr));
@@ -332,7 +333,8 @@ pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
whose size is 256-bits, which is not representable on accelerator.
Instead stream out VOIDmode, and while streaming-in, recompute
appropriate TYPE_MODE for accelerator. */
- if (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
+ if (lto_stream_offload_p
+ && (AGGREGATE_TYPE_P (expr) || VECTOR_TYPE_P (expr)))
bp_pack_machine_mode (bp, VOIDmode);
/* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
not necessary valid in a global context.
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-09-24 6:02 ` Prathamesh Kulkarni
@ 2024-09-24 6:59 ` Richard Biener
2024-10-01 14:56 ` Prathamesh Kulkarni
0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2024-09-24 6:59 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 20973 bytes --]
On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:
>
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, September 9, 2024 7:24 PM
> > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > To: Richard Biener <rguenther@suse.de>
> > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > > for
> > > > > accelerator
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas
> > > > Schwinge
> > > > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > streaming
> > > > in
> > > > > > > for accelerator
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > <richard.sandiford@arm.com>:
> > > > > > > >
> > > > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
> > > > > > > >> If
> > > > > not
> > > > > > > see
> > > > > > > >> #include "debug.h"
> > > > > > > >> #include "alloc-pool.h"
> > > > > > > >> #include "toplev.h"
> > > > > > > >> +#include "stor-layout.h"
> > > > > > > >>
> > > > > > > >> /* Allocator used to hold string slot entries for line map
> > > > > > > streaming.
> > > > > > > >> */ static struct object_allocator<struct string_slot>
> > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > lto_read_tree_1
> > > > > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > > > > >> with -g1, see for example PR113488. */
> > > > > > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> > > > ==
> > > > > > > expr)
> > > > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > >> +
> > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > >> + /* For decl with aggregate type, host streams out
> > > > > VOIDmode.
> > > > > > > >> + Compute the correct DECL_MODE by calling
> > relayout_decl.
> > > > > */
> > > > > > > >> + if ((VAR_P (expr)
> > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > > > >> + relayout_decl (expr);
> > > > > > > >> +#endif
> > > > > > > >
> > > > > > > > Genuine question, but: is relayout_decl safe in this
> > context?
> > > > > It
> > > > > > > does
> > > > > > > > a lot more than just reset the mode. It also applies the
> > > > target
> > > > > > > ABI's
> > > > > > > > preferences wrt alignment, padding, and so on, rather than
> > > > > > > preserving
> > > > > > > > those of the host's.
> > > > > > >
> > > > > > > It would be better to just recompute the mode here.
> > > > > > Hi,
> > > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > > > (expr)) in lto_read_tree_1 instead of calling relayout_decl
> > (expr).
> > > > > > I checked layout_decl_type does the same thing for setting decl
> > > > > mode,
> > > > > > except for bit fields. Since bit-fields cannot have aggregate
> > > > type,
> > > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > > > (expr))
> > > > > would be OK in this case ?
> > > > >
> > > > > Yep, that should work.
> > > > Thanks, I have committed the patch in:
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182
> > > > 87
> > > > 1e105f47823b8e72
> > > Hi,
> > > This also results in same failure (using OImode) for vector of 256-bit
> > > type, which was triggered for firstprivate-mappings-1.c.
> > > Can be reproduced with following simple test-case:
> > >
> > > typedef long v4di __attribute__((vector_size (sizeof (long) * 4)));
> > > int main() {
> > > v4di x;
> > > #pragma acc parallel copy(x)
> > > x;
> > > return 0;
> > > }
> > >
> > > Compiling with -fopenacc -foffload=nvptx-none:
> > > lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported
> > > (mode ‘OI’) compilation terminated.
> > > nvptx mkoffload: fatal error:
> > > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned
> > 1 exit status compilation terminated.
> > >
> > > The attached patch fixes the test with same approach as for aggregate
> > > type -- streaming out VOIDmode from host, and recomputing mode for
> > vector_type during stream-in for accelerator.
> > > LTO bootstrap+tested on aarch64-linux-gnu.
> > > Does the patch look OK ?
> >
> > @@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block *ib,
> > class data_in *data_in, tree expr)
> > if ((VAR_P (expr)
> > || TREE_CODE (expr) == PARM_DECL
> > || TREE_CODE (expr) == FIELD_DECL)
> > - && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > + && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) || VECTOR_TYPE_P
> > (TREE_TYPE (expr)))
> >
> > long line, please wrap.
> >
> > && DECL_MODE (expr) == VOIDmode)
> > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr))); #endif
> > }
> >
> > I'm not sure you can call TYPE_MODE aka vector_type_mode safely during
> > LTO streaming. Instead you possibly want to use TYPE_MODE_RAW here?
> >
> > +#ifdef ACCEL_COMPILER
> > + if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > + {
> > + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
> > + tree innertype = TREE_TYPE (expr);
> > + machine_mode vmode
> > + = mode_for_vector (SCALAR_TYPE_MODE (innertype),
> > nunits).else_blk ();
> > + SET_TYPE_MODE (expr, vmode);
> >
> > I'm not sure this unambiguously specifies the mode, does it? (x2 modes,
> > etc.).
> >
> > Richard?
> >
> >
> > > If we go with this approach, would it be safe to remove the following
> > > hunk from lto_input_mode_table, since vector modes would no longer be
> > streamed out in LTO bytecode ?
> >
> > I would guess you want to put an assert on the query side then?
> Hi Richard,
> Thanks for the review and sorry for late reply.
> The attached patch uses TYPE_MODE_RAW for vector_type,
> and removes vector handling in lto_input_mode_table.
>
> Should I also need to add an assert for !VECTOR_MODE_P
> in bp_unpack_machine_mode (if we're in accel) or the check in lto_input_mode_table
> should be sufficient ?
>
> The patch moves the following hunk in lto_read_tree_1:
>
> #ifdef ACCEL_COMPILER
> if ((VAR_P (expr)
> || TREE_CODE (expr) == PARM_DECL
> || TREE_CODE (expr) == FIELD_DECL)
> && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> && DECL_MODE (expr) == VOIDmode)
> SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
> #endif
>
> outside the following condition:
> if ((DECL_P (expr)
> && TREE_CODE (expr) != FIELD_DECL
> && TREE_CODE (expr) != DEBUG_EXPR_DECL
> && TREE_CODE (expr) != TYPE_DECL)
>
> since the condition doesn't allow FIELD_DECL and thus would not set
> mode for FIELD_DECL.
>
> I am not sure how to infer vector mode from scalar_type and length, if we can't use
> mode_for_vector here. Could you please suggest how to proceed ?
I have no good idea besides indeed using mode_for_vector as layout_type
does.
So OK unless Richard S. has anything to add.
Thanks,
Richard.
> Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
>
> Thanks
> Prathamesh
> >
> > > case MODE_VECTOR_BOOL:
> > > case MODE_VECTOR_INT:
> > > case MODE_VECTOR_FLOAT:
> > > case MODE_VECTOR_FRACT:
> > > case MODE_VECTOR_UFRACT:
> > > case MODE_VECTOR_ACCUM:
> > > case MODE_VECTOR_UACCUM:
> > > /* For unsupported vector modes just use BLKmode,
> > > if the scalar mode is supported. */
> > > if (table[(int) inner] != VOIDmode)
> > > {
> > > table[m] = BLKmode;
> > > break;
> > > }
> > >
> > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > after verifying it passes bootstrap+test on aarch64-linux-gnu, and
> > > > libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
> > > > >
> > > > > > Sorry if this sounds like a silly ques -- Why would it be unsafe
> > > > to
> > > > > > call relayout_decl for variables that are mapped to accelerator
> > > > even
> > > > > > if it'd not preserve host's properties ? I assumed we want to
> > > > assign
> > > > > accel's ABI properties for mapped decls (mode being one of them),
> > > > > or am I misunderstanding ?
> > > > >
> > > > > Structure layout need not be compatible but we are preserving that
> > > > of
> > > > > the host instead of re-layouting in target context. Likewise type
> > > > <->
> > > > > mode mapping doesn't have to agree.
> > > > Ah OK, thanks for clarifying. So IIUC, in future, we might need to
> > > > change that if (in theory), host's structure layout for a decl is
> > > > incompatible with a particular accel's ABI and will need to relayout
> > > > in accel's context ?
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Richard
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard
> > > > > > > >
> > > > > > > >
> > > > > > > >> }
> > > > > > > >> }
> > > > > > > >>
> > > > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc index
> > > > > > > >> 10c0809914c..0ff8bd1171e 100644
> > > > > > > >> --- a/gcc/stor-layout.cc
> > > > > > > >> +++ b/gcc/stor-layout.cc
> > > > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type,
> > > > const
> > > > > > > >> char
> > > > > > > *name, tree fields,
> > > > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > > > >>
> > > > > > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE). */
> > > > > > > >> +
> > > > > > > >> +void compute_array_mode (tree type) {
> > > > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > > > >> +
> > > > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE (type) !=
> > 0
> > > > > > > >> + && ! targetm.member_type_forces_blk (type, VOIDmode)
> > > > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > > > >> + else extract/store fields may lose. */
> > > > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > >> + {
> > > > > > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> > (type),
> > > > > > > >> + TYPE_SIZE (type)));
> > > > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > BIGGEST_ALIGNMENT
> > > > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE
> > > > > > > (type)))
> > > > > > > >> + {
> > > > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > > > >> + }
> > > > > > > >> + }
> > > > > > > >> +}
> > > > > > > >> +
> > > > > > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > > > > > >> For an array type, calculate the element separation as
> > > > well.
> > > > > > > >> Record TYPE on the chain of permanent or temporary types
> > > > @@
> > > > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > > > >> - && ! targetm.member_type_forces_blk (type,
> > VOIDmode)
> > > > > > > >> - /* BLKmode elements force BLKmode aggregate;
> > > > > > > >> - else extract/store fields may lose. */
> > > > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > >> - {
> > > > > > > >> - SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> > > > (type),
> > > > > > > >> - TYPE_SIZE (type)));
> > > > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > BIGGEST_ALIGNMENT
> > > > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > (TYPE_MODE
> > > > > > > (type)))
> > > > > > > >> - {
> > > > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > >> - }
> > > > > > > >> - }
> > > > > > > >> + compute_array_mode (type);
> > > > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > > > >> TYPE_TYPELESS_STORAGE (type) = TYPE_TYPELESS_STORAGE
> > > > > > > (element);
> > > > > > > >> /* When the element size is constant, check that it is
> > > > > > > >> at least
> > > > > > > as
> > > > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index
> > > > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > > > >> --- a/gcc/stor-layout.h
> > > > > > > >> +++ b/gcc/stor-layout.h
> > > > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > > > >> (record_layout_info); extern void normalize_rli
> > > > > > > >> (record_layout_info); extern void place_field
> > > > > > > >> (record_layout_info, tree); extern void compute_record_mode
> > > > > > > >> (tree);
> > > > > > > >> +extern void compute_array_mode (tree);
> > > > > > > >> extern void finish_bitfield_layout (tree); extern void
> > > > > > > >> finish_record_layout (record_layout_info, int); extern void
> > > > > > > >> finalize_size_functions (void); diff --git a/gcc/tree-
> > > > streamer-
> > > > > > > in.cc
> > > > > > > >> b/gcc/tree-streamer-in.cc index 40029437199..329d218e7d4
> > > > 100644
> > > > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
> > > > > > > >> If
> > > > > not
> > > > > > > see
> > > > > > > >> #include "attribs.h"
> > > > > > > >> #include "asan.h"
> > > > > > > >> #include "opts.h"
> > > > > > > >> +#include "stor-layout.h"
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> /* Read a STRING_CST from the string table in DATA_IN using
> > > > > input
> > > > > > > @@
> > > > > > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields
> > > > > > > >> (struct bitpack_d *bp, tree expr) #ifdef ACCEL_COMPILER
> > > > > > > >> if (TYPE_ALIGN (expr) >
> > targetm.absolute_biggest_alignment)
> > > > > > > >> SET_TYPE_ALIGN (expr,
> > > > targetm.absolute_biggest_alignment);
> > > > > > > >> +
> > > > > > > >> + /* Host streams out VOIDmode for aggregate type. */ if
> > > > > > > >> + (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > > > > >> + {
> > > > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > > > >> + compute_array_mode (expr);
> > > > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > > > >> + compute_record_mode (expr);
> > > > > > > >> + else
> > > > > > > >> + gcc_unreachable ();
> > > > > > > >> + }
> > > > > > > >> #endif
> > > > > > > >> }
> > > > > > > >>
> > > > > > > >> diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-
> > > > > out.cc
> > > > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields (struct
> > > > > > > bitpack_d
> > > > > > > >> *bp, tree expr) static void
> > > > > > > >> pack_ts_decl_common_value_fields
> > > > > > > (struct
> > > > > > > >> bitpack_d *bp, tree expr) {
> > > > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > >> + /* Similar to TYPE_MODE, avoid streaming out
> > > > > > > >> + host-specific
> > > > > > > DECL_MODE
> > > > > > > >> + for aggregate type with offloading enabled, and while
> > > > > > > streaming-in
> > > > > > > >> + recompute appropriate DECL_MODE for accelerator. */
> > > > if
> > > > > > > >> + (lto_stream_offload_p
> > > > > > > >> + && (VAR_P (expr)
> > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@ -317,10
> > > > > > > >> +327,18
> > > > > > > @@
> > > > > > > >> pack_ts_function_decl_value_fields (struct bitpack_d *bp,
> > > > tree
> > > > > > > expr)
> > > > > > > >> static void pack_ts_type_common_value_fields (struct
> > > > bitpack_d
> > > > > > > >> *bp, tree expr) {
> > > > > > > >> + /* For offloading, avoid streaming out TYPE_MODE for
> > > > > aggregate
> > > > > > > type since
> > > > > > > >> + it may be host-specific. For eg, aarch64 uses OImode
> > > > for
> > > > > > > ARRAY_TYPE
> > > > > > > >> + whose size is 256-bits, which is not representable on
> > > > > > > accelerator.
> > > > > > > >> + Instead stream out VOIDmode, and while streaming-in,
> > > > > > > recompute
> > > > > > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using
> > > > > > > target_flags
> > > > > > > >> not necessary valid in a global context.
> > > > > > > >> Use the raw value previously set by layout_type. */
> > > > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > >> + else
> > > > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and need
> > > > > > > >> no streaming. */
> > > > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany
> > > > > GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-09-24 6:59 ` Richard Biener
@ 2024-10-01 14:56 ` Prathamesh Kulkarni
2024-10-07 20:51 ` Prathamesh Kulkarni
0 siblings, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-10-01 14:56 UTC (permalink / raw)
To: Richard Sandiford; +Cc: rguenther, Thomas Schwinge, gcc-patches
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, September 24, 2024 12:29 PM
> To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, September 9, 2024 7:24 PM
> > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > > To: Richard Biener <rguenther@suse.de>
> > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> streaming
> > > > > in for accelerator
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > streaming in
> > > > > for
> > > > > > accelerator
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>; Thomas
> > > > > Schwinge
> > > > > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > > streaming
> > > > > in
> > > > > > > > for accelerator
> > > > > > > >
> > > > > > > > External email: Use caution opening links or attachments
> > > > > > > >
> > > > > > > >
> > > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > > <richard.sandiford@arm.com>:
> > > > > > > > >
> > > > > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file
> COPYING3.
> > > > > > > > >> If
> > > > > > not
> > > > > > > > see
> > > > > > > > >> #include "debug.h"
> > > > > > > > >> #include "alloc-pool.h"
> > > > > > > > >> #include "toplev.h"
> > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > >>
> > > > > > > > >> /* Allocator used to hold string slot entries for
> line
> > > > > > > > >> map
> > > > > > > > streaming.
> > > > > > > > >> */ static struct object_allocator<struct string_slot>
> > > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > > lto_read_tree_1
> > > > > > > > (class lto_input_block *ib, class data_in *data_in, tree
> > > > > > > > expr)
> > > > > > > > >> with -g1, see for example PR113488. */
> > > > > > > > >> else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN
> > > > > > > > >> (expr)
> > > > > ==
> > > > > > > > expr)
> > > > > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > > >> +
> > > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > > >> + /* For decl with aggregate type, host streams
> > > > > > > > >> +out
> > > > > > VOIDmode.
> > > > > > > > >> + Compute the correct DECL_MODE by calling
> > > relayout_decl.
> > > > > > */
> > > > > > > > >> + if ((VAR_P (expr)
> > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > > > > >> + relayout_decl (expr); #endif
> > > > > > > > >
> > > > > > > > > Genuine question, but: is relayout_decl safe in this
> > > context?
> > > > > > It
> > > > > > > > does
> > > > > > > > > a lot more than just reset the mode. It also applies
> > > > > > > > > the
> > > > > target
> > > > > > > > ABI's
> > > > > > > > > preferences wrt alignment, padding, and so on, rather
> > > > > > > > > than
> > > > > > > > preserving
> > > > > > > > > those of the host's.
> > > > > > > >
> > > > > > > > It would be better to just recompute the mode here.
> > > > > > > Hi,
> > > > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE
> > > > > > > (TREE_TYPE
> > > > > > (expr)) in lto_read_tree_1 instead of calling relayout_decl
> > > (expr).
> > > > > > > I checked layout_decl_type does the same thing for setting
> > > > > > > decl
> > > > > > mode,
> > > > > > > except for bit fields. Since bit-fields cannot have
> > > > > > > aggregate
> > > > > type,
> > > > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE
> (TREE_TYPE
> > > > > (expr))
> > > > > > would be OK in this case ?
> > > > > >
> > > > > > Yep, that should work.
> > > > > Thanks, I have committed the patch in:
> > > > >
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16
> > > > > b182
> > > > > 87
> > > > > 1e105f47823b8e72
> > > > Hi,
> > > > This also results in same failure (using OImode) for vector of
> > > > 256-bit type, which was triggered for firstprivate-mappings-1.c.
> > > > Can be reproduced with following simple test-case:
> > > >
> > > > typedef long v4di __attribute__((vector_size (sizeof (long) *
> > > > 4))); int main() {
> > > > v4di x;
> > > > #pragma acc parallel copy(x)
> > > > x;
> > > > return 0;
> > > > }
> > > >
> > > > Compiling with -fopenacc -foffload=nvptx-none:
> > > > lto1: fatal error: nvptx-none - 256-bit integer numbers
> > > > unsupported (mode ‘OI’) compilation terminated.
> > > > nvptx mkoffload: fatal error:
> > > > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc
> > > > returned
> > > 1 exit status compilation terminated.
> > > >
> > > > The attached patch fixes the test with same approach as for
> > > > aggregate type -- streaming out VOIDmode from host, and
> > > > recomputing mode for
> > > vector_type during stream-in for accelerator.
> > > > LTO bootstrap+tested on aarch64-linux-gnu.
> > > > Does the patch look OK ?
> > >
> > > @@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block
> *ib,
> > > class data_in *data_in, tree expr)
> > > if ((VAR_P (expr)
> > > || TREE_CODE (expr) == PARM_DECL
> > > || TREE_CODE (expr) == FIELD_DECL)
> > > - && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > + && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) || VECTOR_TYPE_P
> > > (TREE_TYPE (expr)))
> > >
> > > long line, please wrap.
> > >
> > > && DECL_MODE (expr) == VOIDmode)
> > > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
> #endif
> > > }
> > >
> > > I'm not sure you can call TYPE_MODE aka vector_type_mode safely
> > > during LTO streaming. Instead you possibly want to use
> TYPE_MODE_RAW here?
> > >
> > > +#ifdef ACCEL_COMPILER
> > > + if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > + {
> > > + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
> > > + tree innertype = TREE_TYPE (expr);
> > > + machine_mode vmode
> > > + = mode_for_vector (SCALAR_TYPE_MODE (innertype),
> > > nunits).else_blk ();
> > > + SET_TYPE_MODE (expr, vmode);
> > >
> > > I'm not sure this unambiguously specifies the mode, does it? (x2
> > > modes, etc.).
> > >
> > > Richard?
> > >
> > >
> > > > If we go with this approach, would it be safe to remove the
> > > > following hunk from lto_input_mode_table, since vector modes
> would
> > > > no longer be
> > > streamed out in LTO bytecode ?
> > >
> > > I would guess you want to put an assert on the query side then?
> > Hi Richard,
> > Thanks for the review and sorry for late reply.
> > The attached patch uses TYPE_MODE_RAW for vector_type, and removes
> > vector handling in lto_input_mode_table.
> >
> > Should I also need to add an assert for !VECTOR_MODE_P in
> > bp_unpack_machine_mode (if we're in accel) or the check in
> > lto_input_mode_table should be sufficient ?
> >
> > The patch moves the following hunk in lto_read_tree_1:
> >
> > #ifdef ACCEL_COMPILER
> > if ((VAR_P (expr)
> > || TREE_CODE (expr) == PARM_DECL
> > || TREE_CODE (expr) == FIELD_DECL)
> > && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > && DECL_MODE (expr) == VOIDmode)
> > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr))); #endif
> >
> > outside the following condition:
> > if ((DECL_P (expr)
> > && TREE_CODE (expr) != FIELD_DECL
> > && TREE_CODE (expr) != DEBUG_EXPR_DECL
> > && TREE_CODE (expr) != TYPE_DECL)
> >
> > since the condition doesn't allow FIELD_DECL and thus would not set
> > mode for FIELD_DECL.
> >
> > I am not sure how to infer vector mode from scalar_type and length,
> if
> > we can't use mode_for_vector here. Could you please suggest how to
> proceed ?
>
> I have no good idea besides indeed using mode_for_vector as
> layout_type does.
>
> So OK unless Richard S. has anything to add.
Hi Richard S.,
Does the patch in:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663670.html
look OK to you ? Would using mode_for_vector be correct to recompute vector mode while streaming-in during accel,
as done in the patch ?
Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
> > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> >
> > Thanks
> > Prathamesh
> > >
> > > > case MODE_VECTOR_BOOL:
> > > > case MODE_VECTOR_INT:
> > > > case MODE_VECTOR_FLOAT:
> > > > case MODE_VECTOR_FRACT:
> > > > case MODE_VECTOR_UFRACT:
> > > > case MODE_VECTOR_ACCUM:
> > > > case MODE_VECTOR_UACCUM:
> > > > /* For unsupported vector modes just use BLKmode,
> > > > if the scalar mode is supported. */
> > > > if (table[(int) inner] != VOIDmode)
> > > > {
> > > > table[m] = BLKmode;
> > > > break;
> > > > }
> > > >
> > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > after verifying it passes bootstrap+test on aarch64-linux-gnu,
> > > > > and libgomp testing (without GPU) for aarch64->nvptx and
> x86_64->nvptx.
> > > > > >
> > > > > > > Sorry if this sounds like a silly ques -- Why would it be
> > > > > > > unsafe
> > > > > to
> > > > > > > call relayout_decl for variables that are mapped to
> > > > > > > accelerator
> > > > > even
> > > > > > > if it'd not preserve host's properties ? I assumed we want
> > > > > > > to
> > > > > assign
> > > > > > accel's ABI properties for mapped decls (mode being one of
> > > > > > them), or am I misunderstanding ?
> > > > > >
> > > > > > Structure layout need not be compatible but we are
> preserving
> > > > > > that
> > > > > of
> > > > > > the host instead of re-layouting in target context.
> Likewise
> > > > > > type
> > > > > <->
> > > > > > mode mapping doesn't have to agree.
> > > > > Ah OK, thanks for clarifying. So IIUC, in future, we might
> need
> > > > > to change that if (in theory), host's structure layout for a
> > > > > decl is incompatible with a particular accel's ABI and will
> need
> > > > > to relayout in accel's context ?
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Signed-off-by: Prathamesh Kulkarni
> <prathameshk@nvidia.com>
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Richard
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> }
> > > > > > > > >> }
> > > > > > > > >>
> > > > > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
> > > > > > > > >> index 10c0809914c..0ff8bd1171e 100644
> > > > > > > > >> --- a/gcc/stor-layout.cc
> > > > > > > > >> +++ b/gcc/stor-layout.cc
> > > > > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree
> > > > > > > > >> type,
> > > > > const
> > > > > > > > >> char
> > > > > > > > *name, tree fields,
> > > > > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > > > > >>
> > > > > > > > >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE).
> > > > > > > > >> +*/
> > > > > > > > >> +
> > > > > > > > >> +void compute_array_mode (tree type) {
> > > > > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > > > > >> +
> > > > > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE
> (type)
> > > > > > > > >> + !=
> > > 0
> > > > > > > > >> + && ! targetm.member_type_forces_blk (type,
> VOIDmode)
> > > > > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > > > > >> + else extract/store fields may lose. */
> > > > > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > >> + {
> > > > > > > > >> + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE
> > > (type),
> > > > > > > > >> + TYPE_SIZE (type)));
> > > > > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > BIGGEST_ALIGNMENT
> > > > > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > > > > > > >> + (TYPE_MODE
> > > > > > > > (type)))
> > > > > > > > >> + {
> > > > > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > > > > >> + }
> > > > > > > > >> + }
> > > > > > > > >> +}
> > > > > > > > >> +
> > > > > > > > >> /* Calculate the mode, size, and alignment for TYPE.
> > > > > > > > >> For an array type, calculate the element
> separation
> > > > > > > > >> as
> > > > > well.
> > > > > > > > >> Record TYPE on the chain of permanent or temporary
> > > > > > > > >> types
> > > > > @@
> > > > > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > > > > >> - && ! targetm.member_type_forces_blk (type,
> > > VOIDmode)
> > > > > > > > >> - /* BLKmode elements force BLKmode aggregate;
> > > > > > > > >> - else extract/store fields may lose. */
> > > > > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > >> - {
> > > > > > > > >> - SET_TYPE_MODE (type, mode_for_array
> (TREE_TYPE
> > > > > (type),
> > > > > > > > >> - TYPE_SIZE (type)));
> > > > > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > > BIGGEST_ALIGNMENT
> > > > > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > (TYPE_MODE
> > > > > > > > (type)))
> > > > > > > > >> - {
> > > > > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > >> - }
> > > > > > > > >> - }
> > > > > > > > >> + compute_array_mode (type);
> > > > > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > > > > >> TYPE_TYPELESS_STORAGE (type) =
> > > > > > > > >> TYPE_TYPELESS_STORAGE
> > > > > > > > (element);
> > > > > > > > >> /* When the element size is constant, check that
> it
> > > > > > > > >> is at least
> > > > > > > > as
> > > > > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
> > > > > > > > >> index
> > > > > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > > > > >> --- a/gcc/stor-layout.h
> > > > > > > > >> +++ b/gcc/stor-layout.h
> > > > > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > > > > >> (record_layout_info); extern void normalize_rli
> > > > > > > > >> (record_layout_info); extern void place_field
> > > > > > > > >> (record_layout_info, tree); extern void
> > > > > > > > >> compute_record_mode (tree);
> > > > > > > > >> +extern void compute_array_mode (tree);
> > > > > > > > >> extern void finish_bitfield_layout (tree); extern
> void
> > > > > > > > >> finish_record_layout (record_layout_info, int);
> extern
> > > > > > > > >> void finalize_size_functions (void); diff --git
> > > > > > > > >> a/gcc/tree-
> > > > > streamer-
> > > > > > > > in.cc
> > > > > > > > >> b/gcc/tree-streamer-in.cc index
> > > > > > > > >> 40029437199..329d218e7d4
> > > > > 100644
> > > > > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file
> COPYING3.
> > > > > > > > >> If
> > > > > > not
> > > > > > > > see
> > > > > > > > >> #include "attribs.h"
> > > > > > > > >> #include "asan.h"
> > > > > > > > >> #include "opts.h"
> > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> /* Read a STRING_CST from the string table in DATA_IN
> > > > > > > > >> using
> > > > > > input
> > > > > > > > @@
> > > > > > > > >> -395,6 +396,17 @@ unpack_ts_type_common_value_fields
> > > > > > > > >> (struct bitpack_d *bp, tree expr) #ifdef
> ACCEL_COMPILER
> > > > > > > > >> if (TYPE_ALIGN (expr) >
> > > targetm.absolute_biggest_alignment)
> > > > > > > > >> SET_TYPE_ALIGN (expr,
> > > > > targetm.absolute_biggest_alignment);
> > > > > > > > >> +
> > > > > > > > >> + /* Host streams out VOIDmode for aggregate type.
> */
> > > > > > > > >> + if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr) ==
> VOIDmode)
> > > > > > > > >> + {
> > > > > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > > > > >> + compute_array_mode (expr);
> > > > > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > > > > >> + compute_record_mode (expr);
> > > > > > > > >> + else
> > > > > > > > >> + gcc_unreachable ();
> > > > > > > > >> + }
> > > > > > > > >> #endif
> > > > > > > > >> }
> > > > > > > > >>
> > > > > > > > >> diff --git a/gcc/tree-streamer-out.cc
> > > > > > > > >> b/gcc/tree-streamer-
> > > > > > out.cc
> > > > > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields
> > > > > > > > >> (struct
> > > > > > > > bitpack_d
> > > > > > > > >> *bp, tree expr) static void
> > > > > > > > >> pack_ts_decl_common_value_fields
> > > > > > > > (struct
> > > > > > > > >> bitpack_d *bp, tree expr) {
> > > > > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > >> + /* Similar to TYPE_MODE, avoid streaming out
> > > > > > > > >> + host-specific
> > > > > > > > DECL_MODE
> > > > > > > > >> + for aggregate type with offloading enabled, and
> > > > > > > > >> + while
> > > > > > > > streaming-in
> > > > > > > > >> + recompute appropriate DECL_MODE for
> accelerator.
> > > > > > > > >> + */
> > > > > if
> > > > > > > > >> + (lto_stream_offload_p
> > > > > > > > >> + && (VAR_P (expr)
> > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@
> > > > > > > > >> -317,10
> > > > > > > > >> +327,18
> > > > > > > > @@
> > > > > > > > >> pack_ts_function_decl_value_fields (struct bitpack_d
> > > > > > > > >> *bp,
> > > > > tree
> > > > > > > > expr)
> > > > > > > > >> static void pack_ts_type_common_value_fields (struct
> > > > > bitpack_d
> > > > > > > > >> *bp, tree expr) {
> > > > > > > > >> + /* For offloading, avoid streaming out TYPE_MODE
> for
> > > > > > aggregate
> > > > > > > > type since
> > > > > > > > >> + it may be host-specific. For eg, aarch64 uses
> > > > > > > > >> + OImode
> > > > > for
> > > > > > > > ARRAY_TYPE
> > > > > > > > >> + whose size is 256-bits, which is not
> > > > > > > > >> + representable on
> > > > > > > > accelerator.
> > > > > > > > >> + Instead stream out VOIDmode, and while
> > > > > > > > >> + streaming-in,
> > > > > > > > recompute
> > > > > > > > >> + appropriate TYPE_MODE for accelerator. */ if
> > > > > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode
> > > > > > > > >> using
> > > > > > > > target_flags
> > > > > > > > >> not necessary valid in a global context.
> > > > > > > > >> Use the raw value previously set by layout_type.
> > > > > > > > >> */
> > > > > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > > >> + else
> > > > > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout and
> need
> > > > > > > > >> no streaming. */
> > > > > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > AG
> > > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-10-01 14:56 ` Prathamesh Kulkarni
@ 2024-10-07 20:51 ` Prathamesh Kulkarni
2024-10-08 6:35 ` Richard Biener
0 siblings, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2024-10-07 20:51 UTC (permalink / raw)
To: Prathamesh Kulkarni, Richard Sandiford
Cc: rguenther, Thomas Schwinge, gcc-patches
> -----Original Message-----
> From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Sent: Tuesday, October 1, 2024 8:26 PM
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: rguenther@suse.de; Thomas Schwinge <tschwinge@baylibre.com>; gcc-
> patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
>
> External email: Use caution opening links or attachments
>
>
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, September 24, 2024 12:29 PM
> > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:
> >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, September 9, 2024 7:24 PM
> > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> Schwinge
> > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming
> in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > > > To: Richard Biener <rguenther@suse.de>
> > > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > streaming
> > > > > > in for accelerator
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > streaming in
> > > > > > for
> > > > > > > accelerator
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>;
> Thomas
> > > > > > Schwinge
> > > > > > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > > > streaming
> > > > > > in
> > > > > > > > > for accelerator
> > > > > > > > >
> > > > > > > > > External email: Use caution opening links or
> attachments
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > > > <richard.sandiford@arm.com>:
> > > > > > > > > >
> > > > > > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file
> > COPYING3.
> > > > > > > > > >> If
> > > > > > > not
> > > > > > > > > see
> > > > > > > > > >> #include "debug.h"
> > > > > > > > > >> #include "alloc-pool.h"
> > > > > > > > > >> #include "toplev.h"
> > > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > > >>
> > > > > > > > > >> /* Allocator used to hold string slot entries for
> > line
> > > > > > > > > >> map
> > > > > > > > > streaming.
> > > > > > > > > >> */ static struct object_allocator<struct
> string_slot>
> > > > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > > > lto_read_tree_1
> > > > > > > > > (class lto_input_block *ib, class data_in *data_in,
> tree
> > > > > > > > > expr)
> > > > > > > > > >> with -g1, see for example PR113488. */
> > > > > > > > > >> else if (DECL_P (expr) &&
> DECL_ABSTRACT_ORIGIN
> > > > > > > > > >> (expr)
> > > > > > ==
> > > > > > > > > expr)
> > > > > > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > > > >> +
> > > > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > > > >> + /* For decl with aggregate type, host
> streams
> > > > > > > > > >> +out
> > > > > > > VOIDmode.
> > > > > > > > > >> + Compute the correct DECL_MODE by calling
> > > > relayout_decl.
> > > > > > > */
> > > > > > > > > >> + if ((VAR_P (expr)
> > > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > > > > > >> + relayout_decl (expr); #endif
> > > > > > > > > >
> > > > > > > > > > Genuine question, but: is relayout_decl safe in this
> > > > context?
> > > > > > > It
> > > > > > > > > does
> > > > > > > > > > a lot more than just reset the mode. It also
> applies
> > > > > > > > > > the
> > > > > > target
> > > > > > > > > ABI's
> > > > > > > > > > preferences wrt alignment, padding, and so on,
> rather
> > > > > > > > > > than
> > > > > > > > > preserving
> > > > > > > > > > those of the host's.
> > > > > > > > >
> > > > > > > > > It would be better to just recompute the mode here.
> > > > > > > > Hi,
> > > > > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE
> > > > > > > > (TREE_TYPE
> > > > > > > (expr)) in lto_read_tree_1 instead of calling
> relayout_decl
> > > > (expr).
> > > > > > > > I checked layout_decl_type does the same thing for
> setting
> > > > > > > > decl
> > > > > > > mode,
> > > > > > > > except for bit fields. Since bit-fields cannot have
> > > > > > > > aggregate
> > > > > > type,
> > > > > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE
> > (TREE_TYPE
> > > > > > (expr))
> > > > > > > would be OK in this case ?
> > > > > > >
> > > > > > > Yep, that should work.
> > > > > > Thanks, I have committed the patch in:
> > > > > >
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16
> > > > > > b182
> > > > > > 87
> > > > > > 1e105f47823b8e72
> > > > > Hi,
> > > > > This also results in same failure (using OImode) for vector of
> > > > > 256-bit type, which was triggered for firstprivate-mappings-
> 1.c.
> > > > > Can be reproduced with following simple test-case:
> > > > >
> > > > > typedef long v4di __attribute__((vector_size (sizeof (long) *
> > > > > 4))); int main() {
> > > > > v4di x;
> > > > > #pragma acc parallel copy(x)
> > > > > x;
> > > > > return 0;
> > > > > }
> > > > >
> > > > > Compiling with -fopenacc -foffload=nvptx-none:
> > > > > lto1: fatal error: nvptx-none - 256-bit integer numbers
> > > > > unsupported (mode ‘OI’) compilation terminated.
> > > > > nvptx mkoffload: fatal error:
> > > > > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc
> > > > > returned
> > > > 1 exit status compilation terminated.
> > > > >
> > > > > The attached patch fixes the test with same approach as for
> > > > > aggregate type -- streaming out VOIDmode from host, and
> > > > > recomputing mode for
> > > > vector_type during stream-in for accelerator.
> > > > > LTO bootstrap+tested on aarch64-linux-gnu.
> > > > > Does the patch look OK ?
> > > >
> > > > @@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block
> > *ib,
> > > > class data_in *data_in, tree expr)
> > > > if ((VAR_P (expr)
> > > > || TREE_CODE (expr) == PARM_DECL
> > > > || TREE_CODE (expr) == FIELD_DECL)
> > > > - && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > + && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) ||
> VECTOR_TYPE_P
> > > > (TREE_TYPE (expr)))
> > > >
> > > > long line, please wrap.
> > > >
> > > > && DECL_MODE (expr) == VOIDmode)
> > > > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
> > #endif
> > > > }
> > > >
> > > > I'm not sure you can call TYPE_MODE aka vector_type_mode safely
> > > > during LTO streaming. Instead you possibly want to use
> > TYPE_MODE_RAW here?
> > > >
> > > > +#ifdef ACCEL_COMPILER
> > > > + if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > + {
> > > > + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
> > > > + tree innertype = TREE_TYPE (expr);
> > > > + machine_mode vmode
> > > > + = mode_for_vector (SCALAR_TYPE_MODE (innertype),
> > > > nunits).else_blk ();
> > > > + SET_TYPE_MODE (expr, vmode);
> > > >
> > > > I'm not sure this unambiguously specifies the mode, does it? (x2
> > > > modes, etc.).
> > > >
> > > > Richard?
> > > >
> > > >
> > > > > If we go with this approach, would it be safe to remove the
> > > > > following hunk from lto_input_mode_table, since vector modes
> > would
> > > > > no longer be
> > > > streamed out in LTO bytecode ?
> > > >
> > > > I would guess you want to put an assert on the query side then?
> > > Hi Richard,
> > > Thanks for the review and sorry for late reply.
> > > The attached patch uses TYPE_MODE_RAW for vector_type, and removes
> > > vector handling in lto_input_mode_table.
> > >
> > > Should I also need to add an assert for !VECTOR_MODE_P in
> > > bp_unpack_machine_mode (if we're in accel) or the check in
> > > lto_input_mode_table should be sufficient ?
> > >
> > > The patch moves the following hunk in lto_read_tree_1:
> > >
> > > #ifdef ACCEL_COMPILER
> > > if ((VAR_P (expr)
> > > || TREE_CODE (expr) == PARM_DECL
> > > || TREE_CODE (expr) == FIELD_DECL)
> > > && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > && DECL_MODE (expr) == VOIDmode)
> > > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr))); #endif
> > >
> > > outside the following condition:
> > > if ((DECL_P (expr)
> > > && TREE_CODE (expr) != FIELD_DECL
> > > && TREE_CODE (expr) != DEBUG_EXPR_DECL
> > > && TREE_CODE (expr) != TYPE_DECL)
> > >
> > > since the condition doesn't allow FIELD_DECL and thus would not
> set
> > > mode for FIELD_DECL.
> > >
> > > I am not sure how to infer vector mode from scalar_type and
> length,
> > if
> > > we can't use mode_for_vector here. Could you please suggest how to
> > proceed ?
> >
> > I have no good idea besides indeed using mode_for_vector as
> > layout_type does.
> >
> > So OK unless Richard S. has anything to add.
> Hi Richard S.,
> Does the patch in:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663670.html
>
> look OK to you ? Would using mode_for_vector be correct to recompute
> vector mode while streaming-in during accel, as done in the patch ?
Hi, ping ?
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > >
> > > Thanks
> > > Prathamesh
> > > >
> > > > > case MODE_VECTOR_BOOL:
> > > > > case MODE_VECTOR_INT:
> > > > > case MODE_VECTOR_FLOAT:
> > > > > case MODE_VECTOR_FRACT:
> > > > > case MODE_VECTOR_UFRACT:
> > > > > case MODE_VECTOR_ACCUM:
> > > > > case MODE_VECTOR_UACCUM:
> > > > > /* For unsupported vector modes just use
> BLKmode,
> > > > > if the scalar mode is supported. */
> > > > > if (table[(int) inner] != VOIDmode)
> > > > > {
> > > > > table[m] = BLKmode;
> > > > > break;
> > > > > }
> > > > >
> > > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > after verifying it passes bootstrap+test on aarch64-linux-
> gnu,
> > > > > > and libgomp testing (without GPU) for aarch64->nvptx and
> > x86_64->nvptx.
> > > > > > >
> > > > > > > > Sorry if this sounds like a silly ques -- Why would it
> be
> > > > > > > > unsafe
> > > > > > to
> > > > > > > > call relayout_decl for variables that are mapped to
> > > > > > > > accelerator
> > > > > > even
> > > > > > > > if it'd not preserve host's properties ? I assumed we
> want
> > > > > > > > to
> > > > > > assign
> > > > > > > accel's ABI properties for mapped decls (mode being one of
> > > > > > > them), or am I misunderstanding ?
> > > > > > >
> > > > > > > Structure layout need not be compatible but we are
> > preserving
> > > > > > > that
> > > > > > of
> > > > > > > the host instead of re-layouting in target context.
> > Likewise
> > > > > > > type
> > > > > > <->
> > > > > > > mode mapping doesn't have to agree.
> > > > > > Ah OK, thanks for clarifying. So IIUC, in future, we might
> > need
> > > > > > to change that if (in theory), host's structure layout for a
> > > > > > decl is incompatible with a particular accel's ABI and will
> > need
> > > > > > to relayout in accel's context ?
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Signed-off-by: Prathamesh Kulkarni
> > <prathameshk@nvidia.com>
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh
> > > > > > > > >
> > > > > > > > > Richard
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Richard
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> }
> > > > > > > > > >> }
> > > > > > > > > >>
> > > > > > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-
> layout.cc
> > > > > > > > > >> index 10c0809914c..0ff8bd1171e 100644
> > > > > > > > > >> --- a/gcc/stor-layout.cc
> > > > > > > > > >> +++ b/gcc/stor-layout.cc
> > > > > > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree
> > > > > > > > > >> type,
> > > > > > const
> > > > > > > > > >> char
> > > > > > > > > *name, tree fields,
> > > > > > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > > > > > >>
> > > > > > > > > >> +/* Compute TYPE_MODE for TYPE (which is
> ARRAY_TYPE).
> > > > > > > > > >> +*/
> > > > > > > > > >> +
> > > > > > > > > >> +void compute_array_mode (tree type) {
> > > > > > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > > > > > >> +
> > > > > > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE
> > (type)
> > > > > > > > > >> + !=
> > > > 0
> > > > > > > > > >> + && ! targetm.member_type_forces_blk (type,
> > VOIDmode)
> > > > > > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > > > > > >> + else extract/store fields may lose. */
> > > > > > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > > >> + {
> > > > > > > > > >> + SET_TYPE_MODE (type, mode_for_array
> (TREE_TYPE
> > > > (type),
> > > > > > > > > >> + TYPE_SIZE (type)));
> > > > > > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > BIGGEST_ALIGNMENT
> > > > > > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > > > > > > > >> + (TYPE_MODE
> > > > > > > > > (type)))
> > > > > > > > > >> + {
> > > > > > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > >> + }
> > > > > > > > > >> + }
> > > > > > > > > >> +}
> > > > > > > > > >> +
> > > > > > > > > >> /* Calculate the mode, size, and alignment for
> TYPE.
> > > > > > > > > >> For an array type, calculate the element
> > separation
> > > > > > > > > >> as
> > > > > > well.
> > > > > > > > > >> Record TYPE on the chain of permanent or
> temporary
> > > > > > > > > >> types
> > > > > > @@
> > > > > > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > > > > > >> - && ! targetm.member_type_forces_blk (type,
> > > > VOIDmode)
> > > > > > > > > >> - /* BLKmode elements force BLKmode
> aggregate;
> > > > > > > > > >> - else extract/store fields may lose. */
> > > > > > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) !=
> BLKmode
> > > > > > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > > >> - {
> > > > > > > > > >> - SET_TYPE_MODE (type, mode_for_array
> > (TREE_TYPE
> > > > > > (type),
> > > > > > > > > >> - TYPE_SIZE (type)));
> > > > > > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > > > BIGGEST_ALIGNMENT
> > > > > > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > > (TYPE_MODE
> > > > > > > > > (type)))
> > > > > > > > > >> - {
> > > > > > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > >> - }
> > > > > > > > > >> - }
> > > > > > > > > >> + compute_array_mode (type);
> > > > > > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > > > > > >> TYPE_TYPELESS_STORAGE (type) =
> > > > > > > > > >> TYPE_TYPELESS_STORAGE
> > > > > > > > > (element);
> > > > > > > > > >> /* When the element size is constant, check that
> > it
> > > > > > > > > >> is at least
> > > > > > > > > as
> > > > > > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
> > > > > > > > > >> index
> > > > > > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > > > > > >> --- a/gcc/stor-layout.h
> > > > > > > > > >> +++ b/gcc/stor-layout.h
> > > > > > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > > > > > >> (record_layout_info); extern void normalize_rli
> > > > > > > > > >> (record_layout_info); extern void place_field
> > > > > > > > > >> (record_layout_info, tree); extern void
> > > > > > > > > >> compute_record_mode (tree);
> > > > > > > > > >> +extern void compute_array_mode (tree);
> > > > > > > > > >> extern void finish_bitfield_layout (tree); extern
> > void
> > > > > > > > > >> finish_record_layout (record_layout_info, int);
> > extern
> > > > > > > > > >> void finalize_size_functions (void); diff --git
> > > > > > > > > >> a/gcc/tree-
> > > > > > streamer-
> > > > > > > > > in.cc
> > > > > > > > > >> b/gcc/tree-streamer-in.cc index
> > > > > > > > > >> 40029437199..329d218e7d4
> > > > > > 100644
> > > > > > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file
> > COPYING3.
> > > > > > > > > >> If
> > > > > > > not
> > > > > > > > > see
> > > > > > > > > >> #include "attribs.h"
> > > > > > > > > >> #include "asan.h"
> > > > > > > > > >> #include "opts.h"
> > > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> /* Read a STRING_CST from the string table in
> DATA_IN
> > > > > > > > > >> using
> > > > > > > input
> > > > > > > > > @@
> > > > > > > > > >> -395,6 +396,17 @@
> unpack_ts_type_common_value_fields
> > > > > > > > > >> (struct bitpack_d *bp, tree expr) #ifdef
> > ACCEL_COMPILER
> > > > > > > > > >> if (TYPE_ALIGN (expr) >
> > > > targetm.absolute_biggest_alignment)
> > > > > > > > > >> SET_TYPE_ALIGN (expr,
> > > > > > targetm.absolute_biggest_alignment);
> > > > > > > > > >> +
> > > > > > > > > >> + /* Host streams out VOIDmode for aggregate type.
> > */
> > > > > > > > > >> + if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr)
> ==
> > VOIDmode)
> > > > > > > > > >> + {
> > > > > > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > > > > > >> + compute_array_mode (expr);
> > > > > > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > > > > > >> + compute_record_mode (expr);
> > > > > > > > > >> + else
> > > > > > > > > >> + gcc_unreachable ();
> > > > > > > > > >> + }
> > > > > > > > > >> #endif
> > > > > > > > > >> }
> > > > > > > > > >>
> > > > > > > > > >> diff --git a/gcc/tree-streamer-out.cc
> > > > > > > > > >> b/gcc/tree-streamer-
> > > > > > > out.cc
> > > > > > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields
> > > > > > > > > >> (struct
> > > > > > > > > bitpack_d
> > > > > > > > > >> *bp, tree expr) static void
> > > > > > > > > >> pack_ts_decl_common_value_fields
> > > > > > > > > (struct
> > > > > > > > > >> bitpack_d *bp, tree expr) {
> > > > > > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > > >> + /* Similar to TYPE_MODE, avoid streaming out
> > > > > > > > > >> + host-specific
> > > > > > > > > DECL_MODE
> > > > > > > > > >> + for aggregate type with offloading enabled,
> and
> > > > > > > > > >> + while
> > > > > > > > > streaming-in
> > > > > > > > > >> + recompute appropriate DECL_MODE for
> > accelerator.
> > > > > > > > > >> + */
> > > > > > if
> > > > > > > > > >> + (lto_stream_offload_p
> > > > > > > > > >> + && (VAR_P (expr)
> > > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@
> > > > > > > > > >> -317,10
> > > > > > > > > >> +327,18
> > > > > > > > > @@
> > > > > > > > > >> pack_ts_function_decl_value_fields (struct
> bitpack_d
> > > > > > > > > >> *bp,
> > > > > > tree
> > > > > > > > > expr)
> > > > > > > > > >> static void pack_ts_type_common_value_fields
> (struct
> > > > > > bitpack_d
> > > > > > > > > >> *bp, tree expr) {
> > > > > > > > > >> + /* For offloading, avoid streaming out TYPE_MODE
> > for
> > > > > > > aggregate
> > > > > > > > > type since
> > > > > > > > > >> + it may be host-specific. For eg, aarch64 uses
> > > > > > > > > >> + OImode
> > > > > > for
> > > > > > > > > ARRAY_TYPE
> > > > > > > > > >> + whose size is 256-bits, which is not
> > > > > > > > > >> + representable on
> > > > > > > > > accelerator.
> > > > > > > > > >> + Instead stream out VOIDmode, and while
> > > > > > > > > >> + streaming-in,
> > > > > > > > > recompute
> > > > > > > > > >> + appropriate TYPE_MODE for accelerator. */
> if
> > > > > > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the
> mode
> > > > > > > > > >> using
> > > > > > > > > target_flags
> > > > > > > > > >> not necessary valid in a global context.
> > > > > > > > > >> Use the raw value previously set by
> layout_type.
> > > > > > > > > >> */
> > > > > > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > > > >> + else
> > > > > > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW
> (expr));
> > > > > > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout
> and
> > need
> > > > > > > > > >> no streaming. */
> > > > > > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> 36809,
> > > > > > > AG
> > > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> Germany
> > > > GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator
2024-10-07 20:51 ` Prathamesh Kulkarni
@ 2024-10-08 6:35 ` Richard Biener
0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2024-10-08 6:35 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Richard Sandiford, Thomas Schwinge, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 26846 bytes --]
On Mon, 7 Oct 2024, Prathamesh Kulkarni wrote:
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > Sent: Tuesday, October 1, 2024 8:26 PM
> > To: Richard Sandiford <richard.sandiford@arm.com>
> > Cc: rguenther@suse.de; Thomas Schwinge <tschwinge@baylibre.com>; gcc-
> > patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Tuesday, September 24, 2024 12:29 PM
> > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas Schwinge
> > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > for
> > > accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Tue, 24 Sep 2024, Prathamesh Kulkarni wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Monday, September 9, 2024 7:24 PM
> > > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > Schwinge
> > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming
> > in
> > > > > for accelerator
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > > > Sent: Thursday, August 22, 2024 7:41 PM
> > > > > > > To: Richard Biener <rguenther@suse.de>
> > > > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > > streaming
> > > > > > > in for accelerator
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Wednesday, August 21, 2024 5:09 PM
> > > > > > > > To: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > > > > Cc: Richard Sandiford <richard.sandiford@arm.com>; Thomas
> > > > > > > > Schwinge <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > > streaming in
> > > > > > > for
> > > > > > > > accelerator
> > > > > > > >
> > > > > > > > External email: Use caution opening links or attachments
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > > > > > > > To: Richard Sandiford <richard.sandiford@arm.com>
> > > > > > > > > > Cc: Prathamesh Kulkarni <prathameshk@nvidia.com>;
> > Thomas
> > > > > > > Schwinge
> > > > > > > > > > <tschwinge@baylibre.com>; gcc-patches@gcc.gnu.org
> > > > > > > > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while
> > > > > > > > > > streaming
> > > > > > > in
> > > > > > > > > > for accelerator
> > > > > > > > > >
> > > > > > > > > > External email: Use caution opening links or
> > attachments
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > > > > > > > <richard.sandiford@arm.com>:
> > > > > > > > > > >
> > > > > > > > > > > Prathamesh Kulkarni <prathameshk@nvidia.com> writes:
> > > > > > > > > > >> diff --git a/gcc/lto-streamer-in.cc
> > > > > > > > > > >> b/gcc/lto-streamer-in.cc index
> > > > > > > > > > >> cbf6041fd68..0420183faf8 100644
> > > > > > > > > > >> --- a/gcc/lto-streamer-in.cc
> > > > > > > > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > > > > > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file
> > > COPYING3.
> > > > > > > > > > >> If
> > > > > > > > not
> > > > > > > > > > see
> > > > > > > > > > >> #include "debug.h"
> > > > > > > > > > >> #include "alloc-pool.h"
> > > > > > > > > > >> #include "toplev.h"
> > > > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > > > >>
> > > > > > > > > > >> /* Allocator used to hold string slot entries for
> > > line
> > > > > > > > > > >> map
> > > > > > > > > > streaming.
> > > > > > > > > > >> */ static struct object_allocator<struct
> > string_slot>
> > > > > > > > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> > > > > > > lto_read_tree_1
> > > > > > > > > > (class lto_input_block *ib, class data_in *data_in,
> > tree
> > > > > > > > > > expr)
> > > > > > > > > > >> with -g1, see for example PR113488. */
> > > > > > > > > > >> else if (DECL_P (expr) &&
> > DECL_ABSTRACT_ORIGIN
> > > > > > > > > > >> (expr)
> > > > > > > ==
> > > > > > > > > > expr)
> > > > > > > > > > >> DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > > > > > > > >> +
> > > > > > > > > > >> +#ifdef ACCEL_COMPILER
> > > > > > > > > > >> + /* For decl with aggregate type, host
> > streams
> > > > > > > > > > >> +out
> > > > > > > > VOIDmode.
> > > > > > > > > > >> + Compute the correct DECL_MODE by calling
> > > > > relayout_decl.
> > > > > > > > */
> > > > > > > > > > >> + if ((VAR_P (expr)
> > > > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > > > > > > >> + && DECL_MODE (expr) == VOIDmode)
> > > > > > > > > > >> + relayout_decl (expr); #endif
> > > > > > > > > > >
> > > > > > > > > > > Genuine question, but: is relayout_decl safe in this
> > > > > context?
> > > > > > > > It
> > > > > > > > > > does
> > > > > > > > > > > a lot more than just reset the mode. It also
> > applies
> > > > > > > > > > > the
> > > > > > > target
> > > > > > > > > > ABI's
> > > > > > > > > > > preferences wrt alignment, padding, and so on,
> > rather
> > > > > > > > > > > than
> > > > > > > > > > preserving
> > > > > > > > > > > those of the host's.
> > > > > > > > > >
> > > > > > > > > > It would be better to just recompute the mode here.
> > > > > > > > > Hi,
> > > > > > > > > The attached patch sets DECL_MODE (expr) to TYPE_MODE
> > > > > > > > > (TREE_TYPE
> > > > > > > > (expr)) in lto_read_tree_1 instead of calling
> > relayout_decl
> > > > > (expr).
> > > > > > > > > I checked layout_decl_type does the same thing for
> > setting
> > > > > > > > > decl
> > > > > > > > mode,
> > > > > > > > > except for bit fields. Since bit-fields cannot have
> > > > > > > > > aggregate
> > > > > > > type,
> > > > > > > > I am assuming setting DECL_MODE (expr) to TYPE_MODE
> > > (TREE_TYPE
> > > > > > > (expr))
> > > > > > > > would be OK in this case ?
> > > > > > > >
> > > > > > > > Yep, that should work.
> > > > > > > Thanks, I have committed the patch in:
> > > > > > >
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16
> > > > > > > b182
> > > > > > > 87
> > > > > > > 1e105f47823b8e72
> > > > > > Hi,
> > > > > > This also results in same failure (using OImode) for vector of
> > > > > > 256-bit type, which was triggered for firstprivate-mappings-
> > 1.c.
> > > > > > Can be reproduced with following simple test-case:
> > > > > >
> > > > > > typedef long v4di __attribute__((vector_size (sizeof (long) *
> > > > > > 4))); int main() {
> > > > > > v4di x;
> > > > > > #pragma acc parallel copy(x)
> > > > > > x;
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > Compiling with -fopenacc -foffload=nvptx-none:
> > > > > > lto1: fatal error: nvptx-none - 256-bit integer numbers
> > > > > > unsupported (mode ‘OI’) compilation terminated.
> > > > > > nvptx mkoffload: fatal error:
> > > > > > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc
> > > > > > returned
> > > > > 1 exit status compilation terminated.
> > > > > >
> > > > > > The attached patch fixes the test with same approach as for
> > > > > > aggregate type -- streaming out VOIDmode from host, and
> > > > > > recomputing mode for
> > > > > vector_type during stream-in for accelerator.
> > > > > > LTO bootstrap+tested on aarch64-linux-gnu.
> > > > > > Does the patch look OK ?
> > > > >
> > > > > @@ -1757,11 +1757,22 @@ lto_read_tree_1 (class lto_input_block
> > > *ib,
> > > > > class data_in *data_in, tree expr)
> > > > > if ((VAR_P (expr)
> > > > > || TREE_CODE (expr) == PARM_DECL
> > > > > || TREE_CODE (expr) == FIELD_DECL)
> > > > > - && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > > + && (AGGREGATE_TYPE_P (TREE_TYPE (expr)) ||
> > VECTOR_TYPE_P
> > > > > (TREE_TYPE (expr)))
> > > > >
> > > > > long line, please wrap.
> > > > >
> > > > > && DECL_MODE (expr) == VOIDmode)
> > > > > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr)));
> > > #endif
> > > > > }
> > > > >
> > > > > I'm not sure you can call TYPE_MODE aka vector_type_mode safely
> > > > > during LTO streaming. Instead you possibly want to use
> > > TYPE_MODE_RAW here?
> > > > >
> > > > > +#ifdef ACCEL_COMPILER
> > > > > + if (VECTOR_TYPE_P (expr) && TYPE_MODE (expr) == VOIDmode)
> > > > > + {
> > > > > + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (expr);
> > > > > + tree innertype = TREE_TYPE (expr);
> > > > > + machine_mode vmode
> > > > > + = mode_for_vector (SCALAR_TYPE_MODE (innertype),
> > > > > nunits).else_blk ();
> > > > > + SET_TYPE_MODE (expr, vmode);
> > > > >
> > > > > I'm not sure this unambiguously specifies the mode, does it? (x2
> > > > > modes, etc.).
> > > > >
> > > > > Richard?
> > > > >
> > > > >
> > > > > > If we go with this approach, would it be safe to remove the
> > > > > > following hunk from lto_input_mode_table, since vector modes
> > > would
> > > > > > no longer be
> > > > > streamed out in LTO bytecode ?
> > > > >
> > > > > I would guess you want to put an assert on the query side then?
> > > > Hi Richard,
> > > > Thanks for the review and sorry for late reply.
> > > > The attached patch uses TYPE_MODE_RAW for vector_type, and removes
> > > > vector handling in lto_input_mode_table.
> > > >
> > > > Should I also need to add an assert for !VECTOR_MODE_P in
> > > > bp_unpack_machine_mode (if we're in accel) or the check in
> > > > lto_input_mode_table should be sufficient ?
> > > >
> > > > The patch moves the following hunk in lto_read_tree_1:
> > > >
> > > > #ifdef ACCEL_COMPILER
> > > > if ((VAR_P (expr)
> > > > || TREE_CODE (expr) == PARM_DECL
> > > > || TREE_CODE (expr) == FIELD_DECL)
> > > > && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > && DECL_MODE (expr) == VOIDmode)
> > > > SET_DECL_MODE (expr, TYPE_MODE (TREE_TYPE (expr))); #endif
> > > >
> > > > outside the following condition:
> > > > if ((DECL_P (expr)
> > > > && TREE_CODE (expr) != FIELD_DECL
> > > > && TREE_CODE (expr) != DEBUG_EXPR_DECL
> > > > && TREE_CODE (expr) != TYPE_DECL)
> > > >
> > > > since the condition doesn't allow FIELD_DECL and thus would not
> > set
> > > > mode for FIELD_DECL.
> > > >
> > > > I am not sure how to infer vector mode from scalar_type and
> > length,
> > > if
> > > > we can't use mode_for_vector here. Could you please suggest how to
> > > proceed ?
> > >
> > > I have no good idea besides indeed using mode_for_vector as
> > > layout_type does.
> > >
> > > So OK unless Richard S. has anything to add.
> > Hi Richard S.,
> > Does the patch in:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663670.html
> >
> > look OK to you ? Would using mode_for_vector be correct to recompute
> > vector mode while streaming-in during accel, as done in the patch ?
> Hi, ping ?
I think you can interpret silence as "nothing to add", so go ahead.
Richard.
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > >
> > > > Thanks
> > > > Prathamesh
> > > > >
> > > > > > case MODE_VECTOR_BOOL:
> > > > > > case MODE_VECTOR_INT:
> > > > > > case MODE_VECTOR_FLOAT:
> > > > > > case MODE_VECTOR_FRACT:
> > > > > > case MODE_VECTOR_UFRACT:
> > > > > > case MODE_VECTOR_ACCUM:
> > > > > > case MODE_VECTOR_UACCUM:
> > > > > > /* For unsupported vector modes just use
> > BLKmode,
> > > > > > if the scalar mode is supported. */
> > > > > > if (table[(int) inner] != VOIDmode)
> > > > > > {
> > > > > > table[m] = BLKmode;
> > > > > > break;
> > > > > > }
> > > > > >
> > > > > > Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > after verifying it passes bootstrap+test on aarch64-linux-
> > gnu,
> > > > > > > and libgomp testing (without GPU) for aarch64->nvptx and
> > > x86_64->nvptx.
> > > > > > > >
> > > > > > > > > Sorry if this sounds like a silly ques -- Why would it
> > be
> > > > > > > > > unsafe
> > > > > > > to
> > > > > > > > > call relayout_decl for variables that are mapped to
> > > > > > > > > accelerator
> > > > > > > even
> > > > > > > > > if it'd not preserve host's properties ? I assumed we
> > want
> > > > > > > > > to
> > > > > > > assign
> > > > > > > > accel's ABI properties for mapped decls (mode being one of
> > > > > > > > them), or am I misunderstanding ?
> > > > > > > >
> > > > > > > > Structure layout need not be compatible but we are
> > > preserving
> > > > > > > > that
> > > > > > > of
> > > > > > > > the host instead of re-layouting in target context.
> > > Likewise
> > > > > > > > type
> > > > > > > <->
> > > > > > > > mode mapping doesn't have to agree.
> > > > > > > Ah OK, thanks for clarifying. So IIUC, in future, we might
> > > need
> > > > > > > to change that if (in theory), host's structure layout for a
> > > > > > > decl is incompatible with a particular accel's ABI and will
> > > need
> > > > > > > to relayout in accel's context ?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > > Signed-off-by: Prathamesh Kulkarni
> > > <prathameshk@nvidia.com>
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > Richard
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Richard
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >> }
> > > > > > > > > > >> }
> > > > > > > > > > >>
> > > > > > > > > > >> diff --git a/gcc/stor-layout.cc b/gcc/stor-
> > layout.cc
> > > > > > > > > > >> index 10c0809914c..0ff8bd1171e 100644
> > > > > > > > > > >> --- a/gcc/stor-layout.cc
> > > > > > > > > > >> +++ b/gcc/stor-layout.cc
> > > > > > > > > > >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree
> > > > > > > > > > >> type,
> > > > > > > const
> > > > > > > > > > >> char
> > > > > > > > > > *name, tree fields,
> > > > > > > > > > >> layout_decl (TYPE_NAME (type), 0); }
> > > > > > > > > > >>
> > > > > > > > > > >> +/* Compute TYPE_MODE for TYPE (which is
> > ARRAY_TYPE).
> > > > > > > > > > >> +*/
> > > > > > > > > > >> +
> > > > > > > > > > >> +void compute_array_mode (tree type) {
> > > > > > > > > > >> + gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> > > > > > > > > > >> +
> > > > > > > > > > >> + SET_TYPE_MODE (type, BLKmode); if (TYPE_SIZE
> > > (type)
> > > > > > > > > > >> + !=
> > > > > 0
> > > > > > > > > > >> + && ! targetm.member_type_forces_blk (type,
> > > VOIDmode)
> > > > > > > > > > >> + /* BLKmode elements force BLKmode aggregate;
> > > > > > > > > > >> + else extract/store fields may lose. */
> > > > > > > > > > >> + && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> > > > > > > > > > >> + || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > > > >> + {
> > > > > > > > > > >> + SET_TYPE_MODE (type, mode_for_array
> > (TREE_TYPE
> > > > > (type),
> > > > > > > > > > >> + TYPE_SIZE (type)));
> > > > > > > > > > >> + if (TYPE_MODE (type) != BLKmode
> > > > > > > > > > >> + && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > > BIGGEST_ALIGNMENT
> > > > > > > > > > >> + && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > > > > > > > > >> + (TYPE_MODE
> > > > > > > > > > (type)))
> > > > > > > > > > >> + {
> > > > > > > > > > >> + TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > > > >> + SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > > >> + }
> > > > > > > > > > >> + }
> > > > > > > > > > >> +}
> > > > > > > > > > >> +
> > > > > > > > > > >> /* Calculate the mode, size, and alignment for
> > TYPE.
> > > > > > > > > > >> For an array type, calculate the element
> > > separation
> > > > > > > > > > >> as
> > > > > > > well.
> > > > > > > > > > >> Record TYPE on the chain of permanent or
> > temporary
> > > > > > > > > > >> types
> > > > > > > @@
> > > > > > > > > > >> -2709,24 +2735,7 @@ layout_type (tree type)
> > > > > > > > > > >> align = MAX (align, BITS_PER_UNIT); #endif
> > > > > > > > > > >> SET_TYPE_ALIGN (type, align);
> > > > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > > >> - if (TYPE_SIZE (type) != 0
> > > > > > > > > > >> - && ! targetm.member_type_forces_blk (type,
> > > > > VOIDmode)
> > > > > > > > > > >> - /* BLKmode elements force BLKmode
> > aggregate;
> > > > > > > > > > >> - else extract/store fields may lose. */
> > > > > > > > > > >> - && (TYPE_MODE (TREE_TYPE (type)) !=
> > BLKmode
> > > > > > > > > > >> - || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> > > > > > > > > > >> - {
> > > > > > > > > > >> - SET_TYPE_MODE (type, mode_for_array
> > > (TREE_TYPE
> > > > > > > (type),
> > > > > > > > > > >> - TYPE_SIZE (type)));
> > > > > > > > > > >> - if (TYPE_MODE (type) != BLKmode
> > > > > > > > > > >> - && STRICT_ALIGNMENT && TYPE_ALIGN (type) <
> > > > > > > > > > BIGGEST_ALIGNMENT
> > > > > > > > > > >> - && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT
> > > > > (TYPE_MODE
> > > > > > > > > > (type)))
> > > > > > > > > > >> - {
> > > > > > > > > > >> - TYPE_NO_FORCE_BLK (type) = 1;
> > > > > > > > > > >> - SET_TYPE_MODE (type, BLKmode);
> > > > > > > > > > >> - }
> > > > > > > > > > >> - }
> > > > > > > > > > >> + compute_array_mode (type);
> > > > > > > > > > >> if (AGGREGATE_TYPE_P (element))
> > > > > > > > > > >> TYPE_TYPELESS_STORAGE (type) =
> > > > > > > > > > >> TYPE_TYPELESS_STORAGE
> > > > > > > > > > (element);
> > > > > > > > > > >> /* When the element size is constant, check that
> > > it
> > > > > > > > > > >> is at least
> > > > > > > > > > as
> > > > > > > > > > >> diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
> > > > > > > > > > >> index
> > > > > > > > > > >> 096ca811762..9d9b8c385f6 100644
> > > > > > > > > > >> --- a/gcc/stor-layout.h
> > > > > > > > > > >> +++ b/gcc/stor-layout.h
> > > > > > > > > > >> @@ -34,6 +34,7 @@ extern tree rli_size_so_far
> > > > > > > > > > >> (record_layout_info); extern void normalize_rli
> > > > > > > > > > >> (record_layout_info); extern void place_field
> > > > > > > > > > >> (record_layout_info, tree); extern void
> > > > > > > > > > >> compute_record_mode (tree);
> > > > > > > > > > >> +extern void compute_array_mode (tree);
> > > > > > > > > > >> extern void finish_bitfield_layout (tree); extern
> > > void
> > > > > > > > > > >> finish_record_layout (record_layout_info, int);
> > > extern
> > > > > > > > > > >> void finalize_size_functions (void); diff --git
> > > > > > > > > > >> a/gcc/tree-
> > > > > > > streamer-
> > > > > > > > > > in.cc
> > > > > > > > > > >> b/gcc/tree-streamer-in.cc index
> > > > > > > > > > >> 40029437199..329d218e7d4
> > > > > > > 100644
> > > > > > > > > > >> --- a/gcc/tree-streamer-in.cc
> > > > > > > > > > >> +++ b/gcc/tree-streamer-in.cc
> > > > > > > > > > >> @@ -35,6 +35,7 @@ along with GCC; see the file
> > > COPYING3.
> > > > > > > > > > >> If
> > > > > > > > not
> > > > > > > > > > see
> > > > > > > > > > >> #include "attribs.h"
> > > > > > > > > > >> #include "asan.h"
> > > > > > > > > > >> #include "opts.h"
> > > > > > > > > > >> +#include "stor-layout.h"
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> /* Read a STRING_CST from the string table in
> > DATA_IN
> > > > > > > > > > >> using
> > > > > > > > input
> > > > > > > > > > @@
> > > > > > > > > > >> -395,6 +396,17 @@
> > unpack_ts_type_common_value_fields
> > > > > > > > > > >> (struct bitpack_d *bp, tree expr) #ifdef
> > > ACCEL_COMPILER
> > > > > > > > > > >> if (TYPE_ALIGN (expr) >
> > > > > targetm.absolute_biggest_alignment)
> > > > > > > > > > >> SET_TYPE_ALIGN (expr,
> > > > > > > targetm.absolute_biggest_alignment);
> > > > > > > > > > >> +
> > > > > > > > > > >> + /* Host streams out VOIDmode for aggregate type.
> > > */
> > > > > > > > > > >> + if (AGGREGATE_TYPE_P (expr) && TYPE_MODE (expr)
> > ==
> > > VOIDmode)
> > > > > > > > > > >> + {
> > > > > > > > > > >> + if (TREE_CODE (expr) == ARRAY_TYPE)
> > > > > > > > > > >> + compute_array_mode (expr);
> > > > > > > > > > >> + else if (RECORD_OR_UNION_TYPE_P (expr))
> > > > > > > > > > >> + compute_record_mode (expr);
> > > > > > > > > > >> + else
> > > > > > > > > > >> + gcc_unreachable ();
> > > > > > > > > > >> + }
> > > > > > > > > > >> #endif
> > > > > > > > > > >> }
> > > > > > > > > > >>
> > > > > > > > > > >> diff --git a/gcc/tree-streamer-out.cc
> > > > > > > > > > >> b/gcc/tree-streamer-
> > > > > > > > out.cc
> > > > > > > > > > >> index b7205287ffb..7de4447a1b5 100644
> > > > > > > > > > >> --- a/gcc/tree-streamer-out.cc
> > > > > > > > > > >> +++ b/gcc/tree-streamer-out.cc
> > > > > > > > > > >> @@ -187,7 +187,17 @@ pack_ts_fixed_cst_value_fields
> > > > > > > > > > >> (struct
> > > > > > > > > > bitpack_d
> > > > > > > > > > >> *bp, tree expr) static void
> > > > > > > > > > >> pack_ts_decl_common_value_fields
> > > > > > > > > > (struct
> > > > > > > > > > >> bitpack_d *bp, tree expr) {
> > > > > > > > > > >> - bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > > > >> + /* Similar to TYPE_MODE, avoid streaming out
> > > > > > > > > > >> + host-specific
> > > > > > > > > > DECL_MODE
> > > > > > > > > > >> + for aggregate type with offloading enabled,
> > and
> > > > > > > > > > >> + while
> > > > > > > > > > streaming-in
> > > > > > > > > > >> + recompute appropriate DECL_MODE for
> > > accelerator.
> > > > > > > > > > >> + */
> > > > > > > if
> > > > > > > > > > >> + (lto_stream_offload_p
> > > > > > > > > > >> + && (VAR_P (expr)
> > > > > > > > > > >> + || TREE_CODE (expr) == PARM_DECL
> > > > > > > > > > >> + || TREE_CODE (expr) == FIELD_DECL)
> > > > > > > > > > >> + && AGGREGATE_TYPE_P (TREE_TYPE (expr)))
> > > > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode); else
> > > > > > > > > > >> + bp_pack_machine_mode (bp, DECL_MODE (expr));
> > > > > > > > > > >> bp_pack_value (bp, DECL_NONLOCAL (expr), 1);
> > > > > > > > > > >> bp_pack_value (bp, DECL_VIRTUAL_P (expr), 1);
> > > > > > > > > > >> bp_pack_value (bp, DECL_IGNORED_P (expr), 1); @@
> > > > > > > > > > >> -317,10
> > > > > > > > > > >> +327,18
> > > > > > > > > > @@
> > > > > > > > > > >> pack_ts_function_decl_value_fields (struct
> > bitpack_d
> > > > > > > > > > >> *bp,
> > > > > > > tree
> > > > > > > > > > expr)
> > > > > > > > > > >> static void pack_ts_type_common_value_fields
> > (struct
> > > > > > > bitpack_d
> > > > > > > > > > >> *bp, tree expr) {
> > > > > > > > > > >> + /* For offloading, avoid streaming out TYPE_MODE
> > > for
> > > > > > > > aggregate
> > > > > > > > > > type since
> > > > > > > > > > >> + it may be host-specific. For eg, aarch64 uses
> > > > > > > > > > >> + OImode
> > > > > > > for
> > > > > > > > > > ARRAY_TYPE
> > > > > > > > > > >> + whose size is 256-bits, which is not
> > > > > > > > > > >> + representable on
> > > > > > > > > > accelerator.
> > > > > > > > > > >> + Instead stream out VOIDmode, and while
> > > > > > > > > > >> + streaming-in,
> > > > > > > > > > recompute
> > > > > > > > > > >> + appropriate TYPE_MODE for accelerator. */
> > if
> > > > > > > > > > >> + (lto_stream_offload_p && AGGREGATE_TYPE_P (expr))
> > > > > > > > > > >> + bp_pack_machine_mode (bp, VOIDmode);
> > > > > > > > > > >> /* for VECTOR_TYPE, TYPE_MODE reevaluates the
> > mode
> > > > > > > > > > >> using
> > > > > > > > > > target_flags
> > > > > > > > > > >> not necessary valid in a global context.
> > > > > > > > > > >> Use the raw value previously set by
> > layout_type.
> > > > > > > > > > >> */
> > > > > > > > > > >> - bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));
> > > > > > > > > > >> + else
> > > > > > > > > > >> + bp_pack_machine_mode (bp, TYPE_MODE_RAW
> > (expr));
> > > > > > > > > > >> /* TYPE_NO_FORCE_BLK is private to stor-layout
> > and
> > > need
> > > > > > > > > > >> no streaming. */
> > > > > > > > > > >> bp_pack_value (bp, TYPE_PACKED (expr), 1);
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > Germany;
> > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> > 36809,
> > > > > > > > AG
> > > > > > > > Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > Germany
> > > > > GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-10-08 6:35 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-19 8:22 Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator Prathamesh Kulkarni
2024-08-19 13:29 ` Richard Biener
2024-08-19 18:55 ` Richard Sandiford
2024-08-20 5:06 ` Richard Biener
2024-08-21 11:04 ` Prathamesh Kulkarni
2024-08-21 11:39 ` Richard Biener
2024-08-22 14:11 ` Prathamesh Kulkarni
2024-08-22 15:01 ` Richard Sandiford
2024-09-03 3:55 ` Prathamesh Kulkarni
2024-09-09 13:54 ` Richard Biener
2024-09-24 6:02 ` Prathamesh Kulkarni
2024-09-24 6:59 ` Richard Biener
2024-10-01 14:56 ` Prathamesh Kulkarni
2024-10-07 20:51 ` Prathamesh Kulkarni
2024-10-08 6:35 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).