* New post-LTO OpenACC pass
@ 2015-09-21 16:39 Nathan Sidwell
2015-09-21 21:03 ` Cesar Philippidis
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-21 16:39 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 971 bytes --]
Jakub,
this patch adds a new transforming pass, which executes after the LTO readback
pass, and hence knows whether it is targeting the host or (a) device.
The contents of the pass will be built out -- it does much more on the gomp4
branch. This instance simply scans for and replaces the acc_on_device builtin.
Expanding early will allow such things as constant propagation and dead code
removal earlier.
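To make the folding concrete, here is a minimal standalone model of the expression that oacc_xform_on_device builds. It is only a sketch: the DeviceKind values are placeholders rather than the real GOMP_DEVICE_* constants, the is_accel flag stands in for the ACCEL_COMPILER preprocessor check, and kAccel stands in for ACCEL_COMPILER_acc_device.

```cpp
#include <cassert>

// Illustrative device codes.  The numeric values are placeholders and are
// not claimed to match the GOMP_DEVICE_* constants.
enum DeviceKind { kNone = 0, kHost = 1, kNotHost = 2, kAccel = 3 };

// Model of the expression oacc_xform_on_device builds for
// acc_on_device (ARG).  IS_ACCEL stands in for "#ifdef ACCEL_COMPILER";
// kAccel stands in for ACCEL_COMPILER_acc_device.
constexpr int
fold_on_device (int arg, bool is_accel)
{
  return is_accel ? (arg == kNotHost || arg == kAccel) : (arg == kHost);
}

// With a constant argument the result is a compile-time constant, so a
// later pass can propagate it and delete the untaken branch.
static_assert (fold_on_device (kHost, false) == 1, "host compiler: host");
static_assert (fold_on_device (kAccel, false) == 0, "host compiler: not accel");
static_assert (fold_on_device (kNotHost, true) == 1, "accel: not_host");
static_assert (fold_on_device (kAccel, true) == 1, "accel: own device");
static_assert (fold_on_device (kNone, true) == 0, "accel: never 'none'");

// A non-constant argument still lowers to plain comparisons.
int
on_device_runtime (int arg, bool is_accel)
{
  assert (arg >= kNone && arg <= kAccel);
  return fold_on_device (arg, is_accel);
}
```

In this model the host compiler answers true only for the host code, while the accelerator compiler answers true for 'not host' or its own device type, which is the same shape as the EQ_EXPR / TRUTH_OR_EXPR trees in the patch.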
We still need the traditional expansion at RTL time too, because this function
is used when building the library, in case the user does something crazy, like
calling via a pointer.
The scanning code is written such that the replaced code is also scanned. This
will occur for the later transforms that might expand to internal builtins which
themselves could be optimized.
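As a rough illustration of the rescan idea, here is a small standalone sketch over a std::list. It is not the patch's gsi_prev-based iteration: it erases the old statement, inserts the replacement, and resumes at the first inserted element. The statement names and the expand helper are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

typedef std::string Stmt;

// Hypothetical expander.  Returns true and fills REPL when S is rewritten.
// The second rule shows a replacement that is itself expandable.
bool
expand (const Stmt &s, std::vector<Stmt> &repl)
{
  if (s == "acc_on_device")
    {
      repl = {"internal_fn", "convert"};
      return true;
    }
  if (s == "internal_fn")
    {
      repl = {"compare"};
      return true;
    }
  return false;
}

// Lower every statement in BB, rescanning whatever an expansion produced.
// Terminates only if repeated expansion reaches statements expand rejects.
void
lower_block (std::list<Stmt> &bb)
{
  std::list<Stmt>::iterator it = bb.begin ();
  while (it != bb.end ())
    {
      std::vector<Stmt> repl;
      if (!expand (*it, repl))
        {
          ++it;
          continue;
        }
      // Drop the old statement, then insert the replacement in its place.
      std::list<Stmt>::iterator next = bb.erase (it);
      // insert returns the first new element (or NEXT if REPL is empty),
      // so the loop resumes at the start of the replaced code.
      it = bb.insert (next, repl.begin (), repl.end ());
    }
}

// Small self-check of the model.
void
check_lower_block ()
{
  std::list<Stmt> bb = {"load", "acc_on_device", "store"};
  lower_block (bb);
  const std::list<Stmt> want = {"load", "compare", "convert", "store"};
  assert (bb == want);
}
```

Resuming at the first inserted element means a replaced statement at the very start of a block needs no special case, at the cost of requiring that expansions eventually stop producing expandable statements.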
The 'get_oacc_fn_attrib' was also present in the launch API patch. Although
used just internally to omp-low.c in this patch, it ends up being more widely used.
ok for trunk?
nathan
[-- Attachment #2: trunk-ondev.patch --]
[-- Type: text/x-patch, Size: 6077 bytes --]
2015-09-21 Nathan Sidwell <nathan@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
* omp-low.h (get_oacc_fn_attrib): Declare.
* omp-low.c (get_oacc_fn_attrib): New.
(oacc_xform_on_device): New.
(execute_oacc_transform): New pass.
(pass_data_oacc_transform): New.
(pass_oacc_transform): New.
(make_pass_oacc_transform): New.
* tree-pass.h (make_pass_oacc_transform): Declare.
* passes.def: Add pass_oacc_transform.
Index: omp-low.c
===================================================================
--- omp-low.c (revision 227968)
+++ omp-low.c (working copy)
@@ -8860,6 +8860,16 @@ expand_omp_atomic (struct omp_region *re
expand_omp_atomic_mutex (load_bb, store_bb, addr, loaded_val, stored_val);
}
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Retrieve the oacc function attrib and return it. Non-oacc
+ functions will return NULL. */
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+ return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
/* Expand the GIMPLE_OMP_TARGET starting at REGION. */
@@ -13909,4 +13919,131 @@ omp_finish_file (void)
}
}
+/* Transform an acc_on_device call. OpenACC 2.0a requires this folded at
+ compile time for constant operands. We always fold it. In an
+ offloaded function we're never 'none'. */
+
+static void
+oacc_xform_on_device (gimple *call)
+{
+ tree arg = gimple_call_arg (call, 0);
+ unsigned val = GOMP_DEVICE_HOST;
+
+#ifdef ACCEL_COMPILER
+ val = GOMP_DEVICE_NOT_HOST;
+#endif
+ tree result = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node, val));
+#ifdef ACCEL_COMPILER
+ {
+ tree dev = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node,
+ ACCEL_COMPILER_acc_device));
+ result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
+ }
+#endif
+ result = fold_convert (integer_type_node, result);
+ tree lhs = gimple_call_lhs (call);
+ gimple_seq seq = NULL;
+
+ push_gimplify_context (true);
+ gimplify_assign (lhs, result, &seq);
+ pop_gimplify_context (NULL);
+
+ gimple_stmt_iterator gsi = gsi_for_stmt (call);
+ gsi_replace_with_seq (&gsi, seq, false);
+}
+
+/* Main entry point for oacc transformations which run on the device
+ compiler after LTO, so we know what the target device is at this
+ point (including the host fallback). */
+
+static unsigned int
+execute_oacc_transform ()
+{
+ tree attrs = get_oacc_fn_attrib (current_function_decl);
+
+ if (!attrs)
+ /* Not an offloaded function. */
+ return 0;
+
+ basic_block bb;
+ FOR_ALL_BB_FN (bb, cfun)
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ bool rescan = false;
+
+ if (!is_gimple_call (stmt))
+ continue;
+
+ /* Rewind to allow rescan. */
+ gsi_prev (&gsi);
+
+ gcall *call = as_a <gcall *> (stmt);
+
+ if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+ /* acc_on_device must be evaluated at compile time for
+ constant arguments. */
+ {
+ oacc_xform_on_device (call);
+ rescan = true;
+ }
+
+ if (gsi_end_p (gsi))
+ /* We rewound past the beginning of the BB. */
+ gsi = gsi_start_bb (bb);
+
+ if (!rescan)
+ /* Undo the rewind, so we don't get stuck infinitely. */
+ gsi_next (&gsi);
+ }
+
+ return 0;
+}
+
+namespace {
+
+const pass_data pass_data_oacc_transform =
+{
+ GIMPLE_PASS, /* type */
+ "fold_oacc_transform", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ TV_NONE, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_oacc_transform : public gimple_opt_pass
+{
+public:
+ pass_oacc_transform (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_oacc_transform, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual unsigned int execute (function *)
+ {
+ bool gate = (flag_openacc != 0 && !seen_error ());
+
+ if (!gate)
+ return 0;
+
+ return execute_oacc_transform ();
+ }
+
+}; // class pass_oacc_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_transform (gcc::context *ctxt)
+{
+ return new pass_oacc_transform (ctxt);
+}
+
#include "gt-omp-low.h"
Index: omp-low.h
===================================================================
--- omp-low.h (revision 227968)
+++ omp-low.h (working copy)
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
extern tree omp_reduction_init (tree, tree);
extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
extern GTY(()) vec<tree, va_gc> *offload_funcs;
extern GTY(()) vec<tree, va_gc> *offload_vars;
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 227968)
+++ tree-pass.h (working copy)
@@ -406,6 +406,7 @@ extern gimple_opt_pass *make_pass_lower_
extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_transform (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: passes.def
===================================================================
--- passes.def (revision 227968)
+++ passes.def (working copy)
@@ -148,6 +148,7 @@ along with GCC; see the file COPYING3.
INSERT_PASSES_AFTER (all_passes)
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_lower_eh_dispatch);
+ NEXT_PASS (pass_oacc_transform);
NEXT_PASS (pass_all_optimizations);
PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
NEXT_PASS (pass_remove_cgraph_callee_edges);
* Re: New post-LTO OpenACC pass
2015-09-21 16:39 New post-LTO OpenACC pass Nathan Sidwell
@ 2015-09-21 21:03 ` Cesar Philippidis
2015-09-21 21:15 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Cesar Philippidis @ 2015-09-21 21:03 UTC (permalink / raw)
To: Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 638 bytes --]
On 09/21/2015 09:30 AM, Nathan Sidwell wrote:
> +const pass_data pass_data_oacc_transform =
> +{
> + GIMPLE_PASS, /* type */
> + "fold_oacc_transform", /* name */
Want to rename the tree dump file to oacc_xforms like I did in the
attached patch? Regardless, I think we need to document this flag in
invoke.texi.
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + PROP_cfg, /* properties_required */
> + 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
Cesar
[-- Attachment #2: oacc_xforms.diff --]
[-- Type: text/x-patch, Size: 1281 bytes --]
2015-09-21 Cesar Philippidis <cesar@codesourcery.com>
gcc/
* doc/invoke.texi: Document -fdump-tree-oacc_xforms.
* omp-low.c (pass_data_oacc_transform): Rename the tree dump for
oacc_transform as oacc_xforms.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92f82d7..7406941 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7158,6 +7158,11 @@ is made by appending @file{.slp} to the source file name.
Dump each function after Value Range Propagation (VRP). The file name
is made by appending @file{.vrp} to the source file name.
+@item oacc_xforms
+@opindex fdump-tree-oacc_xforms
+Dump each function after applying target-specific OpenACC transformations.
+The file name is made by appending @file{.oacc_xforms} to the source file name.
+
@item all
@opindex fdump-tree-all
Enable all the available tree dumps with the flags provided in this option.
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e3dc160..f31e6cd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15086,7 +15086,7 @@ namespace {
const pass_data pass_data_oacc_transform =
{
GIMPLE_PASS, /* type */
- "fold_oacc_transform", /* name */
+ "oacc_xforms", /* name */
OPTGROUP_NONE, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_cfg, /* properties_required */
* Re: New post-LTO OpenACC pass
2015-09-21 21:03 ` Cesar Philippidis
@ 2015-09-21 21:15 ` Nathan Sidwell
2015-09-22 15:22 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-21 21:15 UTC (permalink / raw)
To: Cesar Philippidis, Jakub Jelinek; +Cc: GCC Patches
On 09/21/15 16:30, Cesar Philippidis wrote:
> On 09/21/2015 09:30 AM, Nathan Sidwell wrote:
>
>> +const pass_data pass_data_oacc_transform =
>> +{
>> + GIMPLE_PASS, /* type */
>> + "fold_oacc_transform", /* name */
>
> Want to rename the tree dump file to oacc_xforms like I'm did in the
> attached patch? Regardless, I think we need to document this flag in
> invoke.texi.
Thanks for noticing the missing doc. I'm not attached to any particular name.
'fold_oacc_transform' is rather generic, and a bit of a mouthful. Perhaps
'oacclower', 'oaccdevlower' or something (I see there's 'lateomplower' for
guidance)
nathan
* Re: New post-LTO OpenACC pass
2015-09-21 21:15 ` Nathan Sidwell
@ 2015-09-22 15:22 ` Nathan Sidwell
2015-09-23 11:10 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-22 15:22 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
[-- Attachment #1: Type: text/plain, Size: 788 bytes --]
On 09/21/15 16:39, Nathan Sidwell wrote:
> On 09/21/15 16:30, Cesar Philippidis wrote:
>> On 09/21/2015 09:30 AM, Nathan Sidwell wrote:
>>
>>> +const pass_data pass_data_oacc_transform =
>>> +{
>>> + GIMPLE_PASS, /* type */
>>> + "fold_oacc_transform", /* name */
>>
>> Want to rename the tree dump file to oacc_xforms like I'm did in the
>> attached patch? Regardless, I think we need to document this flag in
>> invoke.texi.
>
> Thanks for noticing the missing doc. I'm not attached to any particular name.
> 'fold_oacc_transform' is rather generic, and a bit of a mouthful. Perhaps
> 'oacclower', 'oaccdevlower' or something (I see there's 'lateomplower' for
> guidance)
This updated patch includes Cesar's doc patch. It also changes the name of the
pass to 'oaccdevlow'.
nathan
[-- Attachment #2: trunk-ondev-2.patch --]
[-- Type: text/x-patch, Size: 6847 bytes --]
2015-09-22 Nathan Sidwell <nathan@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
* omp-low.h (get_oacc_fn_attrib): Declare.
* omp-low.c (get_oacc_fn_attrib): New.
(oacc_xform_on_device): New.
(execute_oacc_device_lower): New pass.
(pass_data_oacc_device_lower): New.
(pass_oacc_device_lower): New.
(make_pass_oacc_device_lower): New.
* tree-pass.h (make_pass_oacc_device_lower): Declare.
* passes.def: Add pass_oacc_device_lower.
* doc/invoke.texi: Document -fdump-tree-oaccdevlow.
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 227968)
+++ tree-pass.h (working copy)
@@ -406,6 +406,7 @@ extern gimple_opt_pass *make_pass_lower_
extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: passes.def
===================================================================
--- passes.def (revision 227968)
+++ passes.def (working copy)
@@ -148,6 +148,7 @@ along with GCC; see the file COPYING3.
INSERT_PASSES_AFTER (all_passes)
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_lower_eh_dispatch);
+ NEXT_PASS (pass_oacc_device_lower);
NEXT_PASS (pass_all_optimizations);
PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
NEXT_PASS (pass_remove_cgraph_callee_edges);
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 227968)
+++ doc/invoke.texi (working copy)
@@ -7179,6 +7179,11 @@ is made by appending @file{.slp} to the
Dump each function after Value Range Propagation (VRP). The file name
is made by appending @file{.vrp} to the source file name.
+@item oaccdevlow
+@opindex fdump-tree-oaccdevlow
+Dump each function after applying device-specific OpenACC transformations.
+The file name is made by appending @file{.oaccdevlow} to the source file name.
+
@item all
@opindex fdump-tree-all
Enable all the available tree dumps with the flags provided in this option.
Index: omp-low.c
===================================================================
--- omp-low.c (revision 227968)
+++ omp-low.c (working copy)
@@ -8860,6 +8860,16 @@ expand_omp_atomic (struct omp_region *re
expand_omp_atomic_mutex (load_bb, store_bb, addr, loaded_val, stored_val);
}
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Retrieve the oacc function attrib and return it. Non-oacc
+ functions will return NULL. */
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+ return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
/* Expand the GIMPLE_OMP_TARGET starting at REGION. */
@@ -13909,4 +13919,131 @@ omp_finish_file (void)
}
}
+/* Transform an acc_on_device call. OpenACC 2.0a requires this folded at
+ compile time for constant operands. We always fold it. In an
+ offloaded function we're never 'none'. */
+
+static void
+oacc_xform_on_device (gimple *call)
+{
+ tree arg = gimple_call_arg (call, 0);
+ unsigned val = GOMP_DEVICE_HOST;
+
+#ifdef ACCEL_COMPILER
+ val = GOMP_DEVICE_NOT_HOST;
+#endif
+ tree result = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node, val));
+#ifdef ACCEL_COMPILER
+ {
+ tree dev = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node,
+ ACCEL_COMPILER_acc_device));
+ result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
+ }
+#endif
+ result = fold_convert (integer_type_node, result);
+ tree lhs = gimple_call_lhs (call);
+ gimple_seq seq = NULL;
+
+ push_gimplify_context (true);
+ gimplify_assign (lhs, result, &seq);
+ pop_gimplify_context (NULL);
+
+ gimple_stmt_iterator gsi = gsi_for_stmt (call);
+ gsi_replace_with_seq (&gsi, seq, false);
+}
+
+/* Main entry point for oacc transformations which run on the device
+ compiler after LTO, so we know what the target device is at this
+ point (including the host fallback). */
+
+static unsigned int
+execute_oacc_device_lower ()
+{
+ tree attrs = get_oacc_fn_attrib (current_function_decl);
+
+ if (!attrs)
+ /* Not an offloaded function. */
+ return 0;
+
+ basic_block bb;
+ FOR_ALL_BB_FN (bb, cfun)
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ bool rescan = false;
+
+ if (!is_gimple_call (stmt))
+ continue;
+
+ /* Rewind to allow rescan. */
+ gsi_prev (&gsi);
+
+ gcall *call = as_a <gcall *> (stmt);
+
+ if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+ /* acc_on_device must be evaluated at compile time for
+ constant arguments. */
+ {
+ oacc_xform_on_device (call);
+ rescan = true;
+ }
+
+ if (gsi_end_p (gsi))
+ /* We rewound past the beginning of the BB. */
+ gsi = gsi_start_bb (bb);
+
+ if (!rescan)
+ /* Undo the rewind, so we don't get stuck infinitely. */
+ gsi_next (&gsi);
+ }
+
+ return 0;
+}
+
+namespace {
+
+const pass_data pass_data_oacc_device_lower =
+{
+ GIMPLE_PASS, /* type */
+ "oaccdevlow", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ TV_NONE, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_oacc_device_lower : public gimple_opt_pass
+{
+public:
+ pass_oacc_device_lower (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_oacc_device_lower, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual unsigned int execute (function *)
+ {
+ bool gate = (flag_openacc != 0 && !seen_error ());
+
+ if (!gate)
+ return 0;
+
+ return execute_oacc_device_lower ();
+ }
+
+}; // class pass_oacc_device_lower
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_device_lower (gcc::context *ctxt)
+{
+ return new pass_oacc_device_lower (ctxt);
+}
+
#include "gt-omp-low.h"
Index: omp-low.h
===================================================================
--- omp-low.h (revision 227968)
+++ omp-low.h (working copy)
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
extern tree omp_reduction_init (tree, tree);
extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
extern GTY(()) vec<tree, va_gc> *offload_funcs;
extern GTY(()) vec<tree, va_gc> *offload_vars;
* Re: New post-LTO OpenACC pass
2015-09-22 15:22 ` Nathan Sidwell
@ 2015-09-23 11:10 ` Bernd Schmidt
2015-09-23 12:40 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-23 11:10 UTC (permalink / raw)
To: Nathan Sidwell, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
> + /* acc_on_device must be evaluated at compile time for
> + constant arguments. */
> + {
> + oacc_xform_on_device (call);
> + rescan = true;
> + }
Is there a reason this is not done as part of pass_fold_builtins? (It
looks like maybe adding this to fold_call_stmt in builtins.c would be
sufficient too).
Bernd
* Re: New post-LTO OpenACC pass
2015-09-23 11:10 ` Bernd Schmidt
@ 2015-09-23 12:40 ` Nathan Sidwell
2015-09-23 13:19 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-23 12:40 UTC (permalink / raw)
To: Bernd Schmidt, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/15 06:59, Bernd Schmidt wrote:
> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>> + /* acc_on_device must be evaluated at compile time for
>> + constant arguments. */
>> + {
>> + oacc_xform_on_device (call);
>> + rescan = true;
>> + }
>
> Is there a reason this is not done as part of pass_fold_builtins? (It looks like
> maybe adding this to fold_call_stmt in builtins.c would be sufficient too).
Perhaps it could be. I'll need to check where that pass happens. Anyway, the
main thrust of this patch is the new pass, which I thought might be easier to
review with minimal additional clutter.
nathan
* Re: New post-LTO OpenACC pass
2015-09-23 12:40 ` Nathan Sidwell
@ 2015-09-23 13:19 ` Bernd Schmidt
2015-09-23 18:45 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-23 13:19 UTC (permalink / raw)
To: Nathan Sidwell, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/2015 02:14 PM, Nathan Sidwell wrote:
> On 09/23/15 06:59, Bernd Schmidt wrote:
>> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>>> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>>> + /* acc_on_device must be evaluated at compile time for
>>> + constant arguments. */
>>> + {
>>> + oacc_xform_on_device (call);
>>> + rescan = true;
>>> + }
>>
>> Is there a reason this is not done as part of pass_fold_builtins? (It
>> looks like
>> maybe adding this to fold_call_stmt in builtins.c would be sufficient
>> too).
>
> Perhaps it could be. I'll need to check where that pass happens.
> Anyway, the main thrust of this patch is the new pass, which I thought
> might be easier to review with minimal additional clutter.
There's no issue adding a new pass if there's a demonstrated need for
it, but I think builtin folding doesn't quite meet that criterion given
that we already have a pass that does that. Unless you really need it to
happen very early in the pipeline - fold_builtins runs pretty late, but
I checked and fold_call_stmt gets called from pass_forwprop and possibly
from elsewhere too.
Bernd
* Re: New post-LTO OpenACC pass
2015-09-23 13:19 ` Bernd Schmidt
@ 2015-09-23 18:45 ` Nathan Sidwell
2015-09-23 18:58 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-23 18:45 UTC (permalink / raw)
To: Bernd Schmidt, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/15 08:58, Bernd Schmidt wrote:
> On 09/23/2015 02:14 PM, Nathan Sidwell wrote:
>> On 09/23/15 06:59, Bernd Schmidt wrote:
>>> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>>>> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>>>> + /* acc_on_device must be evaluated at compile time for
>>>> + constant arguments. */
>>>> + {
>>>> + oacc_xform_on_device (call);
>>>> + rescan = true;
>>>> + }
>>>
>>> Is there a reason this is not done as part of pass_fold_builtins? (It
>>> looks like
>>> maybe adding this to fold_call_stmt in builtins.c would be sufficient
>>> too).
As I feared, builtin folding occurs in several places. In particular its first
call is very early on in the host compiler, which is far too soon.
We have to defer folding until we know whether we're doing host or device
compilation.
nathan
* Re: New post-LTO OpenACC pass
2015-09-23 18:45 ` Nathan Sidwell
@ 2015-09-23 18:58 ` Bernd Schmidt
2015-09-23 20:08 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-23 18:58 UTC (permalink / raw)
To: Nathan Sidwell, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
> As I feared, builtin folding occurs in several places. In particular
> its first call is very early on in the host compiler, which is far too
> soon.
>
> We have to defer folding until we know whether we're doing host or
> device compilation.
Doesn't something like "symtab->state >= EXPANSION" give you that?
Bernd
* Re: New post-LTO OpenACC pass
2015-09-23 18:58 ` Bernd Schmidt
@ 2015-09-23 20:08 ` Nathan Sidwell
2015-09-25 0:15 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-23 20:08 UTC (permalink / raw)
To: Bernd Schmidt, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/15 14:51, Bernd Schmidt wrote:
> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>
>> As I feared, builtin folding occurs in several places. In particular
>> its first call is very early on in the host compiler, which is far too
>> soon.
>>
>> We have to defer folding until we know whether we're doing host or
>> device compilation.
>
> Doesn't something like "symtab->state >= EXPANSION" give you that?
I don't know. It doesn't seem to me to be a good idea for the builtin
expanders to be context-sensitive.
nathan
* Re: New post-LTO OpenACC pass
2015-09-23 20:08 ` Nathan Sidwell
@ 2015-09-25 0:15 ` Nathan Sidwell
2015-09-25 11:06 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-25 0:15 UTC (permalink / raw)
To: Bernd Schmidt, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/23/15 14:58, Nathan Sidwell wrote:
> On 09/23/15 14:51, Bernd Schmidt wrote:
>> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>>
>>> As I feared, builtin folding occurs in several places. In particular
>>> its first call is very early on in the host compiler, which is far too
>>> soon.
>>>
>>> We have to defer folding until we know whether we're doing host or
>>> device compilation.
>>
>> Doesn't something like "symtab->state >= EXPANSION" give you that?
I've tried limiting expansion by checking symtab->state. I have been unable to
succeed.
It either expands too early in the host compiler, or it doesn't get expanded at
all and one ends up with an RTL call to the library function. For instance,
there doesn't appear to be a call to fold builtins when state == EXPANSION.
Lesser values are present in the host compiler before LTO write out, AFAICT.
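For anyone following along, the kind of guard being discussed can be modelled roughly as below. This is a sketch under stated assumptions, not GCC code: the Phase enum is a simplified stand-in for the symbol table state, the device codes are placeholders, and the device-specific comparison is left out.

```cpp
#include <cassert>

// Simplified, hypothetical phases; GCC's symbol table state has more members.
enum Phase { kParsing, kIpa, kExpansion };

// Placeholder device codes, not the real GOMP_DEVICE_* values.
enum DeviceKind { kHost = 1, kNotHost = 2 };

// Try to fold acc_on_device (ARG).  Returns false, leaving *RESULT alone,
// when it is too early to know which compiler will finish the job.
bool
try_fold_on_device (Phase phase, bool is_accel, int arg, int *result)
{
  assert (result != nullptr);
  if (phase != kExpansion)
    return false;
  // The device-specific comparison is omitted for brevity.
  *result = (arg == (is_accel ? kNotHost : kHost));
  return true;
}
```

The model only shows the intended behaviour: a caller that gets false keeps the call in place, so it can still be resolved by a later stage that knows whether it is the host or the device compiler.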
nathan
* Re: New post-LTO OpenACC pass
2015-09-25 0:15 ` Nathan Sidwell
@ 2015-09-25 11:06 ` Bernd Schmidt
2015-09-25 11:13 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-25 11:06 UTC (permalink / raw)
To: Nathan Sidwell, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
On 09/25/2015 12:38 AM, Nathan Sidwell wrote:
> On 09/23/15 14:58, Nathan Sidwell wrote:
>> On 09/23/15 14:51, Bernd Schmidt wrote:
>>> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>>>> We have to defer folding until we know whether we're doing host or
>>>> device compilation.
>>>
>>> Doesn't something like "symtab->state >= EXPANSION" give you that?
>
> I've tried limiting expansion by checking symtab->state. I have been
> unable to succeed.
>
> It either expands too early in the host compiler, or it doesn't get
> expanded at all and one ends up with an RTL call to the library
> function. For instance there doesn't appear to be call to fold
> builtins when state == EXPANSION. lesser values are present in the host
> compiler before LTO write out, AFAICT.
That's a bit odd:
Breakpoint 5, (anonymous namespace)::pass_fold_builtins::execute
(this=0x1ce89a0, fun=0x7ffff0858348) at ../../git/gcc/tree-ssa-ccp.c:2722
[...]
(gdb) p stmt
$3 = (gimple *) 0x7ffff0736d80
(gdb) pgg
warning: Expression is not an assignment (and might have no effect)
# .MEM_2 = VDEF <.MEM_1(D)>
_3 = acc_on_device (123);
(gdb) p symtab->state
$4 = EXPANSION
On the other hand, it's not considered a builtin:
(gdb) p gimple_call_builtin_p(stmt, BUILT_IN_ACC_ON_DEVICE)
$6 = false
This is the c-c++-common/goacc/acc_on_device-2.c testcase. Is that
expected to be handled? If I change it to use __builtin_acc_on_device, I
can step right into
Breakpoint 8, fold_call_stmt (stmt=0x7ffff0736e10, ignore=false) at
../../git/gcc/builtins.c:12277
12277 tree ret = NULL_TREE;
Maybe you were compiling without optimization? In that case
expand_builtin_acc_on_device (which already exists) should still end up
doing the right thing. In no case should you see a RTL call to a
function, that indicates that something else went wrong.
Can you send me the patch you tried (and possibly a testcase you expect
to be handled), I'll see if I can find out what's going on.
Bernd
* Re: New post-LTO OpenACC pass
2015-09-25 11:06 ` Bernd Schmidt
@ 2015-09-25 11:13 ` Nathan Sidwell
2015-09-25 13:03 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-25 11:13 UTC (permalink / raw)
To: Bernd Schmidt, Jakub Jelinek; +Cc: Cesar Philippidis, GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2814 bytes --]
On 09/25/15 06:28, Bernd Schmidt wrote:
>
> This is the c-c++-common/goacc/acc_on_device-2.c testcase. Is that expected to
> be handled? If I change it to use __builtin_acc_on_device, I can step right into
>
> Breakpoint 8, fold_call_stmt (stmt=0x7ffff0736e10, ignore=false) at
> ../../git/gcc/builtins.c:12277
> 12277 tree ret = NULL_TREE;
>
> Maybe you were compiling without optimization? In that case
> expand_builtin_acc_on_device (which already exists) should still end up doing
> the right thing. In no case should you see a RTL call to a function, that
> indicates that something else went wrong.
I think I was reading more into the std than it intended, as it claims
on_device should evaluate 'to a constant' (no mention of 'when optimizing').
It can't mean 'be usable in an integral-constant-expression', as at the point we
need those, one doesn't know the value it should be.
Thinking about it, I don't think a user can tell. The case I had in mind (and
have used it for) is something like
on_device (nvidia) ? asm ("NVIDIA specific asm") : c-expr
and for that to work, one must turn the optimizer on to get the dead code
removal, regardless of where on_device expands. So my goal of getting it
expanded regardless of optimization level is not needed --- indeed getting it
expanded in fold_call_stmt will mean the body of expand_on_device can go away (I
think).
What the programmer really cares about is that, when optimizing,
the compiler knows how to fold it.
> Can you send me the patch you tried (and possibly a testcase you expect to be
> handled), I'll see if I can find out what's going on.
Thanks! When things didn't work, I tried getting it working on the gomp4
branch, as I knew what to expect there. So the patch is for that branch.
The fails I observed are:
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/gang-static-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0
execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/gang-static-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2
execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/gang-static-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0
execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/gang-static-2.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2
execution test
the diff I have is attached -- as you can see it's 'experimental'.
nathan
[-- Attachment #2: gomp4-ondev.patch --]
[-- Type: text/x-patch, Size: 2861 bytes --]
Index: builtins.c
===================================================================
--- builtins.c (revision 228094)
+++ builtins.c (working copy)
@@ -5866,6 +5866,8 @@ expand_stack_save (void)
static rtx
expand_builtin_acc_on_device (tree exp, rtx target)
{
+ gcc_unreachable ();
+
#ifndef ACCEL_COMPILER
gcc_assert (!get_oacc_fn_attrib (current_function_decl));
#endif
@@ -10272,6 +10274,27 @@ fold_builtin_1 (location_t loc, tree fnd
return build_empty_stmt (loc);
break;
+ case BUILT_IN_ACC_ON_DEVICE:
+ /* Don't fold on_device until we know which compiler is active. */
+ if (symtab->state == EXPANSION)
+ {
+ unsigned val_host = GOMP_DEVICE_HOST;
+ unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+ val_host = GOMP_DEVICE_NOT_HOST;
+ val_dev = ACCEL_COMPILER_acc_device;
+#endif
+ tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
+ build_int_cst (integer_type_node, val_host));
+ tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
+ build_int_cst (integer_type_node, val_dev));
+
+ tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
+ return fold_convert (integer_type_node, result);
+ }
+ break;
+
default:
break;
}
Index: omp-low.c
===================================================================
--- omp-low.c (revision 228094)
+++ omp-low.c (working copy)
@@ -14725,21 +14725,20 @@ static void
oacc_xform_on_device (gcall *call)
{
tree arg = gimple_call_arg (call, 0);
- unsigned val = GOMP_DEVICE_HOST;
-
-#ifdef ACCEL_COMPILER
- val = GOMP_DEVICE_NOT_HOST;
-#endif
- tree result = build2 (EQ_EXPR, boolean_type_node, arg,
- build_int_cst (integer_type_node, val));
+ unsigned val_host = GOMP_DEVICE_HOST;
+ unsigned val_dev = GOMP_DEVICE_NONE;
+
#ifdef ACCEL_COMPILER
- {
- tree dev = build2 (EQ_EXPR, boolean_type_node, arg,
- build_int_cst (integer_type_node,
- ACCEL_COMPILER_acc_device));
- result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
- }
+ val_host = GOMP_DEVICE_NOT_HOST;
+ val_dev = ACCEL_COMPILER_acc_device;
#endif
+
+ tree host = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node, val_host));
+ tree dev = build2 (EQ_EXPR, boolean_type_node, arg,
+ build_int_cst (integer_type_node, val_dev));
+
+ tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
result = fold_convert (integer_type_node, result);
tree lhs = gimple_call_lhs (call);
gimple_seq seq = NULL;
@@ -14879,7 +14878,7 @@ execute_oacc_transform ()
gcall *call = as_a <gcall *> (stmt);
- if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+ if (0 && gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
/* acc_on_device must be evaluated at compile time for
constant arguments. */
{
* Re: New post-LTO OpenACC pass
2015-09-25 11:13 ` Nathan Sidwell
@ 2015-09-25 13:03 ` Bernd Schmidt
2015-09-25 13:20 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-25 13:03 UTC (permalink / raw)
To: Nathan Sidwell, Bernd Schmidt, Jakub Jelinek
Cc: Cesar Philippidis, GCC Patches
On 09/25/2015 12:56 PM, Nathan Sidwell wrote:
> On 09/25/15 06:28, Bernd Schmidt wrote:
>> Can you send me the patch you tried (and possibly a testcase you
>> expect to be
>> handled), I'll see if I can find out what's going on.
>
> Thanks! When things didn't work, I tried getting it working on the
> gomp4 branch, as I knew what to expect there. So the patch is for that
> branch.
>
> The fails I observed are:
>
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none
> execution test
Ok, I tried to compile this one. When using -O for host cc1 and ptx
lto1, I see fold_builtin_1 being executed with state == EXPANSION.
In host cc1:
10294 return fold_convert (integer_type_node, result);
(gdb) p result
$16 = <truth_or_expr 0x7ffff06e57a8>
(gdb) pge
warning: Expression is not an assignment (and might have no effect)
2 == 2 || 2 == 0
In ptx lto1:
(gdb) p result
$1 = <truth_or_expr 0x7ffff0875910>
(gdb) pge
warning: Expression is not an assignment (and might have no effect)
2 == 4 || 2 == 5
I'm not really sure about the logic, but are the results maybe switched
(returning false on the device and true on the host)?
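
To make the two gdb results easier to compare, here is a minimal standalone sketch of what the folded expression computes. It is not GCC code: fold_on_device and the DEV_* names are hypothetical stand-ins, and the numeric values are simply the ones visible in the gdb output above (2/0 on the host, 4/5 in the ptx lto1).

```c
#include <assert.h>
#include <stdbool.h>

/* Values as they appear in the gdb sessions in this thread.  The names
   are illustrative stand-ins for the GOMP_DEVICE_* macros in the patch.  */
enum
{
  DEV_NONE = 0,     /* val_dev in the host compiler */
  DEV_HOST = 2,     /* val_host in the host compiler */
  DEV_NOT_HOST = 4, /* val_host in the accelerator compiler */
  DEV_ACCEL = 5     /* val_dev in the ptx lto1 (assumed to be the nvidia id) */
};

/* Hypothetical model of the folded builtin:
   arg == val_host || arg == val_dev.  */
static bool
fold_on_device (int arg, bool accel_compiler)
{
  int val_host = accel_compiler ? DEV_NOT_HOST : DEV_HOST;
  int val_dev = accel_compiler ? DEV_ACCEL : DEV_NONE;
  return arg == val_host || arg == val_dev;
}

/* Reproduces the two expressions printed by gdb for an argument of 2.  */
void
self_check (void)
{
  assert (fold_on_device (DEV_HOST, false)); /* 2 == 2 || 2 == 0 */
  assert (!fold_on_device (DEV_HOST, true)); /* 2 == 4 || 2 == 5 */
}
```

Under these assumptions, an argument of 2 folds to true in the host compiler and to false in the accelerator compiler.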
I think the reason you're seeing calls to acc_on_device when not
optimizing is this code:
5931 /* When not optimizing, generate calls to library functions for a certain
5932 set of builtins. */
5933 if (!optimize
5934 && !called_as_built_in (fndecl)
5935 && fcode != BUILT_IN_FORK
[...]
which should probably have the acc_on_device code added to the list.
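
A simplified model of that check, to show the effect of the suggested change. This is only a sketch: the enum is a hypothetical, reduced set of codes (the real list is elided above and is not reproduced here), called_as_builtin stands in for called_as_built_in (fndecl), and exempt_on_device models adding acc_on_device to the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of builtin codes for illustration only.  */
enum builtin_code
{
  CODE_FORK,
  CODE_ACC_ON_DEVICE,
  CODE_OTHER
};

/* Returns true when the builtin would be emitted as a plain library call
   instead of being expanded.  exempt_on_device models the proposed
   addition of acc_on_device to the exception list.  */
bool
emit_library_call (int optimize, bool called_as_builtin,
                   enum builtin_code fcode, bool exempt_on_device)
{
  /* Optimizing, or spelled with the builtin name: expand as a builtin.  */
  if (optimize || called_as_builtin)
    return false;
  /* Codes on the exception list are still expanded at -O0.  */
  if (fcode == CODE_FORK)
    return false;
  if (exempt_on_device && fcode == CODE_ACC_ON_DEVICE)
    return false;
  return true;
}
```

With this model, emit_library_call (0, false, CODE_ACC_ON_DEVICE, false) is true, which matches the calls seen when not optimizing, and becomes false once exempt_on_device is set.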
Bernd
* Re: New post-LTO OpenACC pass
2015-09-25 13:03 ` Bernd Schmidt
@ 2015-09-25 13:20 ` Bernd Schmidt
2015-09-25 13:42 ` Bernd Schmidt
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-25 13:20 UTC (permalink / raw)
To: Bernd Schmidt, Nathan Sidwell, Jakub Jelinek
Cc: Cesar Philippidis, GCC Patches
On 09/25/2015 02:30 PM, Bernd Schmidt wrote:
> (gdb) p result
> $1 = <truth_or_expr 0x7ffff0875910>
> (gdb) pge
> warning: Expression is not an assignment (and might have no effect)
> 2 == 4 || 2 == 5
>
> I'm not really sure about the logic, but are the results maybe switched
> (returning false on the device and true on the host)?
Eh, no, the testcase seems to want to know if it's running on the host,
so that appears OK. But AFAICS it's doing the right thing. Stepping into
libgomp:
182 else if (acc_device_type (acc_dev->type) == acc_device_host)
(gdb) p acc_dev->type
$1 = OFFLOAD_TARGET_TYPE_HOST
(gdb) next
184 fn (hostaddrs);
It's not running the offloaded version, so the testcase I think should fail.
Bernd
* Re: New post-LTO OpenACC pass
2015-09-25 13:20 ` Bernd Schmidt
@ 2015-09-25 13:42 ` Bernd Schmidt
2015-09-28 13:26 ` Nathan Sidwell
0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schmidt @ 2015-09-25 13:42 UTC (permalink / raw)
To: Bernd Schmidt, Nathan Sidwell, Jakub Jelinek
Cc: Cesar Philippidis, GCC Patches
On 09/25/2015 03:03 PM, Bernd Schmidt wrote:
> 182 else if (acc_device_type (acc_dev->type) == acc_device_host)
> (gdb) p acc_dev->type
> $1 = OFFLOAD_TARGET_TYPE_HOST
> (gdb) next
> 184 fn (hostaddrs);
>
> It's not running the offloaded version, so the testcase I think should
> fail.
... and that's because my system was no longer set up to run CUDA
binaries; after I fixed that, the testcase passes.
So as far as I can tell almost everything here works as expected?
Bernd
* Re: New post-LTO OpenACC pass
2015-09-25 13:42 ` Bernd Schmidt
@ 2015-09-28 13:26 ` Nathan Sidwell
0 siblings, 0 replies; 17+ messages in thread
From: Nathan Sidwell @ 2015-09-28 13:26 UTC (permalink / raw)
To: Bernd Schmidt, Bernd Schmidt, Jakub Jelinek
Cc: Cesar Philippidis, GCC Patches
On 09/25/15 09:19, Bernd Schmidt wrote:
> On 09/25/2015 03:03 PM, Bernd Schmidt wrote:
>> 182 else if (acc_device_type (acc_dev->type) == acc_device_host)
>> (gdb) p acc_dev->type
>> $1 = OFFLOAD_TARGET_TYPE_HOST
>> (gdb) next
>> 184 fn (hostaddrs);
>>
>> It's not running the offloaded version, so the testcase I think should
>> fail.
>
> ... and that's because my system was no longer set up to run CUDA binaries,
> after I fixed that the testcase passes.
>
> So as far as I can tell almost everything here works as expected?
Hm, strange. I will take another look this week. Thanks for looking.
nathan
end of thread, other threads:[~2015-09-28 12:33 UTC | newest]
Thread overview: 17+ messages
2015-09-21 16:39 New post-LTO OpenACC pass Nathan Sidwell
2015-09-21 21:03 ` Cesar Philippidis
2015-09-21 21:15 ` Nathan Sidwell
2015-09-22 15:22 ` Nathan Sidwell
2015-09-23 11:10 ` Bernd Schmidt
2015-09-23 12:40 ` Nathan Sidwell
2015-09-23 13:19 ` Bernd Schmidt
2015-09-23 18:45 ` Nathan Sidwell
2015-09-23 18:58 ` Bernd Schmidt
2015-09-23 20:08 ` Nathan Sidwell
2015-09-25 0:15 ` Nathan Sidwell
2015-09-25 11:06 ` Bernd Schmidt
2015-09-25 11:13 ` Nathan Sidwell
2015-09-25 13:03 ` Bernd Schmidt
2015-09-25 13:20 ` Bernd Schmidt
2015-09-25 13:42 ` Bernd Schmidt
2015-09-28 13:26 ` Nathan Sidwell