public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [omp] Create openmp -fopt-info optimization group
@ 2016-03-16 14:13     ` Martin Jambor
  2017-02-21  8:09       ` [gomp4] add -finform-parallelism Cesar Philippidis
  0 siblings, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-03-16 14:13 UTC (permalink / raw)
  To: GCC Patches

Hi,

the following patch does two things.  First, it creates a new optinfo
group for OpenMP and moves OpenMP lowering and expansion to this
group.  Second, it changes all gridification MSG_NOTE dumps to
MSG_MISSED_OPTIMIZATION, which is more appropriate.  (Apparently, I
remembered to change the dump about performed gridification to
MSG_OPTIMIZED_LOCATIONS last autumn but failed to do it for dumps with
failure reasons).

With these changes, users that configured their compiler with HSA can
use (for example) the -fopt-info-all-openmp option to get information
about which target constructs have been gridified and which were not:

  mjambor@virgil:~/gcc/hsa/tests/grid$ ~/gcc/hsa/inst/bin/gcc -fopenmp -O combined-hsa.c -fopt-info-all-openmp
  combined-hsa.c:9:9: note: Target construct will be turned into a gridified GPGPU kernel

or

  /home/mjambor/gcc/hsa/src/libgomp/testsuite/libgomp.c/examples-4/target_data-3.c:50:10: note: Will not turn target construct into a simple GPGPU kernel because it does not have a sole teams construct in it.

and so forth.

I have bootstrapped and tested the patch on x86_64-linux (with and
without configured HSA) and by running make info and examining the
generated info files.  Since it is only a dumping change, I'd like to
propose it for trunk even at this late stage.  If release managers
however do not think it is desirable, I'll commit it to the hsa branch
and propose to trunk again once stage1 opens.

Thanks,

Martin


2016-03-14  Martin Jambor  <mjambor@suse.cz>

	* doc/invoke.texi (-fopt-info): Document openmp optimization group.
	* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.
	* dumpfile.c (optgroup_options): Add entry for OpenMP optimizations.
	* dumpfile.h (OPTGROUP_OPENMP): New define.
	* omp-low.c (pass_data_expand_omp): Change optinfo_flags to
	OPTGROUP_OPENMP.
	(pass_data_expand_omp_ssa): Likewise.
	(pass_data_lower_omp): Likewise.
	(pass_data_omp_simd_clone): Likewise.
	(grid_find_single_omp_among_assignments_1): Changed all occurrences of
	MSG_NOTE to MSG_MISSED_OPTIMIZATION.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_target_follows_gridifiable_pattern): Likewise.
---
 gcc/doc/invoke.texi  |  2 ++
 gcc/doc/optinfo.texi |  3 +++
 gcc/dumpfile.c       |  1 +
 gcc/dumpfile.h       |  3 ++-
 gcc/omp-low.c        | 56 ++++++++++++++++++++++++++--------------------------
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99ac11b..5c798a4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12194,6 +12194,8 @@ Enable dumps from all interprocedural optimizations.
 Enable dumps from all loop optimizations.
 @item inline
 Enable dumps from all inlining optimizations.
+@item openmp
+Enable dumps from OpenMP optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
 @item optall
diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 3c8fdba..20ca560 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
+@item OPTGROUP_OPENMP
+OpenMP passes. Enabled by @option{-openmp}.
+
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 144e371..f2430f3 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -136,6 +136,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
+  {"openmp", OPTGROUP_OPENMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index c168cbf..72f696b 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,7 +97,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
+#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 82dec9d..6f42717 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13990,7 +13990,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -14037,7 +14037,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -17210,7 +17210,7 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
 	  if (*ret)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, target_loc,
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, target_loc,
 				 "Will not turn target construct into a simple "
 				 "GPGPU kernel because %s construct contains "
 				 "multiple OpenMP constructs\n", name);
@@ -17221,7 +17221,7 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
       else
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, target_loc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, target_loc,
 			     "Will not turn target construct into a simple "
 			     "GPGPU kernel because %s construct contains "
 			     "a complex statement\n", name);
@@ -17244,7 +17244,7 @@ grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
   if (!seq)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, target_loc,
 			 "Will not turn target construct into a simple "
 			 "GPGPU kernel because %s construct has empty "
 			 "body\n",
@@ -17256,7 +17256,7 @@ grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
   if (grid_find_single_omp_among_assignments_1 (seq, target_loc, name, &ret))
     {
       if (!ret && dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, target_loc,
 			 "Will not turn target construct into a simple "
 			 "GPGPU kernel because %s construct does not contain"
 			 "any other OpenMP construct\n", name);
@@ -17340,7 +17340,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   tree group_size = NULL;
   if (!teams)
     {
-      dump_printf_loc (MSG_NOTE, tloc,
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 		       "Will not turn target construct into a simple "
 		       "GPGPU kernel because it does not have a sole teams "
 		       "construct in it.\n");
@@ -17354,7 +17354,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	{
 	case OMP_CLAUSE_NUM_TEAMS:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because we cannot "
 			     "handle num_teams clause of teams "
@@ -17363,7 +17363,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a reduction "
 			     "clause is present\n ");
@@ -17371,7 +17371,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_LASTPRIVATE:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a lastprivate "
 			     "clause is present\n ");
@@ -17394,7 +17394,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   gomp_for *dist = dyn_cast <gomp_for *> (stmt);
   if (!dist)
     {
-      dump_printf_loc (MSG_NOTE, tloc,
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 		       "Will not turn target construct into a simple "
 		       "GPGPU kernel because the teams construct  does not have "
 		       "a sole distribute construct in it.\n");
@@ -17405,7 +17405,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   if (!gimple_omp_for_combined_p (dist))
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			 "Will not turn target construct into a gridified GPGPU "
 			 "kernel because we cannot handle a standalone "
 			 "distribute construct\n ");
@@ -17414,7 +17414,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   if (dist->collapse > 1)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			 "Will not turn target construct into a gridified GPGPU "
 			 "kernel because the distribute construct contains "
 			 "collapse clause\n");
@@ -17427,7 +17427,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
       if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because the teams "
 			     "thread limit is different from distribute "
@@ -17449,7 +17449,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	{
 	case OMP_CLAUSE_NUM_THREADS:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a gridified"
 			     "GPGPU kernel because there is a num_threads "
 			     "clause of the parallel construct\n");
@@ -17457,7 +17457,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a reduction "
 			     "clause is present\n ");
@@ -17465,7 +17465,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_LASTPRIVATE:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a lastprivate "
 			     "clause is present\n ");
@@ -17486,7 +17486,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			 "Will not turn target construct into a gridified GPGPU "
 			 "kernel because the inner loop is not a simple for "
 			 "loop\n");
@@ -17495,7 +17495,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   if (gfor->collapse > 1)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			 "Will not turn target construct into a gridified GPGPU "
 			 "kernel because the inner loop contains collapse "
 			 "clause\n");
@@ -17505,7 +17505,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
   if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor)))
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			 "Will not turn target construct into a gridified GPGPU "
 			 "kernel because the inner loop pre_body contains"
 			 "a complex instruction\n");
@@ -17521,7 +17521,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	  if (OMP_CLAUSE_SCHEDULE_KIND (clauses) != OMP_CLAUSE_SCHEDULE_AUTO)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, tloc,
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 				 "Will not turn target construct into a "
 				 "gridified GPGPU kernel because the inner "
 				 "loop has a non-automatic scheduling clause\n");
@@ -17531,7 +17531,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a reduction "
 			     "clause is present\n ");
@@ -17539,7 +17539,7 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 
 	case OMP_CLAUSE_LASTPRIVATE:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a "
 			     "gridified GPGPU kernel because a lastprivate "
 			     "clause is present\n ");
@@ -17561,17 +17561,17 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
       if (dump_enabled_p ())
 	{
 	  if (is_gimple_call (bad))
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a gridified "
 			     " GPGPU kernel because the inner loop contains "
 			     "call to a noreturn function\n");
 	  if (gimple_code (bad) == GIMPLE_OMP_FOR)
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a gridified "
 			     " GPGPU kernel because the inner loop contains "
 			     "a simd construct\n");
 	  else
-	    dump_printf_loc (MSG_NOTE, tloc,
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
 			     "Will not turn target construct into a gridified "
 			     "GPGPU kernel because the inner loop contains "
 			     "statement %s which cannot be transformed\n",
@@ -17895,7 +17895,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp, /* properties_provided */
@@ -19895,7 +19895,7 @@ const pass_data pass_data_omp_simd_clone =
 {
   SIMPLE_IPA_PASS,		/* type */
   "simdclone",			/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OPENMP,		/* optinfo_flags */
   TV_NONE,			/* tv_id */
   ( PROP_ssa | PROP_cfg ),	/* properties_required */
   0,				/* properties_provided */
-- 
2.7.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 0/4] Merge from HSA branch to trunk
@ 2016-11-13 23:20 Martin Jambor
  2016-11-13 23:20 ` [PATCH 4/4] Back-end and IPA bits of hsa branch merge Martin Jambor
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Martin Jambor @ 2016-11-13 23:20 UTC (permalink / raw)
  To: GCC Patches

Hello,

this series is a merge from what is ready for trunk in the HSA
branch.

The first patch is self-contained and I intend to commit it
separately, the other three need to be committed together but I split
the change into these pieces, because I believe they will be easier to
review that way and because I have the authority to self-approve the
last one so although any comments are of course welcome, review of it
is not strictly required.

More details are in the individual email messages.

Thanks,

Martin


Martin Jambor (4):
  Remove HSA build dependence
  HSA specific built-ins
  OpenMP lowering changes from the hsa branch
  Back-end and IPA bits of hsa branch merge

 gcc/Makefile.in                               |    3 +-
 gcc/builtins.def                              |   16 +
 gcc/doc/install.texi                          |    6 -
 gcc/doc/optinfo.texi                          |    3 +
 gcc/dumpfile.c                                |    1 +
 gcc/dumpfile.h                                |    3 +-
 gcc/fortran/f95-lang.c                        |   11 +
 gcc/gimple.h                                  |   57 +
 gcc/hsa-brig.c                                |  140 ++-
 gcc/hsa-builtins.def                          |   39 +
 gcc/hsa-dump.c                                |  107 +-
 gcc/hsa-gen.c                                 |  914 ++++++++-------
 gcc/hsa.c                                     |   60 +-
 gcc/hsa.h                                     |  157 ++-
 gcc/ipa-hsa.c                                 |   14 +-
 gcc/omp-low.c                                 | 1543 ++++++++++++++++++-------
 gcc/testsuite/c-c++-common/gomp/gridify-2.c   |   66 ++
 gcc/testsuite/c-c++-common/gomp/gridify-3.c   |   68 ++
 libgomp/config.h.in                           |    3 +
 libgomp/configure                             |   56 +-
 libgomp/plugin/configfrag.ac                  |   32 +-
 libgomp/plugin/hsa.h                          |  630 ++++++++++
 libgomp/plugin/hsa_ext_finalize.h             |  265 +++++
 libgomp/plugin/plugin-hsa.c                   |  471 ++++++--
 libgomp/testsuite/lib/libgomp.exp             |    4 -
 libgomp/testsuite/libgomp-test-support.exp.in |    1 -
 libgomp/testsuite/libgomp.hsa.c/bits-insns.c  |   73 ++
 libgomp/testsuite/libgomp.hsa.c/tiling-1.c    |  212 ++++
 libgomp/testsuite/libgomp.hsa.c/tiling-2.c    |  258 +++++
 29 files changed, 3992 insertions(+), 1221 deletions(-)
 create mode 100644 gcc/hsa-builtins.def
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-3.c
 create mode 100644 libgomp/plugin/hsa.h
 create mode 100644 libgomp/plugin/hsa_ext_finalize.h
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bits-insns.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-2.c

-- 
2.10.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 3/4] OpenMP lowering changes from the hsa branch
  2016-11-13 23:20 [PATCH 0/4] Merge from HSA branch to trunk Martin Jambor
                   ` (2 preceding siblings ...)
  2016-11-13 23:20 ` [PATCH 2/4] HSA specific built-ins Martin Jambor
@ 2016-11-13 23:20 ` Martin Jambor
  2016-11-18 10:39   ` Jakub Jelinek
  3 siblings, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-11-13 23:20 UTC (permalink / raw)
  To: GCC Patches

Hello,

this email contains the OpenMP bits of the HSA branch merge.  I have
covered the new functionality in my talk at cauldron and in the email
messages announcing commits to the branch, but in short, the patch
below allows gridification standalone distribute constructs if their
step is as big as iteration space of the normal parallel loop
constructs in it, it implements the lastprivate clause and hides away
simds because the HSA model does not really benefit from them.  It
also puts OpenMP passes into its own optimization group so that users
can request optimization info about gridification alone.

Early in stage three I'd like to divide omp-low.c into multiple files
and put the gridification-specific ones into one of them, hopefully
making it more easy to follow.

I expect feedback and comments but I'd really be greateful if these
changes make it to gcc7, despite this rather late submission, they
should not affect non-HSA code generation in any way (and apart from
the optimization group stuff, really only two hunks even touch it, I
believe).

Thanks,

Martin


2016-11-11  Martin Jambor  <mjambor@suse.cz>

gcc/
	* dumpfile.h (OPTGROUP_OPENMP): Define.
	* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
	* gimple.h (gf_mask): Added elements GF_OMP_FOR_GRID_INTRA_GROUP and
	GF_OMP_FOR_GRID_GROUP_ITER.
	(gimple_omp_for_grid_phony): Added checking assert.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_for_grid_intra_group): New function.
	(gimple_omp_for_set_grid_intra_group): Likewise.
	(gimple_omp_for_grid_group_iter): Likewise.
	(gimple_omp_for_set_grid_group_iter): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Allow GRID loop where
	previosuly only distribute loop was permitted.
	(lower_lastprivate_clauses): Allow non tcc_comparison predicates.
	(grid_get_kernel_launch_attributes): Support multiple HSA grid
	dimensions.
	(grid_expand_omp_for_loop): Likewise and also support standalone
	distribute constructs.  New parameter INTRA_GROUP, updated both users.
	(grid_expand_target_grid_body): Support standalone distribute
	constructs.
	(pass_data_expand_omp): Changed optinfo_flags to OPTGROUP_OPENMP.
	(pass_data_expand_omp_ssa): Likewise.
	(pass_data_lower_omp): Likewise.
	(grid_lastprivate_predicate): New function.
	(lower_omp_for_lastprivate): Call grid_lastprivate_predicate for
	gridified loops.
	(lower_omp_for): Support standalone distribute constructs.
	(grid_prop): New type.
	(grid_safe_assignment_p): Check for assignments to group_sizes, new
	parameter GRID.
	(grid_seq_only_contains_local_assignments): New parameter GRID, pass
	it to callee.
	(grid_find_single_omp_among_assignments_1): Likewise, improve missed
	optimization info messages.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Do not bail out for SIMDs.
	(grid_parallel_clauses_gridifiable): New function.
	(grid_inner_loop_gridifiable_p): Likewise.
	(grid_dist_follows_simple_pattern): Likewise.
	(grid_gfor_follows_tiling_pattern): Likewise.
	(grid_call_permissible_in_distribute_p): Likewise.
	(grid_handle_call_in_distribute): Likewise.
	(grid_dist_follows_tiling_pattern): Likewise.
	(grid_target_follows_gridifiable_pattern): Support standalone distribute
	constructs.
	(grid_var_segment): New enum.
	(grid_mark_variable_segment): New function.
	(grid_copy_leading_local_assignments): Call grid_mark_variable_segment
	if a new argument says so.
	(grid_process_grid_body): New function.
	(grid_eliminate_combined_simd_part): Likewise.
	(grid_mark_tiling_loops): Likewise.
	(grid_mark_tiling_parallels_and_loops): Likewise.
	(grid_process_kernel_body_copy): Support standalone distribute
	constructs.
	(grid_attempt_target_gridification): New grid variable holding overall
	gridification state.  Support standalone distribute constructs and
	collapse clauses.
	* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.

gcc/testsuite/
	* c-c++-common/gomp/gridify-2.c: New test.
	* c-c++-common/gomp/gridify-3.c: Likewise.

libgomp/
	* testsuite/libgomp.hsa.c/tiling-1.c: New test.
	* testsuite/libgomp.hsa.c/tiling-2.c: Likewise.
---
 gcc/doc/optinfo.texi                        |    3 +
 gcc/dumpfile.c                              |    1 +
 gcc/dumpfile.h                              |    3 +-
 gcc/gimple.h                                |   57 +
 gcc/omp-low.c                               | 1543 ++++++++++++++++++++-------
 gcc/testsuite/c-c++-common/gomp/gridify-2.c |   66 ++
 gcc/testsuite/c-c++-common/gomp/gridify-3.c |   68 ++
 libgomp/testsuite/libgomp.hsa.c/tiling-1.c  |  212 ++++
 libgomp/testsuite/libgomp.hsa.c/tiling-2.c  |  258 +++++
 9 files changed, 1805 insertions(+), 406 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-2.c

diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 3c8fdba..20ca560 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
+@item OPTGROUP_OPENMP
+OpenMP passes. Enabled by @option{-openmp}.
+
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 74522a6..3f03132 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -136,6 +136,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
+  {"openmp", OPTGROUP_OPENMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 3f08b16..85bec6a 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,7 +97,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
+#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0eafada..0d0296e 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -163,7 +163,13 @@ enum gf_mask {
     GF_OMP_FOR_KIND_CILKSIMD	= GF_OMP_FOR_SIMD | 1,
     GF_OMP_FOR_COMBINED		= 1 << 4,
     GF_OMP_FOR_COMBINED_INTO	= 1 << 5,
+    /* The following flag must not be used on GF_OMP_FOR_KIND_GRID_LOOP loop
+       statements.  */
     GF_OMP_FOR_GRID_PHONY	= 1 << 6,
+    /* The following two flags should only be set on GF_OMP_FOR_KIND_GRID_LOOP
+       loop statements.  */
+    GF_OMP_FOR_GRID_INTRA_GROUP	= 1 << 6,
+    GF_OMP_FOR_GRID_GROUP_ITER  = 1 << 7,
     GF_OMP_TARGET_KIND_MASK	= (1 << 4) - 1,
     GF_OMP_TARGET_KIND_REGION	= 0,
     GF_OMP_TARGET_KIND_DATA	= 1,
@@ -5143,6 +5149,8 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
 static inline bool
 gimple_omp_for_grid_phony (const gomp_for *omp_for)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       != GF_OMP_FOR_KIND_GRID_LOOP);
   return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
 }
 
@@ -5151,12 +5159,61 @@ gimple_omp_for_grid_phony (const gomp_for *omp_for)
 static inline void
 gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       != GF_OMP_FOR_KIND_GRID_LOOP);
   if (value)
     omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
   else
     omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
 }
 
+/* Return the kernel_intra_group of a GRID_LOOP OMP_FOR statement.  */
+
+static inline bool
+gimple_omp_for_grid_intra_group (const gomp_for *omp_for)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_INTRA_GROUP) != 0;
+}
+
+/* Set kernel_intra_group flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_intra_group (gomp_for *omp_for, bool value)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  if (value)
+    omp_for->subcode |= GF_OMP_FOR_GRID_INTRA_GROUP;
+  else
+    omp_for->subcode &= ~GF_OMP_FOR_GRID_INTRA_GROUP;
+}
+
+/* Return true if iterations of a grid OMP_FOR statement correspond to HSA
+   groups.  */
+
+static inline bool
+gimple_omp_for_grid_group_iter (const gomp_for *omp_for)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_GROUP_ITER) != 0;
+}
+
+/* Set group_iter flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_group_iter (gomp_for *omp_for, bool value)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  if (value)
+    omp_for->subcode |= GF_OMP_FOR_GRID_GROUP_ITER;
+  else
+    omp_for->subcode &= ~GF_OMP_FOR_GRID_GROUP_ITER;
+}
+
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
 static inline tree
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 331da6a..d968cec 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3294,8 +3294,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
       else if (gimple_code (ctx->stmt) == GIMPLE_OMP_TEAMS)
 	{
 	  if ((gimple_code (stmt) != GIMPLE_OMP_FOR
-	       || (gimple_omp_for_kind (stmt)
-		   != GF_OMP_FOR_KIND_DISTRIBUTE))
+	       || ((gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_DISTRIBUTE)
+		   && (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP)))
 	      && gimple_code (stmt) != GIMPLE_OMP_PARALLEL)
 	    {
 	      error_at (gimple_location (stmt),
@@ -5420,15 +5420,25 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
     {
       gcond *stmt;
       tree label_true, arm1, arm2;
+      enum tree_code pred_code = TREE_CODE (predicate);
 
       label = create_artificial_label (UNKNOWN_LOCATION);
       label_true = create_artificial_label (UNKNOWN_LOCATION);
-      arm1 = TREE_OPERAND (predicate, 0);
-      arm2 = TREE_OPERAND (predicate, 1);
-      gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
-      gimplify_expr (&arm2, stmt_list, NULL, is_gimple_val, fb_rvalue);
-      stmt = gimple_build_cond (TREE_CODE (predicate), arm1, arm2,
-				label_true, label);
+      if (TREE_CODE_CLASS (pred_code) == tcc_comparison)
+	{
+	  arm1 = TREE_OPERAND (predicate, 0);
+	  arm2 = TREE_OPERAND (predicate, 1);
+	  gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	  gimplify_expr (&arm2, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	}
+      else
+	{
+	  arm1 = predicate;
+	  gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	  arm2 = boolean_false_node;
+	  pred_code = NE_EXPR;
+	}
+      stmt = gimple_build_cond (pred_code, arm1, arm2, label_true, label);
       gimple_seq_add_stmt (stmt_list, stmt);
       gimple_seq_add_stmt (stmt_list, gimple_build_label (label_true));
     }
@@ -12917,7 +12927,6 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
 				   gomp_target *tgt_stmt)
 {
   grid_create_kernel_launch_attr_types ();
-  tree u32_one = build_one_cst (uint32_type_node);
   tree lattrs = create_tmp_var (grid_attr_trees->kernel_launch_attributes_type,
 				"__kernel_launch_attrs");
 
@@ -12942,10 +12951,10 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
 
   tree dimref = build3 (COMPONENT_REF, uint32_type_node, lattrs,
 			grid_attr_trees->kernel_lattrs_dimnum_decl, NULL_TREE);
-  /* At this moment we cannot gridify a loop with a collapse clause.  */
-  /* TODO: Adjust when we support bigger collapse.  */
-  gcc_assert (max_dim == 0);
-  gsi_insert_before (gsi, gimple_build_assign (dimref, u32_one), GSI_SAME_STMT);
+  gcc_checking_assert (max_dim <= 2);
+  tree dimensions = build_int_cstu (uint32_type_node, max_dim + 1);
+  gsi_insert_before (gsi, gimple_build_assign (dimref, dimensions),
+		     GSI_SAME_STMT);
   TREE_ADDRESSABLE (lattrs) = 1;
   return build_fold_addr_expr (lattrs);
 }
@@ -13591,59 +13600,81 @@ expand_omp_target (struct omp_region *region)
     }
 }
 
-/* Expand KFOR loop as a GPGPU kernel, i.e. as a body only with iteration
-   variable derived from the thread number.  */
+/* Expand KFOR loop as a HSA grifidied kernel, i.e. as a body only with
+   iteration variable derived from the thread number.  INTRA_GROUP means this
+   is an expansion of a loop iterating over work-items within a separate
+   iteration over groups. */
 
 static void
-grid_expand_omp_for_loop (struct omp_region *kfor)
+grid_expand_omp_for_loop (struct omp_region *kfor, bool intra_group)
 {
-  tree t, threadid;
-  tree type, itype;
   gimple_stmt_iterator gsi;
-  tree n1, step;
-  struct omp_for_data fd;
-
   gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
   gcc_checking_assert (gimple_omp_for_kind (for_stmt)
 		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  size_t collapse = gimple_omp_for_collapse (for_stmt);
+  struct omp_for_data_loop *loops
+    = (struct omp_for_data_loop *)
+    alloca (gimple_omp_for_collapse (for_stmt)
+	    * sizeof (struct omp_for_data_loop));
+
+  struct omp_for_data fd;
+
+  remove_edge (BRANCH_EDGE (kfor->entry));
   basic_block body_bb = FALLTHRU_EDGE (kfor->entry)->dest;
 
-  gcc_assert (gimple_omp_for_collapse (for_stmt) == 1);
   gcc_assert (kfor->cont);
-  extract_omp_for_data (for_stmt, &fd, NULL);
-
-  itype = type = TREE_TYPE (fd.loop.v);
-  if (POINTER_TYPE_P (type))
-    itype = signed_type_for (type);
+  extract_omp_for_data (for_stmt, &fd, loops);
 
   gsi = gsi_start_bb (body_bb);
 
-  n1 = fd.loop.n1;
-  step = fd.loop.step;
-  n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
-				 true, NULL_TREE, true, GSI_SAME_STMT);
-  step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
-				   true, NULL_TREE, true, GSI_SAME_STMT);
-  threadid = build_call_expr (builtin_decl_explicit
-			      (BUILT_IN_OMP_GET_THREAD_NUM), 0);
-  threadid = fold_convert (itype, threadid);
-  threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
-				       true, GSI_SAME_STMT);
+  for (size_t dim = 0; dim < collapse; dim++)
+    {
+      tree type, itype;
+      itype = type = TREE_TYPE (fd.loops[dim].v);
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
 
-  tree startvar = fd.loop.v;
-  t = fold_build2 (MULT_EXPR, itype, threadid, step);
-  if (POINTER_TYPE_P (type))
-    t = fold_build_pointer_plus (n1, t);
-  else
-    t = fold_build2 (PLUS_EXPR, type, t, n1);
-  t = fold_convert (type, t);
-  t = force_gimple_operand_gsi (&gsi, t,
-				DECL_P (startvar)
-				&& TREE_ADDRESSABLE (startvar),
-				NULL_TREE, true, GSI_SAME_STMT);
-  gassign *assign_stmt = gimple_build_assign (startvar, t);
-  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+      tree n1 = fd.loops[dim].n1;
+      tree step = fd.loops[dim].step;
+      n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
+				     true, NULL_TREE, true, GSI_SAME_STMT);
+      step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
+				       true, NULL_TREE, true, GSI_SAME_STMT);
+      tree threadid;
+      if (gimple_omp_for_grid_group_iter (for_stmt))
+	{
+	  gcc_checking_assert (!intra_group);
+	  threadid = build_call_expr (builtin_decl_explicit
+				      (BUILT_IN_HSA_WORKGROUPID), 1,
+				      build_int_cstu (unsigned_type_node, dim));
+	}
+      else if (intra_group)
+	threadid = build_call_expr (builtin_decl_explicit
+				    (BUILT_IN_HSA_WORKITEMID), 1,
+				    build_int_cstu (unsigned_type_node, dim));
+      else
+	threadid = build_call_expr (builtin_decl_explicit
+				    (BUILT_IN_HSA_WORKITEMABSID), 1,
+				    build_int_cstu (unsigned_type_node, dim));
+      threadid = fold_convert (itype, threadid);
+      threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
+					   true, GSI_SAME_STMT);
 
+      tree startvar = fd.loops[dim].v;
+      tree t = fold_build2 (MULT_EXPR, itype, threadid, step);
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (n1, t);
+      else
+	t = fold_build2 (PLUS_EXPR, type, t, n1);
+      t = fold_convert (type, t);
+      t = force_gimple_operand_gsi (&gsi, t,
+				    DECL_P (startvar)
+				    && TREE_ADDRESSABLE (startvar),
+				    NULL_TREE, true, GSI_SAME_STMT);
+      gassign *assign_stmt = gimple_build_assign (startvar, t);
+      gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+    }
   /* Remove the omp for statement */
   gsi = gsi_last_bb (kfor->entry);
   gsi_remove (&gsi, true);
@@ -13654,10 +13685,12 @@ grid_expand_omp_for_loop (struct omp_region *kfor)
 	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_CONTINUE);
   gsi_remove (&gsi, true);
 
-  /* Replace the GIMPLE_OMP_RETURN with a real return.  */
+  /* Replace the GIMPLE_OMP_RETURN with a barrier, if necessary.  */
   gsi = gsi_last_bb (kfor->exit);
   gcc_assert (!gsi_end_p (gsi)
 	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  if (intra_group)
+    gsi_insert_before (&gsi, build_omp_barrier (NULL_TREE), GSI_SAME_STMT);
   gsi_remove (&gsi, true);
 
   /* Fixup the much simpler CFG.  */
@@ -13696,7 +13729,7 @@ grid_remap_kernel_arg_accesses (tree *tp, int *walk_subtrees, void *data)
 static void expand_omp (struct omp_region *region);
 
 /* If TARGET region contains a kernel body for loop, remove its region from the
-   TARGET and expand it in GPGPU kernel fashion. */
+   TARGET and expand it in HSA gridified kernel fashion. */
 
 static void
 grid_expand_target_grid_body (struct omp_region *target)
@@ -13738,11 +13771,29 @@ grid_expand_target_grid_body (struct omp_region *target)
 
   struct omp_region *kfor = *pp;
   gcc_assert (kfor);
-  gcc_assert (gimple_omp_for_kind (last_stmt ((kfor)->entry))
-	      == GF_OMP_FOR_KIND_GRID_LOOP);
+  gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
+  gcc_assert (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP);
   *pp = kfor->next;
   if (kfor->inner)
-    expand_omp (kfor->inner);
+    {
+      if (gimple_omp_for_grid_group_iter (for_stmt))
+	{
+	  struct omp_region **next_pp;
+	  for (pp = &kfor->inner; *pp; pp = next_pp)
+	    {
+	      next_pp = &(*pp)->next;
+	      if ((*pp)->type != GIMPLE_OMP_FOR)
+		continue;
+	      gomp_for *inner = as_a <gomp_for *> (last_stmt ((*pp)->entry));
+	      gcc_assert (gimple_omp_for_kind (inner)
+			  == GF_OMP_FOR_KIND_GRID_LOOP);
+	      grid_expand_omp_for_loop (*pp, true);
+	      *pp = (*pp)->next;
+	      next_pp = pp;
+	    }
+	}
+      expand_omp (kfor->inner);
+    }
   if (gpukernel->inner)
     expand_omp (gpukernel->inner);
 
@@ -13772,8 +13823,7 @@ grid_expand_target_grid_body (struct omp_region *target)
   struct function *kern_cfun = DECL_STRUCT_FUNCTION (kern_fndecl);
   kern_cfun->curr_properties = cfun->curr_properties;
 
-  remove_edge (BRANCH_EDGE (kfor->entry));
-  grid_expand_omp_for_loop (kfor);
+  grid_expand_omp_for_loop (kfor, false);
 
   /* Remove the omp for statement */
   gimple_stmt_iterator gsi = gsi_last_bb (gpukernel->entry);
@@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -14180,7 +14230,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -15000,6 +15050,46 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   BLOCK_VARS (block) = gimple_bind_vars (bind);
 }
 
+/* Return the lastprivate predicate for a given gridified loop described by FD).
+   TODO: When grid stuff is moved to a separate file, move this too.  */
+
+static tree
+grid_lastprivate_predicate (struct omp_for_data *fd)
+{
+  /* When dealing with a gridified loop, we need to check up to three collapsed
+     iteration variables but they are not actually captured in this fd.
+     Fortunately, we can easily rely on HSA builtins to get this
+     information. */
+
+  tree id, size;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP
+      && gimple_omp_for_grid_intra_group (fd->for_stmt))
+    {
+      id = builtin_decl_explicit (BUILT_IN_HSA_WORKITEMID);
+      size = builtin_decl_explicit (BUILT_IN_HSA_CURRENTWORKGROUPSIZE);
+    }
+  else
+    {
+      id = builtin_decl_explicit (BUILT_IN_HSA_WORKITEMABSID);
+      size = builtin_decl_explicit (BUILT_IN_HSA_GRIDSIZE);
+    }
+  tree cond = NULL;
+  for (int dim = 0; dim < fd->collapse; dim++)
+    {
+      tree dim_tree = build_int_cstu (unsigned_type_node, dim);
+      tree u1 = build_int_cstu (unsigned_type_node, 1);
+      tree c2
+	= build2 (EQ_EXPR, boolean_type_node,
+		  build2 (PLUS_EXPR, unsigned_type_node,
+			  build_call_expr (id, 1, dim_tree), u1),
+		  build_call_expr (size, 1, dim_tree));
+      if (cond)
+	cond = build2 (TRUTH_AND_EXPR, boolean_type_node, cond, c2);
+      else
+	cond = c2;
+    }
+  return cond;
+}
 
 /* A subroutine of lower_omp_for.  Generate code to emit the predicate
    for a lastprivate clause.  Given a loop control predicate of (V
@@ -15027,58 +15117,65 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
 	cond_code = EQ_EXPR;
     }
 
-  tree n2 = fd->loop.n2;
-  if (fd->collapse > 1
-      && TREE_CODE (n2) != INTEGER_CST
-      && gimple_omp_for_combined_into_p (fd->for_stmt))
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP
+      || gimple_omp_for_grid_phony (fd->for_stmt))
+    cond = grid_lastprivate_predicate (fd);
+  else
     {
-      struct omp_context *taskreg_ctx = NULL;
-      if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
+      tree n2 = fd->loop.n2;
+      if (fd->collapse > 1
+	  && TREE_CODE (n2) != INTEGER_CST
+	  && gimple_omp_for_combined_into_p (fd->for_stmt))
 	{
-	  gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
-	  if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
-	      || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
+	  struct omp_context *taskreg_ctx = NULL;
+	  if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
 	    {
-	      if (gimple_omp_for_combined_into_p (gfor))
-		{
-		  gcc_assert (ctx->outer->outer
-			      && is_parallel_ctx (ctx->outer->outer));
-		  taskreg_ctx = ctx->outer->outer;
-		}
-	      else
+	      gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
+	      if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
+		  || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
 		{
-		  struct omp_for_data outer_fd;
-		  extract_omp_for_data (gfor, &outer_fd, NULL);
-		  n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
+		  if (gimple_omp_for_combined_into_p (gfor))
+		    {
+		      gcc_assert (ctx->outer->outer
+				  && is_parallel_ctx (ctx->outer->outer));
+		      taskreg_ctx = ctx->outer->outer;
+		    }
+		  else
+		    {
+		      struct omp_for_data outer_fd;
+		      extract_omp_for_data (gfor, &outer_fd, NULL);
+		      n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
+		    }
 		}
+	      else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
+		taskreg_ctx = ctx->outer->outer;
 	    }
-	  else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
-	    taskreg_ctx = ctx->outer->outer;
-	}
-      else if (is_taskreg_ctx (ctx->outer))
-	taskreg_ctx = ctx->outer;
-      if (taskreg_ctx)
-	{
-	  int i;
-	  tree innerc
-	    = find_omp_clause (gimple_omp_taskreg_clauses (taskreg_ctx->stmt),
-			       OMP_CLAUSE__LOOPTEMP_);
-	  gcc_assert (innerc);
-	  for (i = 0; i < fd->collapse; i++)
+	  else if (is_taskreg_ctx (ctx->outer))
+	    taskreg_ctx = ctx->outer;
+	  if (taskreg_ctx)
 	    {
+	      int i;
+	      tree taskreg_clauses
+		= gimple_omp_taskreg_clauses (taskreg_ctx->stmt);
+	      tree innerc = find_omp_clause (taskreg_clauses,
+					     OMP_CLAUSE__LOOPTEMP_);
+	      gcc_assert (innerc);
+	      for (i = 0; i < fd->collapse; i++)
+		{
+		  innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
+					    OMP_CLAUSE__LOOPTEMP_);
+		  gcc_assert (innerc);
+		}
 	      innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
 					OMP_CLAUSE__LOOPTEMP_);
-	      gcc_assert (innerc);
+	      if (innerc)
+		n2 = fold_convert (TREE_TYPE (n2),
+				   lookup_decl (OMP_CLAUSE_DECL (innerc),
+						taskreg_ctx));
 	    }
-	  innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
-				    OMP_CLAUSE__LOOPTEMP_);
-	  if (innerc)
-	    n2 = fold_convert (TREE_TYPE (n2),
-			       lookup_decl (OMP_CLAUSE_DECL (innerc),
-					    taskreg_ctx));
 	}
+      cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);
     }
-  cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);
 
   clauses = gimple_omp_for_clauses (fd->for_stmt);
   stmts = NULL;
@@ -15247,11 +15344,13 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						ctx);
 	}
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  bool phony_loop = (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP
+		     && gimple_omp_for_grid_phony (stmt));
+  if (!phony_loop)
     gimple_seq_add_stmt (&body, stmt);
   gimple_seq_add_seq (&body, gimple_omp_body (stmt));
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  if (!phony_loop)
     gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
 							   fd.loop.v));
 
@@ -15265,7 +15364,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   body = maybe_catch_exception (body);
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  if (!phony_loop)
     {
       /* Region exit marker goes at the end of the loop body.  */
       gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
@@ -17249,60 +17348,90 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
-/* Returen true if STMT is an assignment of a register-type into a local
-   VAR_DECL.  */
+/* Structure describing the basic properties of the loop we ara analyzing
+   whether it can be gridified and when it is gridified. */
+
+struct grid_prop
+{
+  /* True when we are doing tiling gridification, i.e. when there is a distinct
+     distribute loop over groups and a loop construct over work-items.  False
+     when distribute and parallel for loops form a combined construct.  */
+  bool tiling;
+  /* Location of the target construct for optimization information
+     messages.  */
+  location_t target_loc;
+  /* The collapse clause of the involved loops.  Collapse value of all of them
+     must be the same for gridification to take place.  */
+  size_t collapse;
+  /* Group sizes, if requested by the user or NULL if not requested.  */
+  tree group_sizes[3];
+};
+
+#define GRID_MISSED_MSG_PREFIX "Will not turn target construct into a " \
+  "gridified HSA kernel because "
+
+/* Return true if STMT is an assignment of a register-type into a local
+   VAR_DECL.  If GRID is non-NULL, the assignment additionally must not be to
+   any of the trees specifying group sizes there.  */
 
 static bool
-grid_reg_assignment_to_local_var_p (gimple *stmt)
+grid_safe_assignment_p (gimple *stmt, grid_prop *grid)
 {
   gassign *assign = dyn_cast <gassign *> (stmt);
   if (!assign)
     return false;
+  if (gimple_clobber_p (assign))
+    return true;
   tree lhs = gimple_assign_lhs (assign);
   if (!VAR_P (lhs)
       || !is_gimple_reg_type (TREE_TYPE (lhs))
       || is_global_var (lhs))
     return false;
+  if (grid)
+    for (unsigned i = 0; i < grid->collapse; i++)
+      if (lhs == grid->group_sizes[i])
+	return false;
   return true;
 }
 
 /* Return true if all statements in SEQ are assignments to local register-type
-   variables.  */
+   variables that do not hold group size information.  */
 
 static bool
-grid_seq_only_contains_local_assignments (gimple_seq seq)
+grid_seq_only_contains_local_assignments (gimple_seq seq, grid_prop *grid)
 {
   if (!seq)
     return true;
 
   gimple_stmt_iterator gsi;
   for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
-    if (!grid_reg_assignment_to_local_var_p (gsi_stmt (gsi)))
+    if (!grid_safe_assignment_p (gsi_stmt (gsi), grid))
       return false;
   return true;
 }
 
-/* Scan statements in SEQ and call itself recursively on any bind.  If during
-   whole search only assignments to register-type local variables and one
-   single OMP statement is encountered, return true, otherwise return false.
-   RET is where we store any OMP statement encountered.  TARGET_LOC and NAME
-   are used for dumping a note about a failure.  */
+/* Scan statements in SEQ and call itself recursively on any bind.  GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  If during whole search only assignments to
+   register-type local variables (that do not overwrite group size information)
+   and one single OMP statement is encountered, return true, otherwise return
+   false.  RET is where we store any OMP statement encountered.  */
 
 static bool
-grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
-				     const char *name, gimple **ret)
+grid_find_single_omp_among_assignments_1 (gimple_seq seq, grid_prop *grid,
+					  const char *name, gimple **ret)
 {
   gimple_stmt_iterator gsi;
   for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple *stmt = gsi_stmt (gsi);
 
-      if (grid_reg_assignment_to_local_var_p (stmt))
+      if (grid_safe_assignment_p (stmt, grid))
 	continue;
       if (gbind *bind = dyn_cast <gbind *> (stmt))
 	{
 	  if (!grid_find_single_omp_among_assignments_1 (gimple_bind_body (bind),
-							 target_loc, name, ret))
+							 grid, name, ret))
 	      return false;
 	}
       else if (is_gimple_omp (stmt))
@@ -17310,10 +17439,18 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
 	  if (*ret)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, target_loc,
-				 "Will not turn target construct into a simple "
-				 "GPGPU kernel because %s construct contains "
-				 "multiple OpenMP constructs\n", name);
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "%s construct "
+				   "contains multiple OpenMP constructs\n",
+				   name);
+		  dump_printf_loc (MSG_NOTE, gimple_location (*ret),
+				   "The first OpenMP construct within "
+				   "a parallel\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "The second OpenMP construct within "
+				   "a parallel\n");
+		}
 	      return false;
 	    }
 	  *ret = stmt;
@@ -17321,10 +17458,14 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
       else
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, target_loc,
-			     "Will not turn target construct into a simple "
-			     "GPGPU kernel because %s construct contains "
-			     "a complex statement\n", name);
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "%s construct contains "
+			       "a complex statement\n", name);
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "gridification\n");
+	    }
 	  return false;
 	}
     }
@@ -17332,33 +17473,32 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
 }
 
 /* Scan statements in SEQ and make sure that it and any binds in it contain
-   only assignments to local register-type variables and one OMP construct.  If
-   so, return that construct, otherwise return NULL.  If dumping is enabled and
-   function fails, use TARGET_LOC and NAME to dump a note with the reason for
-   failure.  */
+   only assignments to local register-type variables (that do not overwrite
+   group size information) and one OMP construct.  If so, return that
+   construct, otherwise return NULL.  GRID describes hitherto discovered
+   properties of the loop that is evaluated for possible gridification.  If
+   dumping is enabled and function fails, use NAME to dump a note with the
+   reason for failure.  */
 
 static gimple *
-grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
+grid_find_single_omp_among_assignments (gimple_seq seq, grid_prop *grid,
 					const char *name)
 {
   if (!seq)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
-			 "Will not turn target construct into a simple "
-			 "GPGPU kernel because %s construct has empty "
-			 "body\n",
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			 GRID_MISSED_MSG_PREFIX "%s construct has empty body\n",
 			 name);
       return NULL;
     }
 
   gimple *ret = NULL;
-  if (grid_find_single_omp_among_assignments_1 (seq, target_loc, name, &ret))
+  if (grid_find_single_omp_among_assignments_1 (seq, grid, name, &ret))
     {
       if (!ret && dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
-			 "Will not turn target construct into a simple "
-			 "GPGPU kernel because %s construct does not contain"
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			 GRID_MISSED_MSG_PREFIX "%s construct does not contain"
 			 "any other OpenMP construct\n", name);
       return ret;
     }
@@ -17401,218 +17541,81 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi,
       *handled_ops_p = true;
       wi->info = stmt;
       return error_mark_node;
-
-    case GIMPLE_OMP_FOR:
-      if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD)
-	  && gimple_omp_for_combined_into_p (stmt))
-	{
-	  *handled_ops_p = true;
-	  wi->info = stmt;
-	  return error_mark_node;
-	}
-      break;
-
     default:
       break;
     }
   return NULL;
 }
 
-
-/* If TARGET follows a pattern that can be turned into a gridified GPGPU
-   kernel, return true, otherwise return false.  In the case of success, also
-   fill in GROUP_SIZE_P with the requested group size or NULL if there is
-   none.  */
+/* Examine clauses of omp parallel statement PAR and if any prevents
+   gridification, issue a missed-optimization diagnostics and return false,
+   otherwise return true.  GRID describes hitherto discovered properties of the
+   loop that is evaluated for possible gridification.  */
 
 static bool
-grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p)
+grid_parallel_clauses_gridifiable (gomp_parallel *par, location_t tloc)
 {
-  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
-    return false;
-
-  location_t tloc = gimple_location (target);
-  gimple *stmt
-    = grid_find_single_omp_among_assignments (gimple_omp_body (target),
-					      tloc, "target");
-  if (!stmt)
-    return false;
-  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
-  tree group_size = NULL;
-  if (!teams)
-    {
-      dump_printf_loc (MSG_NOTE, tloc,
-		       "Will not turn target construct into a simple "
-		       "GPGPU kernel because it does not have a sole teams "
-		       "construct in it.\n");
-      return false;
-    }
-
-  tree clauses = gimple_omp_teams_clauses (teams);
+  tree clauses = gimple_omp_parallel_clauses (par);
   while (clauses)
     {
       switch (OMP_CLAUSE_CODE (clauses))
 	{
-	case OMP_CLAUSE_NUM_TEAMS:
+	case OMP_CLAUSE_NUM_THREADS:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because we cannot "
-			     "handle num_teams clause of teams "
-			     "construct\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			       GRID_MISSED_MSG_PREFIX "because there is "
+			       "a num_threads clause of the parallel "
+			       "construct\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (par),
+			       "Parallel construct has a num_threads clause\n");
+	    }
 	  return false;
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			       GRID_MISSED_MSG_PREFIX "a reduction clause"
+			       "is present\n ");
+	      dump_printf_loc (MSG_NOTE, gimple_location (par),
+			       "Parallel construct has a reduction clause\n");
+	    }
 	  return false;
 
-	case OMP_CLAUSE_THREAD_LIMIT:
-	  group_size = OMP_CLAUSE_OPERAND (clauses, 0);
-	  break;
-
 	default:
 	  break;
 	}
       clauses = OMP_CLAUSE_CHAIN (clauses);
     }
+  return true;
+}
 
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (teams), tloc,
-						 "teams");
-  if (!stmt)
-    return false;
-  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
-  if (!dist)
-    {
-      dump_printf_loc (MSG_NOTE, tloc,
-		       "Will not turn target construct into a simple "
-		       "GPGPU kernel because the teams construct  does not have "
-		       "a sole distribute construct in it.\n");
-      return false;
-    }
+/* Examine clauses and the body of omp loop statement GFOR and if something
+   prevents gridification, issue a missed-optimization diagnostics and return
+   false, otherwise return true. GRID describes hitherto discovered properties
+   of the loop that is evaluated for possible gridification.  */
 
-  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
-  if (!gimple_omp_for_combined_p (dist))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because we cannot handle a standalone "
-			 "distribute construct\n ");
-      return false;
-    }
-  if (dist->collapse > 1)
+static bool
+grid_inner_loop_gridifiable_p (gomp_for *gfor, grid_prop *grid)
+{
+  if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor),
+						 grid))
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the distribute construct contains "
-			 "collapse clause\n");
-      return false;
-    }
-  struct omp_for_data fd;
-  extract_omp_for_data (dist, &fd, NULL);
-  if (fd.chunk_size)
-    {
-      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because the teams "
-			     "thread limit is different from distribute "
-			     "schedule chunk\n");
-	  return false;
-	}
-      group_size = fd.chunk_size;
-    }
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (dist), tloc,
-						 "distribute");
-  gomp_parallel *par;
-  if (!stmt || !(par = dyn_cast <gomp_parallel *> (stmt)))
-    return false;
-
-  clauses = gimple_omp_parallel_clauses (par);
-  while (clauses)
-    {
-      switch (OMP_CLAUSE_CODE (clauses))
 	{
-	case OMP_CLAUSE_NUM_THREADS:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified"
-			     "GPGPU kernel because there is a num_threads "
-			     "clause of the parallel construct\n");
-	  return false;
-
-	case OMP_CLAUSE_REDUCTION:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
-	  return false;
-
-	default:
-	  break;
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "the inner loop "
+			   "loop bounds computation contains a complex "
+			   "statement\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "Loop construct cannot be analyzed for "
+			   "gridification\n");
 	}
-      clauses = OMP_CLAUSE_CHAIN (clauses);
-    }
-
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (par), tloc,
-						 "parallel");
-  gomp_for *gfor;
-  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
-    return false;
-
-  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop is not a simple for "
-			 "loop\n");
-      return false;
-    }
-  if (gfor->collapse > 1)
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop contains collapse "
-			 "clause\n");
-      return false;
-    }
-
-  if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor)))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop pre_body contains"
-			 "a complex instruction\n");
       return false;
     }
 
-  clauses = gimple_omp_for_clauses (gfor);
+  tree clauses = gimple_omp_for_clauses (gfor);
   while (clauses)
     {
       switch (OMP_CLAUSE_CODE (clauses))
@@ -17621,28 +17624,28 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	  if (OMP_CLAUSE_SCHEDULE_KIND (clauses) != OMP_CLAUSE_SCHEDULE_AUTO)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, tloc,
-				 "Will not turn target construct into a "
-				 "gridified GPGPU kernel because the inner "
-				 "loop has a non-automatic scheduling clause\n");
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "the inner loop "
+				   "has a non-automatic schedule clause\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+				   "Loop construct has a non automatic "
+				   "schedule clause\n");
+		}
 	      return false;
 	    }
 	  break;
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "a reduction "
+			       "clause is present\n ");
+	      dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			       "Loop construct has a reduction schedule "
+			       "clause\n");
+	    }
 	  return false;
 
 	default:
@@ -17650,7 +17653,6 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	}
       clauses = OMP_CLAUSE_CHAIN (clauses);
     }
-
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
   if (walk_gimple_seq (gimple_omp_body (gfor),
@@ -17661,62 +17663,560 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
       if (dump_enabled_p ())
 	{
 	  if (is_gimple_call (bad))
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     " GPGPU kernel because the inner loop contains "
-			     "call to a noreturn function\n");
-	  if (gimple_code (bad) == GIMPLE_OMP_FOR)
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     " GPGPU kernel because the inner loop contains "
-			     "a simd construct\n");
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the inner loop contains "
+			       "call to a noreturn function\n");
 	  else
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     "GPGPU kernel because the inner loop contains "
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			     GRID_MISSED_MSG_PREFIX "the inner loop contains "
 			     "statement %s which cannot be transformed\n",
 			     gimple_code_name[(int) gimple_code (bad)]);
+	  dump_printf_loc (MSG_NOTE, gimple_location (bad),
+			   "This statement cannot be analyzed for "
+			   "gridification\n");
 	}
       return false;
     }
-
-  *group_size_p = group_size;
   return true;
 }
 
-/* Operand walker, used to remap pre-body declarations according to a hash map
-   provided in DATA.  */
+/* Given distribute omp construct represented by DIST, which in the original
+   source forms a compound construct with a looping construct, return true if it
+   can be turned into a gridified HSA kernel.  Otherwise return false. GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  */
 
-static tree
-grid_remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+static bool
+grid_dist_follows_simple_pattern (gomp_for *dist, grid_prop *grid)
 {
-  tree t = *tp;
+  location_t tloc = grid->target_loc;
+  gimple *stmt = grid_find_single_omp_among_assignments (gimple_omp_body (dist),
+							 grid, "distribute");
+  gomp_parallel *par;
+  if (!stmt
+      || !(par = dyn_cast <gomp_parallel *> (stmt))
+      || !grid_parallel_clauses_gridifiable (par, tloc))
+    return false;
 
-  if (DECL_P (t) || TYPE_P (t))
-    *walk_subtrees = 0;
-  else
-    *walk_subtrees = 1;
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (par), grid,
+						 "parallel");
+  gomp_for *gfor;
+  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
+    return false;
 
-  if (VAR_P (t))
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
     {
-      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
-      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
-      tree *repl = declmap->get (t);
-      if (repl)
-	*tp = *repl;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "the inner loop is not "
+			 "a simple for loop\n");
+      return false;
     }
-  return NULL_TREE;
+  gcc_assert (gimple_omp_for_collapse (gfor) == grid->collapse);
+
+  if (!grid_inner_loop_gridifiable_p (gfor, grid))
+    return false;
+
+  return true;
+}
+
+/* Given an omp loop statement GFOR, return true if it can participate in
+   tiling gridification, i.e. in one where the distribute and parallel for
+   loops do not form a compound statement.  GRID describes hitherto discovered
+   properties of the loop that is evaluated for possible gridification. */
+
+static bool
+grid_gfor_follows_tiling_pattern (gomp_for *gfor, grid_prop *grid)
+{
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "an inner loop is not "
+			   "a simple for loop\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "This statement is not a simple for loop\n");
+	}
+      return false;
+    }
+
+  if (!grid_inner_loop_gridifiable_p (gfor, grid))
+    return false;
+
+  if (gimple_omp_for_collapse (gfor) != grid->collapse)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "an inner loop does not "
+			   "have use the same collapse clause\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "Loop construct uses a different collapse clause\n");
+	}
+      return false;
+    }
+
+  struct omp_for_data fd;
+  struct omp_for_data_loop *loops
+    = (struct omp_for_data_loop *)alloca (grid->collapse
+					  * sizeof (struct omp_for_data_loop));
+  extract_omp_for_data (gfor, &fd, loops);
+  for (unsigned i = 0; i < grid->collapse; i++)
+    {
+      tree itype, type = TREE_TYPE (fd.loops[i].v);
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
+      else
+	itype = type;
+
+      tree n1 = fold_convert (itype, fd.loops[i].n1);
+      tree n2 = fold_convert (itype, fd.loops[i].n2);
+      tree t = build_int_cst (itype,
+			      (fd.loops[i].cond_code == LT_EXPR ? -1 : 1));
+      t = fold_build2 (PLUS_EXPR, itype, fd.loops[i].step, t);
+      t = fold_build2 (PLUS_EXPR, itype, t, n2);
+      t = fold_build2 (MINUS_EXPR, itype, t, n1);
+      if (TYPE_UNSIGNED (itype) && fd.loops[i].cond_code == GT_EXPR)
+	t = fold_build2 (TRUNC_DIV_EXPR, itype,
+			 fold_build1 (NEGATE_EXPR, itype, t),
+			 fold_build1 (NEGATE_EXPR, itype, fd.loops[i].step));
+      else
+	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, fd.loops[i].step);
+
+      if (!operand_equal_p (grid->group_sizes[i], t, 0))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute and "
+			       "an internal loop do not agree on tile size\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			       "Loop construct does not seem to loop over "
+			       "a tile size\n");
+	    }
+	  return false;
+	}
+    }
+  return true;
+}
+
+/* Facing a call to FNDECL in the body of a distribute construct, return true
+   if we can handle it or false if it precludes gridification.  */
+
+static bool
+grid_call_permissible_in_distribute_p (tree fndecl)
+{
+  if (DECL_PURE_P (fndecl) || TREE_READONLY (fndecl))
+    return true;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if (strstr (name, "omp_") != name)
+    return false;
+
+  if ((strcmp (name, "omp_get_thread_num") == 0)
+      || (strcmp (name, "omp_get_num_threads") == 0)
+      || (strcmp (name, "omp_get_num_teams") == 0)
+      || (strcmp (name, "omp_get_team_num") == 0)
+      || (strcmp (name, "omp_get_level") == 0)
+      || (strcmp (name, "omp_get_active_level") == 0)
+      || (strcmp (name, "omp_in_parallel") == 0))
+    return true;
+
+  return false;
+}
+
+/* Facing a call satisfying grid_call_permissible_in_distribute_p in the body
+   of a distribute construct that is pointed at by GSI, modify it as necessary
+   for gridification.  If the statement itself got removed, return true.  */
+
+static bool
+grid_handle_call_in_distribute (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (stmt);
+  if (DECL_PURE_P (fndecl) || TREE_READONLY (fndecl))
+    return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if ((strcmp (name, "omp_get_thread_num") == 0)
+      || (strcmp (name, "omp_get_level") == 0)
+      || (strcmp (name, "omp_get_active_level") == 0)
+      || (strcmp (name, "omp_in_parallel") == 0))
+    {
+      tree lhs = gimple_call_lhs (stmt);
+      if (lhs)
+	{
+	  gassign *assign
+	    = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
+	  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+	}
+      gsi_remove (gsi, true);
+      return true;
+    }
+
+  /* The rest of the omp functions can stay as they are, HSA back-end will
+     handle them correctly.  */
+  gcc_checking_assert ((strcmp (name, "omp_get_num_threads") == 0)
+		       || (strcmp (name, "omp_get_num_teams") == 0)
+		       || (strcmp (name, "omp_get_team_num") == 0));
+  return false;
+}
+
+/* Given a sequence of statements within a distribute omp construct or a
+   parallel construct, which in the original source does not form a compound
+   construct with a looping construct, return true if it does not prevent us
+   from turning it into a gridified HSA kernel.  Otherwise return false. GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  IN_PARALLEL must be true if seq is within a
+   parallel construct and flase if it is only within a distribute
+   construct.  */
+
+static bool
+grid_dist_follows_tiling_pattern (gimple_seq seq, grid_prop *grid,
+				  bool in_parallel)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (grid_safe_assignment_p (stmt, grid)
+	  || gimple_code (stmt) == GIMPLE_GOTO
+	  || gimple_code (stmt) == GIMPLE_LABEL
+	  || gimple_code (stmt) == GIMPLE_COND)
+	continue;
+      else if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  if (!grid_dist_follows_tiling_pattern (gimple_bind_body (bind),
+						 grid, in_parallel))
+	    return false;
+	  continue;
+	}
+      else if (gtry *try_stmt = dyn_cast <gtry *> (stmt))
+	{
+	  if (gimple_try_kind (try_stmt) == GIMPLE_TRY_CATCH)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "the distribute "
+				   "construct contains a try..catch region\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (try_stmt),
+				   "This statement cannot be analyzed for "
+				   "tiled gridification\n");
+		}
+	      return false;
+	    }
+	  if (!grid_dist_follows_tiling_pattern (gimple_try_eval (try_stmt),
+						 grid, in_parallel))
+	    return false;
+	  if (!grid_dist_follows_tiling_pattern (gimple_try_cleanup (try_stmt),
+						 grid, in_parallel))
+	    return false;
+	  continue;
+	}
+      else if (is_gimple_call (stmt))
+	{
+	  tree fndecl = gimple_call_fndecl (stmt);
+	  if (fndecl && grid_call_permissible_in_distribute_p (fndecl))
+	    continue;
+
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute "
+			       "construct contains a call\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "tiled gridification\n");
+	    }
+	  return false;
+	}
+      else if (gomp_parallel *par = dyn_cast <gomp_parallel *> (stmt))
+	{
+	  if (in_parallel)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "a parallel "
+				   "construct contains another parallel "
+				   "construct\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "This parallel construct is nested in "
+				   "another one\n");
+		}
+	      return false;
+	    }
+	  if (!grid_parallel_clauses_gridifiable (par, grid->target_loc)
+	      || !grid_dist_follows_tiling_pattern (gimple_omp_body (par),
+						    grid, true))
+	    return false;
+	}
+      else if (gomp_for *gfor = dyn_cast <gomp_for *> (stmt))
+	{
+	  if (!in_parallel)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "a loop "
+				   "construct is not nested within a parallel "
+				   "construct\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "This loop construct is not nested in "
+				   "a parallel construct\n");
+		}
+	      return false;
+	    }
+	  if (!grid_gfor_follows_tiling_pattern (gfor, grid))
+	    return false;
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute "
+			       "construct contains a complex statement\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "tiled gridification\n");
+	    }
+	  return false;
+	}
+    }
+    return true;
+}
+
+/* If TARGET follows a pattern that can be turned into a gridified HSA kernel,
+   return true, otherwise return false.  In the case of success, also fill in
+   GRID with information describing the kernel grid.  */
+
+static bool
+grid_target_follows_gridifiable_pattern (gomp_target *target, grid_prop *grid)
+{
+  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
+    return false;
+
+  location_t tloc = gimple_location (target);
+  grid->target_loc = tloc;
+  gimple *stmt
+    = grid_find_single_omp_among_assignments (gimple_omp_body (target),
+					      grid, "target");
+  if (!stmt)
+    return false;
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  tree group_size = NULL;
+  if (!teams)
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+		       GRID_MISSED_MSG_PREFIX "it does not have a sole teams "
+		       "construct in it.\n");
+      return false;
+    }
+
+  tree clauses = gimple_omp_teams_clauses (teams);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_TEAMS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "the teams construct "
+			     "contains a num_teams clause\n ");
+	  return false;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "a reduction "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_THREAD_LIMIT:
+	  if (!integer_zerop (OMP_CLAUSE_OPERAND (clauses, 0)))
+	    group_size = OMP_CLAUSE_OPERAND (clauses, 0);
+	  break;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (teams), grid,
+						 "teams");
+  if (!stmt)
+    return false;
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  if (!dist)
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+		       GRID_MISSED_MSG_PREFIX "the teams construct does not "
+		       "have a single distribute construct in it.\n");
+      return false;
+    }
+
+  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
+
+  grid->collapse = gimple_omp_for_collapse (dist);
+  if (grid->collapse > 3)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "the distribute construct "
+			 "contains collapse clause with parameter greater "
+			 "than 3\n");
+      return false;
+    }
+
+  struct omp_for_data fd;
+  struct omp_for_data_loop *dist_loops
+    = (struct omp_for_data_loop *)alloca (grid->collapse
+					  * sizeof (struct omp_for_data_loop));
+  extract_omp_for_data (dist, &fd, dist_loops);
+  if (fd.chunk_size)
+    {
+      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "the teams "
+			     "thread limit is different from distribute "
+			     "schedule chunk\n");
+	  return false;
+	}
+      group_size = fd.chunk_size;
+    }
+  if (group_size && grid->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "group size cannot be "
+			 "set using thread_limit or schedule clauses "
+			 "when also using a collapse clause greater than 1\n");
+      return false;
+    }
+
+  if (gimple_omp_for_combined_p (dist))
+    {
+      grid->tiling = false;
+      grid->group_sizes[0] = group_size;
+      for (unsigned i = 1; i < grid->collapse; i++)
+	grid->group_sizes[i] = NULL;
+      return grid_dist_follows_simple_pattern (dist, grid);
+    }
+  else
+    {
+      grid->tiling = true;
+      if (group_size)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "group size cannot be set "
+			     "using thread_limit or schedule clauses when "
+			     "distribute and loop constructs do not form "
+			     "one combined construct\n");
+	  return false;
+	}
+      for (unsigned i = 0; i < grid->collapse; i++)
+	{
+	  if (fd.loops[i].cond_code == GT_EXPR)
+	    grid->group_sizes[i] = fold_build1 (NEGATE_EXPR,
+						TREE_TYPE (fd.loops[i].step),
+						fd.loops[i].step);
+	  else
+	    grid->group_sizes[i] = fd.loops[i].step;
+	}
+      return grid_dist_follows_tiling_pattern (gimple_omp_body (dist), grid,
+					       false);
+    }
+}
+
+/* Operand walker, used to remap pre-body declarations according to a hash map
+   provided in DATA.  */
+
+static tree
+grid_remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (DECL_P (t) || TYPE_P (t))
+    *walk_subtrees = 0;
+  else
+    *walk_subtrees = 1;
+
+  if (VAR_P (t))
+    {
+      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+      tree *repl = declmap->get (t);
+      if (repl)
+	*tp = *repl;
+    }
+  return NULL_TREE;
+}
+
+/* Identifiers of segments into which a particular variable should be places
+   when gridifying.  */
+
+enum grid_var_segment {GRID_SEGMENT_PRIVATE, GRID_SEGMENT_GROUP,
+		       GRID_SEGMENT_GLOBAL};
+
+/* Mark VAR so that it is eventually placed into SEGMENT.  Place an artificial
+   builtin call into SEQ that will make sure the variable is always considered
+   address taken.  */
+
+static void
+grid_mark_variable_segment (tree var, enum grid_var_segment segment)
+{
+  /* Making a non-addressable variables would require that we re-gimplify all
+     their uses.  Fortunately, we do not have to do this because if they are
+     not addressable, it means they are not used in atomic or parallel
+     statements and so relaxed GPU consistency rules mean we can just keep them
+     private. */
+  if (!TREE_ADDRESSABLE (var))
+    return;
+
+  switch (segment)
+    {
+    case GRID_SEGMENT_GROUP:
+      DECL_ATTRIBUTES (var) = tree_cons (get_identifier ("hsa_group_segment"),
+					 NULL, DECL_ATTRIBUTES (var));
+      break;
+    case GRID_SEGMENT_GLOBAL:
+      DECL_ATTRIBUTES (var) = tree_cons (get_identifier ("hsa_global_segment"),
+					 NULL, DECL_ATTRIBUTES (var));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (!TREE_STATIC (var))
+    {
+      TREE_STATIC (var) = 1;
+      varpool_node::finalize_decl (var);
+    }
+
 }
 
 /* Copy leading register-type assignments to local variables in SRC to just
    before DST, Creating temporaries, adjusting mapping of operands in WI and
    remapping operands as necessary.  Add any new temporaries to TGT_BIND.
-   Return the first statement that does not conform to
-   grid_reg_assignment_to_local_var_p or NULL.  */
+   Return the first statement that does not conform to grid_safe_assignment_p
+   or NULL.  If VAR_SEGMENT is not GRID_SEGMENT_PRIVATE, also mark all
+   variables in traversed bind statements so that they are put into the
+   appropriate segment.  */
 
 static gimple *
 grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
-				gbind *tgt_bind, struct walk_stmt_info *wi)
+				     gbind *tgt_bind,
+				     enum grid_var_segment var_segment,
+				     struct walk_stmt_info *wi)
 {
   hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
   gimple_stmt_iterator gsi;
@@ -17726,13 +18226,17 @@ grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
       if (gbind *bind = dyn_cast <gbind *> (stmt))
 	{
 	  gimple *r = grid_copy_leading_local_assignments
-	    (gimple_bind_body (bind), dst, tgt_bind, wi);
+	    (gimple_bind_body (bind), dst, tgt_bind, var_segment, wi);
+
+	  if (var_segment != GRID_SEGMENT_PRIVATE)
+	    for (tree var = gimple_bind_vars (bind); var; var = DECL_CHAIN (var))
+	      grid_mark_variable_segment (var, var_segment);
 	  if (r)
 	    return r;
 	  else
 	    continue;
 	}
-      if (!grid_reg_assignment_to_local_var_p (stmt))
+      if (!grid_safe_assignment_p (stmt, NULL))
 	return stmt;
       tree lhs = gimple_assign_lhs (as_a <gassign *> (stmt));
       tree repl = copy_var_decl (lhs, create_tmp_var_name (NULL),
@@ -17748,43 +18252,262 @@ grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
   return NULL;
 }
 
+/* Statement walker function to make adjustments to statements within the
+   gridifed kernel copy.  */
+
+static tree
+grid_process_grid_body (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD))
+  {
+    gomp_for *loop = as_a <gomp_for *> (stmt);
+    tree clauses = gimple_omp_for_clauses (loop);
+    tree cl = find_omp_clause (clauses, OMP_CLAUSE_SAFELEN);
+    if (cl)
+      OMP_CLAUSE_SAFELEN_EXPR (cl) = integer_one_node;
+    else
+      {
+	tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
+	OMP_CLAUSE_SAFELEN_EXPR (c) = integer_one_node;
+	OMP_CLAUSE_CHAIN (c) = clauses;
+	gimple_omp_for_set_clauses (loop, c);
+      }
+  }
+  return NULL_TREE;
+}
+
+/* Given a PARLOOP that is a normal for looping construct but also a part of a
+   combined construct with a simd loop, eliminate the simd loop.  */
+
+static void
+grid_eliminate_combined_simd_part (gomp_for *parloop)
+{
+  struct walk_stmt_info wi;
+
+  memset (&wi, 0, sizeof (wi));
+  wi.val_only = true;
+  enum gf_mask msk = GF_OMP_FOR_SIMD;
+  wi.info = (void *) &msk;
+  walk_gimple_seq (gimple_omp_body (parloop), find_combined_for, NULL, &wi);
+  gimple *stmt = (gimple *) wi.info;
+  /* We expect that the SIMD id the only statement in the parallel loop.  */
+  gcc_assert (stmt
+	      && gimple_code (stmt) == GIMPLE_OMP_FOR
+	      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_SIMD)
+	      && gimple_omp_for_combined_into_p (stmt)
+	      && !gimple_omp_for_combined_p (stmt));
+  gomp_for *simd = as_a <gomp_for *> (stmt);
+
+  /* Copy over the iteration properties because the body refers to the index in
+     the bottmom-most loop.  */
+  unsigned i, collapse = gimple_omp_for_collapse (parloop);
+  gcc_checking_assert (collapse == gimple_omp_for_collapse (simd));
+  for (i = 0; i < collapse; i++)
+    {
+      gimple_omp_for_set_index (parloop, i, gimple_omp_for_index (simd, i));
+      gimple_omp_for_set_initial (parloop, i, gimple_omp_for_initial (simd, i));
+      gimple_omp_for_set_final (parloop, i, gimple_omp_for_final (simd, i));
+      gimple_omp_for_set_incr (parloop, i, gimple_omp_for_incr (simd, i));
+    }
+
+  tree *tgt= gimple_omp_for_clauses_ptr (parloop);
+  while (*tgt)
+    tgt = &OMP_CLAUSE_CHAIN (*tgt);
+
+  /* Copy over all clauses, except for linaer clauses, which are turned into
+     private clauses, and all other simd-specificl clauses, which are
+     ignored.  */
+  tree *pc = gimple_omp_for_clauses_ptr (simd);
+  while (*pc)
+    {
+      tree c = *pc;
+      switch (TREE_CODE (c))
+	{
+	case OMP_CLAUSE_LINEAR:
+	  {
+	    tree priv = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_PRIVATE);
+	    OMP_CLAUSE_DECL (priv) = OMP_CLAUSE_DECL (c);
+	    OMP_CLAUSE_CHAIN (priv) = NULL;
+	    *tgt = priv;
+	    tgt = &OMP_CLAUSE_CHAIN (priv);
+	    pc = &OMP_CLAUSE_CHAIN (c);
+	    break;
+	  }
+
+	case OMP_CLAUSE_SAFELEN:
+	case OMP_CLAUSE_SIMDLEN:
+	case OMP_CLAUSE_ALIGNED:
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  break;
+
+	default:
+	  *pc = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (c) = NULL;
+	  *tgt = c;
+	  tgt = &OMP_CLAUSE_CHAIN(c);
+	  break;
+	}
+    }
+
+  /* Finally, throw away the simd and mark the parallel loop as not
+     combined.  */
+  gimple_omp_set_body (parloop, gimple_omp_body (simd));
+  gimple_omp_for_set_combined_p (parloop, false);
+}
+
+/* Statement walker function marking all parallels as grid_phony and loops as
+   grid ones representing threads of a particular thread group.  */
+
+static tree
+grid_mark_tiling_loops (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			struct walk_stmt_info *wi_in)
+{
+  *handled_ops_p = false;
+  if (gomp_for *loop = dyn_cast <gomp_for *> (gsi_stmt (*gsi)))
+    {
+      *handled_ops_p = true;
+      gimple_omp_for_set_kind (loop, GF_OMP_FOR_KIND_GRID_LOOP);
+      gimple_omp_for_set_grid_intra_group (loop, true);
+      if (gimple_omp_for_combined_p (loop))
+	grid_eliminate_combined_simd_part (loop);
+
+      struct walk_stmt_info body_wi;
+      memset (&body_wi, 0, sizeof (body_wi));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (loop),
+			   grid_process_grid_body, NULL, &body_wi);
+
+      gbind *bind = (gbind *) wi_in->info;
+      tree c;
+      for (c = gimple_omp_for_clauses (loop); c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
+	  {
+	    push_gimplify_context ();
+	    tree ov = OMP_CLAUSE_DECL (c);
+	    tree gv = copy_var_decl (ov, create_tmp_var_name (NULL),
+				    TREE_TYPE (ov));
+
+	    grid_mark_variable_segment (gv, GRID_SEGMENT_GROUP);
+	    DECL_CONTEXT (gv) = current_function_decl;
+	    gimple_bind_append_vars (bind, gv);
+	    tree x = lang_hooks.decls.omp_clause_assign_op (c, gv, ov);
+	    gimplify_and_add (x, &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c));
+	    x = lang_hooks.decls.omp_clause_copy_ctor (c, ov, gv);
+	    gimple_seq l = NULL;
+	    gimplify_and_add (x, &l);
+	    gsi_insert_seq_after (gsi, l, GSI_SAME_STMT);
+	    pop_gimplify_context (bind);
+	  }
+    }
+  return NULL_TREE;
+}
+
+/* Statement walker function marking all parallels as grid_phony and loops as
+   grid ones representing threads of a particular thread group.  */
+
+static tree
+grid_mark_tiling_parallels_and_loops (gimple_stmt_iterator *gsi,
+				      bool *handled_ops_p,
+				      struct walk_stmt_info *wi_in)
+{
+  *handled_ops_p = false;
+  wi_in->removed_stmt = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  if (gbind *bind = dyn_cast <gbind *> (stmt))
+    {
+      for (tree var = gimple_bind_vars (bind); var; var = DECL_CHAIN (var))
+	grid_mark_variable_segment (var, GRID_SEGMENT_GROUP);
+    }
+  else if (gomp_parallel *parallel = dyn_cast <gomp_parallel *> (stmt))
+    {
+      *handled_ops_p = true;
+      gimple_omp_parallel_set_grid_phony (parallel, true);
+
+      gbind *new_bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
+      gimple_bind_set_body (new_bind, gimple_omp_body (parallel));
+      gimple_seq s = NULL;
+      gimple_seq_add_stmt (&s, new_bind);
+      gimple_omp_set_body (parallel, s);
+
+      struct walk_stmt_info wi_par;
+      memset (&wi_par, 0, sizeof (wi_par));
+      wi_par.info = new_bind;
+      walk_gimple_seq_mod (gimple_bind_body_ptr (new_bind),
+			   grid_mark_tiling_loops, NULL, &wi_par);
+    }
+  else if (is_a <gcall *> (stmt))
+    wi_in->removed_stmt = grid_handle_call_in_distribute (gsi);
+  return NULL_TREE;
+}
+
 /* Given freshly copied top level kernel SEQ, identify the individual OMP
-   components, mark them as part of kernel and return the inner loop, and copy
-   assignment leading to them just before DST, remapping them using WI and
-   adding new temporaries to TGT_BIND.  */
+   components, mark them as part of kernel, copy assignment leading to them
+   just before DST, remapping them using WI and adding new temporaries to
+   TGT_BIND, and and return the loop that will be used for kernel dispatch.  */
 
 static gomp_for *
-grid_process_kernel_body_copy (gimple_seq seq, gimple_stmt_iterator *dst,
+grid_process_kernel_body_copy (grid_prop *grid, gimple_seq seq,
+			       gimple_stmt_iterator *dst,
 			       gbind *tgt_bind, struct walk_stmt_info *wi)
 {
-  gimple *stmt = grid_copy_leading_local_assignments (seq, dst, tgt_bind, wi);
+  gimple *stmt = grid_copy_leading_local_assignments (seq, dst, tgt_bind,
+						      GRID_SEGMENT_GLOBAL, wi);
   gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
   gcc_assert (teams);
   gimple_omp_teams_set_grid_phony (teams, true);
   stmt = grid_copy_leading_local_assignments (gimple_omp_body (teams), dst,
-					 tgt_bind, wi);
+					      tgt_bind, GRID_SEGMENT_GLOBAL, wi);
   gcc_checking_assert (stmt);
   gomp_for *dist = dyn_cast <gomp_for *> (stmt);
   gcc_assert (dist);
   gimple_seq prebody = gimple_omp_for_pre_body (dist);
   if (prebody)
-    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
-  gimple_omp_for_set_grid_phony (dist, true);
-  stmt = grid_copy_leading_local_assignments (gimple_omp_body (dist), dst,
-					 tgt_bind, wi);
-  gcc_checking_assert (stmt);
+    grid_copy_leading_local_assignments (prebody, dst, tgt_bind,
+					 GRID_SEGMENT_GROUP, wi);
 
-  gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
-  gimple_omp_parallel_set_grid_phony (parallel, true);
-  stmt = grid_copy_leading_local_assignments (gimple_omp_body (parallel), dst,
-					 tgt_bind, wi);
-  gomp_for *inner_loop = as_a <gomp_for *> (stmt);
-  gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_GRID_LOOP);
-  prebody = gimple_omp_for_pre_body (inner_loop);
-  if (prebody)
-    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+  if (grid->tiling)
+    {
+      gimple_omp_for_set_kind (dist, GF_OMP_FOR_KIND_GRID_LOOP);
+      gimple_omp_for_set_grid_group_iter (dist, true);
 
-  return inner_loop;
+      struct walk_stmt_info wi_tiled;
+      memset (&wi_tiled, 0, sizeof (wi_tiled));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (dist),
+			   grid_mark_tiling_parallels_and_loops, NULL,
+			   &wi_tiled);
+      return dist;
+    }
+  else
+    {
+      gimple_omp_for_set_grid_phony (dist, true);
+      stmt = grid_copy_leading_local_assignments (gimple_omp_body (dist), dst,
+						  tgt_bind,
+						  GRID_SEGMENT_PRIVATE, wi);
+      gcc_checking_assert (stmt);
+      gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
+      gimple_omp_parallel_set_grid_phony (parallel, true);
+      stmt = grid_copy_leading_local_assignments (gimple_omp_body (parallel),
+						  dst, tgt_bind,
+						  GRID_SEGMENT_PRIVATE, wi);
+      gomp_for *inner_loop = as_a <gomp_for *> (stmt);
+      gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_GRID_LOOP);
+      prebody = gimple_omp_for_pre_body (inner_loop);
+      if (prebody)
+	grid_copy_leading_local_assignments (prebody, dst, tgt_bind,
+					     GRID_SEGMENT_PRIVATE, wi);
+
+      if (gimple_omp_for_combined_p (inner_loop))
+	grid_eliminate_combined_simd_part (inner_loop);
+      struct walk_stmt_info body_wi;;
+      memset (&body_wi, 0, sizeof (body_wi));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (inner_loop),
+			   grid_process_grid_body, NULL, &body_wi);
+
+      return inner_loop;
+    }
 }
 
 /* If TARGET points to a GOMP_TARGET which follows a gridifiable pattern,
@@ -17797,14 +18520,16 @@ grid_attempt_target_gridification (gomp_target *target,
 				   gimple_stmt_iterator *gsi,
 				   gbind *tgt_bind)
 {
-  tree group_size;
-  if (!target || !grid_target_follows_gridifiable_pattern (target, &group_size))
+  /* removed group_size */
+  grid_prop grid;
+  memset (&grid, 0, sizeof (grid));
+  if (!target || !grid_target_follows_gridifiable_pattern (target, &grid))
     return;
 
   location_t loc = gimple_location (target);
   if (dump_enabled_p ())
     dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-		     "Target construct will be turned into a gridified GPGPU "
+		     "Target construct will be turned into a gridified HSA "
 		     "kernel\n");
 
   /* Copy target body to a GPUKERNEL construct:  */
@@ -17817,8 +18542,8 @@ grid_attempt_target_gridification (gomp_target *target,
   wi.info = declmap;
 
   /* Copy assignments in between OMP statements before target, mark OMP
-     statements within copy appropriatly.  */
-  gomp_for *inner_loop = grid_process_kernel_body_copy (kernel_seq, gsi,
+     statements within copy appropriately.  */
+  gomp_for *inner_loop = grid_process_kernel_body_copy (&grid, kernel_seq, gsi,
 							tgt_bind, &wi);
 
   gbind *old_bind = as_a <gbind *> (gimple_seq_first (gimple_omp_body (target)));
@@ -17833,10 +18558,10 @@ grid_attempt_target_gridification (gomp_target *target,
     (gimple_bind_body_ptr (as_a <gbind *> (gimple_omp_body (target))),
      gpukernel);
 
-  walk_tree (&group_size, grid_remap_prebody_decls, &wi, NULL);
+  for (size_t i = 0; i < grid.collapse; i++)
+    walk_tree (&grid.group_sizes[i], grid_remap_prebody_decls, &wi, NULL);
   push_gimplify_context ();
-  size_t collapse = gimple_omp_for_collapse (inner_loop);
-  for (size_t i = 0; i < collapse; i++)
+  for (size_t i = 0; i < grid.collapse; i++)
     {
       tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i));
       if (POINTER_TYPE_P (type))
@@ -17850,12 +18575,12 @@ grid_attempt_target_gridification (gomp_target *target,
       tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i));
       walk_tree (&n2, grid_remap_prebody_decls, &wi, NULL);
       adjust_for_condition (loc, &cond_code, &n2);
-      tree step;
-      step = get_omp_for_step_from_incr (loc,
-					 gimple_omp_for_incr (inner_loop, i));
-      gimple_seq tmpseq = NULL;
       n1 = fold_convert (itype, n1);
       n2 = fold_convert (itype, n2);
+
+      tree step
+	= get_omp_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i));
+
       tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
       t = fold_build2 (PLUS_EXPR, itype, step, t);
       t = fold_build2 (PLUS_EXPR, itype, t, n2);
@@ -17866,15 +18591,23 @@ grid_attempt_target_gridification (gomp_target *target,
 			 fold_build1 (NEGATE_EXPR, itype, step));
       else
 	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+      if (grid.tiling)
+        {
+          if (cond_code == GT_EXPR)
+            step = fold_build1 (NEGATE_EXPR, itype, step);
+          t = fold_build2 (MULT_EXPR, itype, t, step);
+        }
+
       tree gs = fold_convert (uint32_type_node, t);
+      gimple_seq tmpseq = NULL;
       gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue);
       if (!gimple_seq_empty_p (tmpseq))
 	gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
 
       tree ws;
-      if (i == 0 && group_size)
+      if (grid.group_sizes[i])
 	{
-	  ws = fold_convert (uint32_type_node, group_size);
+	  ws = fold_convert (uint32_type_node, grid.group_sizes[i]);
 	  tmpseq = NULL;
 	  gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue);
 	  if (!gimple_seq_empty_p (tmpseq))
@@ -17995,7 +18728,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp, /* properties_provided */
diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-2.c b/gcc/testsuite/c-c++-common/gomp/gridify-2.c
new file mode 100644
index 0000000..3c13025
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-2.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+#define BLOCK_SIZE 16
+
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+#pragma omp parallel
+	 {
+         int C_row, C_col;
+         float Cval = 0.0;
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+	       for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cval += As[row][e] * Bs[e][col];
+		 }
+	   }  /* End for kblock .. */
+
+
+#pragma omp for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
+
+	     }
+         } /* end parallel */
+      }	   /* end target teams distribute */
+}
+
+/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-3.c b/gcc/testsuite/c-c++-common/gomp/gridify-3.c
new file mode 100644
index 0000000..9e73133
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+#define BLOCK_SIZE 16
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC)
+{
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+	  float As[BLOCK_SIZE][BLOCK_SIZE];
+	  float Bs[BLOCK_SIZE][BLOCK_SIZE];
+	  float Cs[BLOCK_SIZE][BLOCK_SIZE];
+	  int C_row, C_col;
+
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               Cs[row][col] = 0.0;
+	     }
+
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp parallel for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp parallel for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cs[row][col] += As[row][e] * Bs[e][col];
+		 }
+         }  /* End for kblock .. */
+
+
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
+	     }
+      }	/* End distribute */
+}
+
+/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } } */
diff --git a/libgomp/testsuite/libgomp.hsa.c/tiling-1.c b/libgomp/testsuite/libgomp.hsa.c/tiling-1.c
new file mode 100644
index 0000000..9149adc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/tiling-1.c
@@ -0,0 +1,212 @@
+/*
+
+   matmul.c : Matrix Multiplication with tiling for openmp4 example
+
+*/
+
+#include <stdlib.h>
+#include <math.h>
+
+#define BLOCK_SIZE 16
+/*
+  #define BLOCK_SIZE 32
+*/
+#define NSECPERSEC 1000000000L
+
+typedef struct {
+   int width;
+   int height;
+   int stride;
+   int hpad;
+   float* elements;
+} Matrix;
+
+/* Correctly extract the number of nanoseconds from the two time structures */
+long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
+   long int nanosecs;
+   if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
+      ((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
+      ( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
+   else nanosecs =
+      (((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
+      ( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
+   return nanosecs;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void  tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+
+int verify(float* v_res, float* v_ref, int len) {
+    int passed = 1;
+    int i;
+    for (i = 0; i < len; ++i) {
+        if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
+	  __builtin_abort ();
+        }
+    }
+    return passed;
+}
+
+
+int main(int argc, char* argv[]){
+
+   Matrix A,B,Bt,C,Cref;
+   int a1,a2,a3,i,j;
+   struct timespec start_time1, end_time1;
+   struct timespec start_time2, end_time2;
+   long int nanosecs,total_ops;
+   float gflopsTiled,gflopsCPU;
+
+   a1 = 35;
+   a2 = 28;
+   a3 = 47;
+
+   A.height = a1;
+   A.width = a2;
+   A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
+
+   B.height = a2;
+   B.width = a3;
+   B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
+
+   /* Bt is same as B but stored in column-major order */
+   Bt.height = B.height;
+   Bt.width = B.width;
+   Bt.stride = B.stride;
+   Bt.hpad = B.hpad;
+   Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
+
+   C.height = a1;
+   C.width = a3;
+   C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
+
+   Cref.height = a1;
+   Cref.width = a3;
+   Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
+
+   for(i = 0; i < A.hpad ; i++)
+      for(j = 0; j < A.stride; j++) {
+         if (( j<A.width ) && (i<A.height)) {
+            A.elements[i*A.stride + j] = (i % 3);
+         } else {
+            A.elements[i*A.stride + j] = 0.0;
+         }
+      }
+
+   /*  Initialize B and Bt */
+   for(i = 0; i < B.hpad ; i++)
+      for(j = 0; j < B.stride; j++) {
+         if (( j<B.width ) && (i<B.height)) {
+            B.elements[i*B.stride+j] = (j % 2);
+            Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
+         } else {
+            B.elements[i*B.stride+j] = 0.0;
+            Bt.elements[j*Bt.stride+i] = 0.0;
+         }
+      }
+
+   /* zero C, and Cref */
+   for(i = 0; i < C.hpad; i++)
+      for(j = 0; j < C.stride; j++) {
+         C.elements[i*C.stride+j] = 0.0;
+         Cref.elements[i*Cref.stride+j] = 0.0;
+      }
+
+   simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
+   tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
+
+   verify(C.elements, Cref.elements, C.height * C.stride);
+   return 0;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+const float* B,const int LDB, const float beta,float* C, const int LDC) {
+   /*  A,B, and C  are in row-major order */
+   int c_row,c_col,inner;
+   float sum;
+   for (c_col  = 0 ;  c_col<N; c_col++ ) {
+      for (c_row = 0 ; c_row<M; c_row++ ) {
+         sum = 0.0 ;
+         for (inner = 0 ; inner<K; inner++ ) {
+            sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
+         }
+         C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
+      }
+   }
+}
+
+/***************************
+
+   tiled_sgemm_tt:  Tiled matrix multiplication:
+
+***************************/
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+#pragma omp parallel
+	 {
+         int C_row, C_col;
+         float Cval = 0.0;
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+	       for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cval += As[row][e] * Bs[e][col];
+		 }
+	   }  /* End for kblock .. */
+
+
+#pragma omp for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
+
+	     }
+         } /* end parallel */
+      }	   /* end target teams distribute */
+}
diff --git a/libgomp/testsuite/libgomp.hsa.c/tiling-2.c b/libgomp/testsuite/libgomp.hsa.c/tiling-2.c
new file mode 100644
index 0000000..6e54304
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/tiling-2.c
@@ -0,0 +1,258 @@
+/*
+
+   matmul.c : Matrix Multiplication with tiling for openmp4 example
+
+*/
+
+#include <stdlib.h>
+#include <math.h>
+
+#define BLOCK_SIZE 16
+/*
+  #define BLOCK_SIZE 32
+*/
+#define NSECPERSEC 1000000000L
+
+typedef struct {
+   int width;
+   int height;
+   int stride;
+   int hpad;
+   float* elements;
+} Matrix;
+
+/* Correctly extract the number of nanoseconds from the two time structures */
+long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
+   long int nanosecs;
+   if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
+      ((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
+      ( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
+   else nanosecs =
+      (((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
+      ( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
+   return nanosecs;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void  tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+
+int verify(float* v_res, float* v_ref, int len) {
+    int passed = 1;
+    int i;
+    for (i = 0; i < len; ++i) {
+        if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
+	  __builtin_abort ();
+        }
+    }
+    return passed;
+}
+
+
+int main(int argc, char* argv[]){
+
+   Matrix A,B,Bt,C,Cref;
+   int a1,a2,a3,i,j;
+   struct timespec start_time1, end_time1;
+   struct timespec start_time2, end_time2;
+   long int nanosecs,total_ops;
+   float gflopsTiled,gflopsCPU;
+
+   a1 = 35;
+   a2 = 28;
+   a3 = 47;
+
+   A.height = a1;
+   A.width = a2;
+   A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
+
+   B.height = a2;
+   B.width = a3;
+   B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
+
+   /* Bt is same as B but stored in column-major order */
+   Bt.height = B.height;
+   Bt.width = B.width;
+   Bt.stride = B.stride;
+   Bt.hpad = B.hpad;
+   Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
+
+   C.height = a1;
+   C.width = a3;
+   C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
+
+   Cref.height = a1;
+   Cref.width = a3;
+   Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
+
+   for(i = 0; i < A.hpad ; i++)
+      for(j = 0; j < A.stride; j++) {
+         if (( j<A.width ) && (i<A.height)) {
+            A.elements[i*A.stride + j] = (i % 3);
+         } else {
+            A.elements[i*A.stride + j] = 0.0;
+         }
+      }
+
+   /*  Initialize B and Bt */
+   for(i = 0; i < B.hpad ; i++)
+      for(j = 0; j < B.stride; j++) {
+         if (( j<B.width ) && (i<B.height)) {
+            B.elements[i*B.stride+j] = (j % 2);
+            Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
+         } else {
+            B.elements[i*B.stride+j] = 0.0;
+            Bt.elements[j*Bt.stride+i] = 0.0;
+         }
+      }
+
+   /* zero C, and Cref */
+   for(i = 0; i < C.hpad; i++)
+      for(j = 0; j < C.stride; j++) {
+         C.elements[i*C.stride+j] = 0.0;
+         Cref.elements[i*Cref.stride+j] = 0.0;
+      }
+
+   simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
+   tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
+
+   verify(C.elements, Cref.elements, C.height * C.stride);
+   return 0;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+const float* B,const int LDB, const float beta,float* C, const int LDC) {
+   /*  A,B, and C  are in row-major order */
+   int c_row,c_col,inner;
+   float sum;
+   for (c_col  = 0 ;  c_col<N; c_col++ ) {
+      for (c_row = 0 ; c_row<M; c_row++ ) {
+         sum = 0.0 ;
+         for (inner = 0 ; inner<K; inner++ ) {
+            sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
+         }
+         C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
+      }
+   }
+}
+
+/***************************
+
+   tiled_sgemm_tt:  Tiled matrix multiplication:
+
+***************************/
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE) {
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE) {
+
+// We now have M/BLOCK_SIZE * N/BLOCK_SIZE teams = (M*N)/(BLOCK_SIZE*BLOCK_SIZE)
+// The grid global dimensions are M,N,1
+// The grid local dimensions are BLOCK_SIZE,BLOCK_SIZE,1
+
+// -------------------------------------------------------------------
+//      The rest of this code forms the HSAIL kernel with the
+//      pairs of "paralell for collapse(2)" loops repalced with a barrier.
+//      The kernel initializes these values
+//      C_row_start = get_group_id(0) * BLOCK_SIZE
+//      C_col_start = get_group_id(1) * BLOCK_SIZE
+//      row=get_local_id(0)
+//      col=get_local_id(1)
+// -------------------------------------------------------------------
+
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+         float Cs[BLOCK_SIZE][BLOCK_SIZE];
+         int C_row, C_col;
+
+         /* Zero Cs for this BLOCK */
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++) {
+            for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+               Cs[row][col] = 0.0;
+            }
+         }
+
+         // This kblock loop is run on the master thread of each team
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE ) {
+
+            // Copy global memory values to local memory
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+            for (int row=0 ; row < BLOCK_SIZE ; row++) {
+               for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+                  C_row = C_row_start + row;
+                  C_col = C_col_start + col;
+		  if ((C_row < M) && (kblock + col < K))
+		    As[row][col] = A[(C_row*LDA)+ kblock + col];
+		  else
+		    As[row][col] = 0;
+		  if ((kblock + row < K) && C_col < N)
+		    Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		  else
+		    Bs[row][col] = 0;
+               }
+            }
+
+            // Calculate Cs <- Sum(As X Bs) across all kblocks
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+            for (int row=0 ; row < BLOCK_SIZE ; row++) {
+               for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+                  for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cs[row][col] += As[row][e] * Bs[e][col];
+                }
+            }
+
+         }  /* End for kblock .. */
+
+
+         // Scale Update actual C from Cs
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++) {
+            for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N)) {
+		 C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
+	       }
+            }
+         }
+
+// -------------------------------------------------------------------
+// This is the end of the kernel
+
+      }
+   }
+
+}
-- 
2.10.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 2/4] HSA specific built-ins
  2016-11-13 23:20 [PATCH 0/4] Merge from HSA branch to trunk Martin Jambor
  2016-11-13 23:20 ` [PATCH 4/4] Back-end and IPA bits of hsa branch merge Martin Jambor
  2016-11-13 23:20 ` [PATCH 1/4] Remove build dependence on HSA run-time Martin Jambor
@ 2016-11-13 23:20 ` Martin Jambor
  2016-11-18 10:27   ` Jakub Jelinek
  2016-11-13 23:20 ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
  3 siblings, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-11-13 23:20 UTC (permalink / raw)
  To: GCC Patches

Hello,

this patch adds a small file hsa-builtins.def which defines a few
builtins that I then use in OpenMP lowering and expansion.

After we split gridification stuff in omp-low.c to a separate file, we
should be able to only conditionally include the file and remove the
weird conditional ifdef.

OK for trunk?

Thanks,

Martin


2016-11-11  Martin Jambor  <mjambor@suse.cz>

gcc/
	* hsa-builtins.def: New file.
	* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def dependency.
	* builtins.def: Include hsa-builtins.def.
	(DEF_HSA_BUILTIN): New macro.

fortran/
	* f95-lang.c (DEF_HSA_BUILTIN): New macro.
---
 gcc/Makefile.in        |  3 ++-
 gcc/builtins.def       | 16 ++++++++++++++++
 gcc/fortran/f95-lang.c | 11 +++++++++++
 gcc/hsa-builtins.def   | 39 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 gcc/hsa-builtins.def

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7ecd1e4..4e64960 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -911,7 +911,8 @@ RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 PARAMS_H = params.h params-enum.h params.def
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
-	gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def
+	gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def \
+	hsa-builtins.def
 INTERNAL_FN_DEF = internal-fn.def
 INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 219feeb..4e8f140 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -201,6 +201,19 @@ along with GCC; see the file COPYING3.  If not see
 		|| flag_cilkplus \
 		|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
+#undef DEF_HSA_BUILTIN
+#ifdef ENABLE_HSA
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS)			\
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
+               false, false, true, ATTRS, false, \
+	       (!flag_disable_hsa))
+#else
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS)			\
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
+               false, false, true, ATTRS, false, \
+	       (false))
+#endif
+
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
    by the compiler but a few are implemented in libcilkrts.  */ 
 #undef DEF_CILK_BUILTIN_STUB
@@ -968,6 +981,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 /* Offloading and Multi Processing builtins.  */
 #include "omp-builtins.def"
 
+/* Heterogeneous Systems Architecture.  */
+#include "hsa-builtins.def"
+
 /* Cilk keywords builtins.  */
 #include "cilk-builtins.def"
 
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index cea6675..22d29da 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -1224,6 +1224,17 @@ gfc_init_builtin_functions (void)
 #undef DEF_GOMP_BUILTIN
     }
 
+#ifdef ENABLE_HSA
+  if (!flag_disable_hsa)
+    {
+#undef DEF_HSA_BUILTIN
+#define DEF_HSA_BUILTIN(code, name, type, attr) \
+      gfc_define_builtin ("__builtin_" name, builtin_types[type], \
+			  code, name, attr);
+#include "../hsa-builtins.def"
+    }
+#endif
+
   gfc_define_builtin ("__builtin_trap", builtin_types[BT_FN_VOID],
 		      BUILT_IN_TRAP, NULL, ATTR_NOTHROW_LEAF_LIST);
   TREE_THIS_VOLATILE (builtin_decl_explicit (BUILT_IN_TRAP)) = 1;
diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
new file mode 100644
index 0000000..cc0409e
--- /dev/null
+++ b/gcc/hsa-builtins.def
@@ -0,0 +1,39 @@
+/* This file contains the definitions and documentation for the
+   Offloading and Multi Processing builtins used in the GNU compiler.
+   Copyright (C) 2005-2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Before including this file, you should define a macro:
+
+     DEF_HSA_BUILTIN (ENUM, NAME, TYPE, ATTRS)
+
+   See builtins.def for details.  */
+
+/* The reason why they aren't in gcc/builtins.def is that the Fortran front end
+   doesn't source those.  */
+
+DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKGROUPID, "hsa_workgroupid",
+	  	 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMID, "hsa_workitemid",
+	  	 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMABSID, "hsa_workitemabsid",
+	  	 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_GRIDSIZE, "hsa_gridsize",
+		 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_CURRENTWORKGROUPSIZE, "hsa_currentworkgroupsize",
+		 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-- 
2.10.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 4/4] Back-end and IPA bits of hsa branch merge
  2016-11-13 23:20 [PATCH 0/4] Merge from HSA branch to trunk Martin Jambor
@ 2016-11-13 23:20 ` Martin Jambor
       [not found]   ` <yxfpftb48jra.fsf@hertz.schwinge.homeip.net>
  2016-11-13 23:20 ` [PATCH 1/4] Remove build dependence on HSA run-time Martin Jambor
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-11-13 23:20 UTC (permalink / raw)
  To: GCC Patches

Hi,

so this patch bundles together all the various fixes, cleanups and
improvements to the HSAIL generation itself which are far too many to
list here individually, more details can be found in the email
messages that i sent when committing a given change to the branch.

As the HSA maintainer I am going to approve this after the previous
two patches are approved by others, but if anybody has any comment or
suggestion, I will be glad to know.

Thanks,

Martin



2016-11-11  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>

	* hsa.h (hsa_bb): Add method method append_phi.
	(hsa_insn_br): Renamed to hsa_insn_cbr, renamed all
	occurences in all files too.
	(hsa_insn_br): New class, now the ancestor of hsa_incn_cbr.
	(is_a_helper <hsa_insn_br *>::test): New function.
	(is_a_helper <hsa_insn_cbr *>::test): Adjust to only cover conditional
	branch instructions.
	(hsa_insn_signal): Make a direct descendant of
	hsa_insn_basic.  Add memorder constructor parameter and
	m_memory_order and m_signalop member variables.
	(hsa_insn_queue): Changed constructor parameters to common form.
	Added m_segment and m_memory_order member variables.
	(hsa_summary_t): Add private member function
	process_gpu_implementation_attributes.
	(hsa_function_summary): Rename m_binded_function to
	m_bound_function.
	(hsa_insn_basic_p): Remove typedef.
	(hsa_op_with_type): Change hsa_insn_basic_p into plain pointers.
	(hsa_op_reg_p): Remove typedef.
	(hsa_function_representation): Change hsa_op_reg_p into plain
	pointers.
	(hsa_insn_phi): Removed new and delete operators.
	(hsa_insn_br): Likewise.
	(hsa_insn_cbr): Likewise.
	(hsa_insn_sbr): Likewise.
	(hsa_insn_cmp): Likewise.
	(hsa_insn_mem): Likewise.
	(hsa_insn_atomic): Likewise.
	(hsa_insn_signal): Likewise.
	(hsa_insn_seg): Likewise.
	(hsa_insn_call): Likewise.
	(hsa_insn_arg_block): Likewise.
	(hsa_insn_comment): Likewise.
	(hsa_insn_srctype): Likewise.
	(hsa_insn_packed): Likewise.
	(hsa_insn_cvt): Likewise.
	(hsa_insn_alloca): Likewise.

	* hsa.c (hsa_destroy_insn): Also handle instances of hsa_insn_br.
	(process_gpu_implementation_attributes): New function.
	(link_functions): Move some functionality into it.  Adjust after
	renaming m_binded_functions to m_bound_functions.
	(hsa_insn_basic::op_output_p): Add BRIG_OPCODE_DEBUGTRAP
	to the list of instructions with no output registers.
	(get_in_type): Return this if it is a register of
	matching size.
	(hsa_get_declaration_name): Moved to...

        * hsa-gen.c (hsa_get_declaration_name): ...here.  Allocate
	temporary string on an obstack instead from ggc.
	(query_hsa_grid): Renamed to query_hsa_grid_dim, reimplemented, cut
	down to two overloads.
	(hsa_allocp_operand_address): Removed.
	(hsa_allocp_operand_immed): Likewise.
	(hsa_allocp_operand_reg): Likewise.
	(hsa_allocp_operand_code_list): Likewise.
	(hsa_allocp_operand_operand_list): Likewise.
	(hsa_allocp_inst_basic): Likewise.
	(hsa_allocp_inst_phi): Likewise.
	(hsa_allocp_inst_mem): Likewise.
	(hsa_allocp_inst_atomic): Likewise.
	(hsa_allocp_inst_signal): Likewise.
	(hsa_allocp_inst_seg): Likewise.
	(hsa_allocp_inst_cmp): Likewise.
	(hsa_allocp_inst_br): Likewise.
	(hsa_allocp_inst_sbr): Likewise.
	(hsa_allocp_inst_call): Likewise.
	(hsa_allocp_inst_arg_block): Likewise.
	(hsa_allocp_inst_comment): Likewise.
	(hsa_allocp_inst_queue): Likewise.
	(hsa_allocp_inst_srctype): Likewise.
	(hsa_allocp_inst_packed): Likewise.
	(hsa_allocp_inst_cvt): Likewise.
	(hsa_allocp_inst_alloca): Likewise.
	(hsa_allocp_bb): Likewise.
	(hsa_obstack): New.
	(hsa_init_data_for_cfun): Initialize obstack.
	(hsa_deinit_data_for_cfun): Release memory of the obstack.
	(hsa_op_immed::operator new): Use obstack instead of object_allocator.
	(hsa_op_reg::operator new): Likewise.
	(hsa_op_address::operator new): Likewise.
	(hsa_op_code_list::operator new): Likewise.
	(hsa_op_operand_list::operator new): Likewise.
	(hsa_insn_basic::operator new): Likewise.
	(hsa_insn_phi::operator new): Likewise.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_sbr::operator new): Likewise.
	(hsa_insn_cmp::operator new): Likewise.
	(hsa_insn_mem::operator new): Likewise.
	(hsa_insn_atomic::operator new): Likewise.
	(hsa_insn_signal::operator new): Likewise.
	(hsa_insn_seg::operator new): Likewise.
	(hsa_insn_call::operator new): Likewise.
	(hsa_insn_arg_block::operator new): Likewise.
	(hsa_insn_comment::operator new): Likewise.
	(hsa_insn_srctype::operator new): Likewise.
	(hsa_insn_packed::operator new): Likewise.
	(hsa_insn_cvt::operator new): Likewise.
	(hsa_insn_alloca::operator new): Likewise.
	(hsa_init_new_bb): Likewise.
	(hsa_bb::append_phi): New function.
	(gen_hsa_phi_from_gimple_phi): Use it.
	(get_symbol_for_decl): Fix dinstinguishing between
	global and local functions.  Put local variables into a segment
	according to their attribute or static flag, if there is one.
	(hsa_insn_br::hsa_insn_br): New.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_cbr::hsa_insn_cbr): Set width via ancestor constructor.
	(query_hsa_grid_nodim): New function.
	(multiply_grid_dim_characteristics): Likewise.
	(gen_get_num_threads): Likewise.
	(gen_get_num_teams): Reimplemented.
	(gen_get_team_num): Likewise.
	(gen_hsa_insns_for_known_library_call): Updated calls to the above
	helper functions.
	(get_memory_order_name): Removed.
	(get_memory_order): Likewise.
	(hsa_memorder_from_tree): New function.
	(gen_hsa_ternary_atomic_for_builtin): Renamed to
	gen_hsa_atomic_for_builtin, can also create signals.
	(gen_hsa_insns_for_call): Handle many new builtins.  Adjust to use
	hsa_memory_order_from_tree and gen_hsa_atomic_for_builtin.
	(hsa_insn_atomic): Fix function comment.
	(hsa_insn_signal::hsa_insn_signal): Fix comment.  Update call to
	ancestor constructor and initialization of new member variables.
	(hsa_insn_queue::hsa_insn_queue): Added initialization of new
	member variables.
	(hsa_get_host_function): Handle functions with no bound CPU
	implementation.  Fix binded to bound.
	(get_brig_function_name): Likewise.
	(HSA_SORRY_ATV): Remove semicolon after macro.
	(HSA_SORRY_AT): Likewise.
	(omp_simple_builtin::generate): Add missing semicolons.
	(hsa_insn_phi::operator new): Removed.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_cbr::operator new): Likewise.
	(hsa_insn_sbr::operator new): Likewise.
	(hsa_insn_cmp::operator new): Likewise.
	(hsa_insn_mem::operator new): Likewise.
	(hsa_insn_atomic::operator new): Likewise.
	(hsa_insn_signal::operator new): Likewise.
	(hsa_insn_seg::operator new): Likewise.
	(hsa_insn_call::operator new): Likewise.
	(hsa_insn_arg_block::operator new): Likewise.
	(hsa_insn_comment::operator new): Likewise.
	(hsa_insn_srctype::operator new): Likewise.
	(hsa_insn_packed::operator new): Likewise.
	(hsa_insn_cvt::operator new): Likewise.
	(hsa_insn_alloca::operator new): Likewise.
	(get_symbol_for_decl): Accept CONST_DECLs, put them to
	readonly segment.
	(gen_hsa_addr): Also process CONST_DECLs.
	(gen_hsa_addr_insns): Process CONST_DECLs by creating private
	copies.
	(gen_hsa_unary_operation): Make sure the function does
	not use bittype source type for firstbit and lastbit operations.
	(gen_hsa_popcount_to_dest): Make sure the function uses a bittype
	source type.

	* hsa-brig.c (emit_insn_operands): Cope with zero operands in an
	instruction.
	(emit_branch_insn): Renamed to emit_cond_branch_insn.
	Emit the width stored in the class.
	(emit_generic_branch_insn): New function.
	(emit_insn): Call emit_generic_branch_insn.
	(emit_signal_insn): Remove obsolete comment.  Update
	member variable name, pick a type according to profile.
	(emit_alloca_insn): Remove obsolete comment.
	(emit_atomic_insn): Likewise.
	(emit_queue_insn): Get segment and memory order from the IR object.
	(hsa_brig_section): Make allocate_new_chunk, chunks
	and cur_chunk provate, add a default NULL parameter to add method.
	(hsa_brig_section::add): Added a new parameter, store pointer to
	output data there if it is non-NULL.
	(emit_function_directives): Use this new parameter instead of
	calculating the pointer itself, fix function comment.
	(hsa_brig_emit_function): Add forgotten endian conversion.
	(hsa_output_kernels): Remove unnecessary building of
	kernel_dependencies_vector_type.
	(emit_immediate_operand): Declare.
	(emit_directive_variable): Also emit initializers of CONST_DECLs.
	(gen_hsa_insn_for_internal_fn_call): Also handle IFN_RSQRT.
	(verify_function_arguments): Properly detect variadic
	arguments.

	* hsa-dump.c (hsa_width_specifier_name): New function.
	(dump_hsa_insn_1): Dump generic branch instructions, update signal
	member variable name.  Special dumping for queue objects.

	* ipa-hsa.c (process_hsa_functions): Adjust after renaming
	m_binded_functions to m_bound_functions.  Copy externally visible flag
	to the node.
	(ipa_hsa_write_summary): Likewise.
	(ipa_hsa_read_section): Likewise.

libgomp/
	* testsuite/libgomp.hsa.c/bits-insns.c: New test.
---
 gcc/hsa-brig.c                               | 140 ++--
 gcc/hsa-dump.c                               | 107 +++-
 gcc/hsa-gen.c                                | 914 ++++++++++++++-------------
 gcc/hsa.c                                    |  60 +-
 gcc/hsa.h                                    | 157 ++---
 gcc/ipa-hsa.c                                |  14 +-
 libgomp/testsuite/libgomp.hsa.c/bits-insns.c |  73 +++
 7 files changed, 838 insertions(+), 627 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bits-insns.c

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 66ff8f9..acd9164 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -161,19 +161,21 @@ public:
   /* The size of the header of the section without any padding.  */
   unsigned header_byte_delta;
 
-  /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
-  vec <struct hsa_brig_data_chunk> chunks;
-
-  /* More convenient access to the last chunk from the vector above.  */
-  struct hsa_brig_data_chunk *cur_chunk;
-
-  void allocate_new_chunk ();
   void init (const char *name);
   void release ();
   void output ();
-  unsigned add (const void *data, unsigned len);
+  unsigned add (const void *data, unsigned len, void **output = NULL);
   void round_size_up (int factor);
   void *get_ptr_by_offset (unsigned int offset);
+
+private:
+  void allocate_new_chunk ();
+
+  /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
+  vec <struct hsa_brig_data_chunk> chunks;
+
+  /* More convenient access to the last chunk from the vector above.  */
+  struct hsa_brig_data_chunk *cur_chunk;
 };
 
 static struct hsa_brig_section brig_data, brig_code, brig_operand;
@@ -271,10 +273,11 @@ hsa_brig_section::output ()
 }
 
 /* Add to the stream LEN bytes of opaque binary DATA.  Return the offset at
-   which it was stored.  */
+   which it was stored.  If OUTPUT is not NULL, store into it the pointer to
+   the place where DATA was actually stored.  */
 
 unsigned
-hsa_brig_section::add (const void *data, unsigned len)
+hsa_brig_section::add (const void *data, unsigned len, void **output)
 {
   unsigned offset = total_size;
 
@@ -282,7 +285,10 @@ hsa_brig_section::add (const void *data, unsigned len)
   if (cur_chunk->size > (BRIG_CHUNK_MAX_SIZE - len))
     allocate_new_chunk ();
 
-  memcpy (cur_chunk->data + cur_chunk->size, data, len);
+  char *dst = cur_chunk->data + cur_chunk->size;
+  memcpy (dst, data, len);
+  if (output)
+    *output = dst;
   cur_chunk->size += len;
   total_size += len;
 
@@ -565,6 +571,7 @@ enqueue_op (hsa_op_base *op)
   return ret;
 }
 
+static void emit_immediate_operand (hsa_op_immed *imm);
 
 /* Emit directive describing a symbol if it has not been emitted already.
    Return the offset of the directive.  */
@@ -603,7 +610,14 @@ emit_directive_variable (struct hsa_symbol *symbol)
     }
 
   dirvar.name = lendian32 (name_offset);
-  dirvar.init = 0;
+
+  if (symbol->m_decl && TREE_CODE (symbol->m_decl) == CONST_DECL)
+    {
+      hsa_op_immed *tmp = new hsa_op_immed (DECL_INITIAL (symbol->m_decl));
+      dirvar.init = lendian32 (enqueue_op (tmp));
+    }
+  else
+    dirvar.init = 0;
   dirvar.type = lendian16 (symbol->m_type);
   dirvar.segment = symbol->m_segment;
   dirvar.align = symbol->m_align;
@@ -626,8 +640,12 @@ emit_directive_variable (struct hsa_symbol *symbol)
   return symbol->m_directive_offset;
 }
 
-/* Emit directives describing either a function declaration or
-   definition F.  */
+/* Emit directives describing either a function declaration or definition F and
+   return the produced BrigDirectiveExecutable structure.  The function does
+   not take into account any instructions when calculating nextModuleEntry
+   field of the produced BrigDirectiveExecutable structure so when emitting
+   actual definitions, this field needs to be updated after all of the function
+   is actually added to the code section.  */
 
 static BrigDirectiveExecutable *
 emit_function_directives (hsa_function_representation *f, bool is_declaration)
@@ -635,7 +653,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   struct BrigDirectiveExecutable fndir;
   unsigned name_offset, inarg_off, scoped_off, next_toplev_off;
   int count = 0;
-  BrigDirectiveExecutable *ptr_to_fndir;
+  void *ptr_to_fndir;
   hsa_symbol *sym;
 
   if (!f->m_declaration_p)
@@ -693,17 +711,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
       *slot = int_fn;
     }
 
-  brig_code.add (&fndir, sizeof (fndir));
-  /* terrible hack: we need to set instCount after we emit all
-     insns, but we need to emit directive in order, and we emit directives
-     during insn emitting.  So we need to emit the FUNCTION directive
-     early, then the insns, and then we need to set instCount, so remember
-     a pointer to it, in some horrible way.  cur_chunk.data+size points
-     directly to after fndir here.  */
-  ptr_to_fndir
-      = (BrigDirectiveExecutable *)(brig_code.cur_chunk->data
-				    + brig_code.cur_chunk->size
-				    - sizeof (fndir));
+  brig_code.add (&fndir, sizeof (fndir), &ptr_to_fndir);
 
   if (f->m_output_arg)
     emit_directive_variable (f->m_output_arg);
@@ -724,7 +732,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
 	}
     }
 
-  return ptr_to_fndir;
+  return (BrigDirectiveExecutable *) ptr_to_fndir;
 }
 
 /* Emit a label directive for the given HBB.  We assume it is about to start on
@@ -1237,20 +1245,20 @@ emit_insn_operands (hsa_insn_basic *insn)
     operand_offsets;
 
   unsigned l = insn->operand_count ();
-  operand_offsets.safe_grow (l);
-
-  for (unsigned i = 0; i < l; i++)
-    operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
   /* We have N operands so use 4 * N for the byte_count.  */
   uint32_t byte_count = lendian32 (4 * l);
-
   unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
-  brig_data.add (operand_offsets.address (),
-		 l * sizeof (BrigOperandOffset32_t));
+  if (l > 0)
+    {
+      operand_offsets.safe_grow (l);
+      for (unsigned i = 0; i < l; i++)
+	operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
+      brig_data.add (operand_offsets.address (),
+		     l * sizeof (BrigOperandOffset32_t));
+    }
   brig_data.round_size_up (4);
-
   return offset;
 }
 
@@ -1334,10 +1342,6 @@ emit_signal_insn (hsa_insn_signal *mem)
 {
   struct BrigInstSignal repr;
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_SIGNAL);
@@ -1345,9 +1349,9 @@ emit_signal_insn (hsa_insn_signal *mem)
   repr.base.type = lendian16 (mem->m_type);
   repr.base.operands = lendian32 (emit_insn_operands (mem));
 
-  repr.memoryOrder = mem->m_memoryorder;
-  repr.signalOperation = mem->m_atomicop;
-  repr.signalType = BRIG_TYPE_SIG64;
+  repr.memoryOrder = mem->m_memory_order;
+  repr.signalOperation = mem->m_signalop;
+  repr.signalType = hsa_machine_large_p () ? BRIG_TYPE_SIG64 : BRIG_TYPE_SIG32;
 
   brig_code.add (&repr, sizeof (repr));
   brig_insn_count++;
@@ -1368,10 +1372,6 @@ emit_atomic_insn (hsa_insn_atomic *mem)
   else
     addr = as_a <hsa_op_address *> (mem->get_op (1));
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_ATOMIC);
@@ -1448,10 +1448,6 @@ emit_alloca_insn (hsa_insn_alloca *alloca)
   struct BrigInstMem repr;
   gcc_checking_assert (alloca->operand_count () == 2);
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_MEM);
@@ -1497,11 +1493,29 @@ emit_cmp_insn (hsa_insn_cmp *cmp)
   brig_insn_count++;
 }
 
-/* Emit an HSA branching instruction and all necessary directives, schedule
-   necessary operands for writing.  */
+/* Emit an HSA generic branching/sycnronization instruction.  */
+
+static void
+emit_generic_branch_insn (hsa_insn_br *br)
+{
+  struct BrigInstBr repr;
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (br->m_opcode);
+  repr.width = br->m_width;
+  repr.base.type = lendian16 (br->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (br));
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA conditional branching instruction and all necessary directives,
+   schedule necessary operands for writing.  */
 
 static void
-emit_branch_insn (hsa_insn_br *br)
+emit_cond_branch_insn (hsa_insn_cbr *br)
 {
   struct BrigInstBr repr;
 
@@ -1514,7 +1528,7 @@ emit_branch_insn (hsa_insn_br *br)
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
   repr.base.opcode = lendian16 (br->m_opcode);
-  repr.width = BRIG_WIDTH_1;
+  repr.width = br->m_width;
   /* For Conditional jumps the type is always B1.  */
   repr.base.type = lendian16 (BRIG_TYPE_B1);
 
@@ -1730,8 +1744,8 @@ emit_queue_insn (hsa_insn_queue *insn)
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_QUEUE);
   repr.base.opcode = lendian16 (insn->m_opcode);
   repr.base.type = lendian16 (insn->m_type);
-  repr.segment = BRIG_SEGMENT_GLOBAL;
-  repr.memoryOrder = BRIG_MEMORY_ORDER_SC_RELEASE;
+  repr.segment = insn->m_segment;
+  repr.memoryOrder = insn->m_memory_order;
   repr.base.operands = lendian32 (emit_insn_operands (insn));
   brig_data.round_size_up (4);
   brig_code.add (&repr, sizeof (repr));
@@ -1886,8 +1900,8 @@ emit_insn (hsa_insn_basic *insn)
     emit_segment_insn (seg);
   else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
     emit_cmp_insn (cmp);
-  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
-    emit_branch_insn (br);
+  else if (hsa_insn_cbr *br = dyn_cast <hsa_insn_cbr *> (insn))
+    emit_cond_branch_insn (br);
   else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
     {
       if (switch_instructions == NULL)
@@ -1896,6 +1910,8 @@ emit_insn (hsa_insn_basic *insn)
       switch_instructions->safe_push (sbr);
       emit_switch_insn (sbr);
     }
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    emit_generic_branch_insn (br);
   else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
     emit_arg_block_insn (block);
   else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
@@ -2006,7 +2022,7 @@ hsa_brig_emit_function (void)
       prev_bb = bb;
     }
   perhaps_emit_branch (prev_bb, NULL);
-  ptr_to_fndir->nextModuleEntry = brig_code.total_size;
+  ptr_to_fndir->nextModuleEntry = lendian32 (brig_code.total_size);
 
   /* Fill up label references for all sbr instructions.  */
   if (switch_instructions)
@@ -2225,11 +2241,6 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
       tree gridified_kernel_p_tree = build_int_cstu (boolean_type_node,
 						     gridified_kernel_p);
       unsigned count = 0;
-
-      kernel_dependencies_vector_type
-	= build_array_type (build_pointer_type (char_type_node),
-			    build_index_type (size_int (0)));
-
       vec<constructor_elt, va_gc> *kernel_dependencies_vec = NULL;
       if (hsa_decl_kernel_dependencies)
 	{
@@ -2279,6 +2290,7 @@ hsa_output_kernels (tree *host_func_table, tree *kernels)
       if (count > 0)
 	{
 	  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_dependencies_list", i);
+	  gcc_checking_assert (kernel_dependencies_vector_type);
 	  tree dependencies_list = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 					       get_identifier (tmp_name),
 					       kernel_dependencies_vector_type);
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 985caca..7e3b9f0 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -621,6 +621,88 @@ hsa_m_atomicop_name (enum BrigAtomicOperation op)
     }
 }
 
+/* Return textual name for atomic operation.  */
+
+static const char *
+hsa_width_specifier_name (BrigWidth8_t width)
+{
+  switch (width)
+    {
+    case BRIG_WIDTH_NONE:
+      return "none";
+    case BRIG_WIDTH_1:
+      return "1";
+    case BRIG_WIDTH_2:
+      return "2";
+    case BRIG_WIDTH_4:
+      return "4";
+    case BRIG_WIDTH_8:
+      return "8";
+    case BRIG_WIDTH_16:
+      return "16";
+    case BRIG_WIDTH_32:
+      return "32";
+    case BRIG_WIDTH_64:
+      return "64";
+    case BRIG_WIDTH_128:
+      return "128";
+    case BRIG_WIDTH_256:
+      return "256";
+    case BRIG_WIDTH_512:
+      return "512";
+    case BRIG_WIDTH_1024:
+      return "1024";
+    case BRIG_WIDTH_2048:
+      return "2048";
+    case BRIG_WIDTH_4096:
+      return "4096";
+    case BRIG_WIDTH_8192:
+      return "8192";
+    case BRIG_WIDTH_16384:
+      return "16384";
+    case BRIG_WIDTH_32768:
+      return "32768";
+    case BRIG_WIDTH_65536:
+      return "65536";
+    case BRIG_WIDTH_131072:
+      return "131072";
+    case BRIG_WIDTH_262144:
+      return "262144";
+    case BRIG_WIDTH_524288:
+      return "524288";
+    case BRIG_WIDTH_1048576:
+      return "1048576";
+    case BRIG_WIDTH_2097152:
+      return "2097152";
+    case BRIG_WIDTH_4194304:
+      return "4194304";
+    case BRIG_WIDTH_8388608:
+      return "8388608";
+    case BRIG_WIDTH_16777216:
+      return "16777216";
+    case BRIG_WIDTH_33554432:
+      return "33554432";
+    case BRIG_WIDTH_67108864:
+      return "67108864";
+    case BRIG_WIDTH_134217728:
+      return "134217728";
+    case BRIG_WIDTH_268435456:
+      return "268435456";
+    case BRIG_WIDTH_536870912:
+      return "536870912";
+    case BRIG_WIDTH_1073741824:
+      return "1073741824";
+    case BRIG_WIDTH_2147483648:
+      return "2147483648";
+    case BRIG_WIDTH_WAVESIZE:
+      return "wavesize";
+    case BRIG_WIDTH_ALL:
+      return "all";
+    default:
+      return "UNKNOWN_WIDTH";
+    }
+}
+
 /* Dump textual representation of HSA IL register REG to file F.  */
 
 static void
@@ -793,9 +875,9 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
       hsa_insn_signal *mem = as_a <hsa_insn_signal *> (insn);
 
       fprintf (f, "%s", hsa_opcode_name (mem->m_opcode));
-      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_atomicop));
-      if (mem->m_memoryorder != BRIG_MEMORY_ORDER_NONE)
-	fprintf (f, "_%s", hsa_memsem_name (mem->m_memoryorder));
+      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_signalop));
+      if (mem->m_memory_order != BRIG_MEMORY_ORDER_NONE)
+	fprintf (f, "_%s", hsa_memsem_name (mem->m_memory_order));
       fprintf (f, "_%s ", hsa_type_name (mem->m_type));
 
       dump_hsa_operands (f, mem);
@@ -884,9 +966,9 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
       fprintf (f, ", ");
       dump_hsa_operand (f, cmp->get_op (2));
     }
-  else if (is_a <hsa_insn_br *> (insn))
+  else if (is_a <hsa_insn_cbr *> (insn))
     {
-      hsa_insn_br *br = as_a <hsa_insn_br *> (insn);
+      hsa_insn_cbr *br = as_a <hsa_insn_cbr *> (insn);
       basic_block target = NULL;
       edge_iterator ei;
       edge e;
@@ -921,6 +1003,12 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 	    fprintf (f, ", ");
 	}
     }
+  else if (is_a <hsa_insn_br *> (insn))
+    {
+      hsa_insn_br *br = as_a <hsa_insn_br *> (insn);
+      fprintf (f, "%s_width(%s) ", hsa_opcode_name (br->m_opcode),
+	       hsa_width_specifier_name (br->m_width));
+    }
   else if (is_a <hsa_insn_arg_block *> (insn))
     {
       hsa_insn_arg_block *arg_block = as_a <hsa_insn_arg_block *> (insn);
@@ -1018,6 +1106,15 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 
       dump_hsa_operands (f, insn);
     }
+  else if (hsa_insn_queue *qi = dyn_cast <hsa_insn_queue *> (insn))
+    {
+      fprintf (f, "%s_%s_%s_%s ", hsa_opcode_name (qi->m_opcode),
+	       hsa_seg_name (qi->m_segment),
+	       hsa_memsem_name (qi->m_memory_order),
+	       hsa_type_name (qi->m_type));
+
+      dump_hsa_operands (f, qi);
+    }
   else
     {
       fprintf (f, "%s_%s ", hsa_opcode_name (insn->m_opcode),
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 21c35e6..a88294e 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -39,7 +39,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "diagnostic-core.h"
-#include "alloc-pool.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "stringpool.h"
@@ -72,7 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 		    HSA_SORRY_MSG)) \
       inform (location, message, __VA_ARGS__); \
   } \
-  while (false);
+  while (false)
 
 /* Same as previous, but highlight a location.  */
 
@@ -84,7 +83,7 @@ along with GCC; see the file COPYING3.  If not see
 		    HSA_SORRY_MSG)) \
       inform (location, message); \
   } \
-  while (false);
+  while (false)
 
 /* Default number of threads used by kernel dispatch.  */
 
@@ -127,31 +126,7 @@ struct hsa_queue
   uint64_t id;
 };
 
-/* Alloc pools for allocating basic hsa structures such as operands,
-   instructions and other basic entities.  */
-static object_allocator<hsa_op_address> *hsa_allocp_operand_address;
-static object_allocator<hsa_op_immed> *hsa_allocp_operand_immed;
-static object_allocator<hsa_op_reg> *hsa_allocp_operand_reg;
-static object_allocator<hsa_op_code_list> *hsa_allocp_operand_code_list;
-static object_allocator<hsa_op_operand_list> *hsa_allocp_operand_operand_list;
-static object_allocator<hsa_insn_basic> *hsa_allocp_inst_basic;
-static object_allocator<hsa_insn_phi> *hsa_allocp_inst_phi;
-static object_allocator<hsa_insn_mem> *hsa_allocp_inst_mem;
-static object_allocator<hsa_insn_atomic> *hsa_allocp_inst_atomic;
-static object_allocator<hsa_insn_signal> *hsa_allocp_inst_signal;
-static object_allocator<hsa_insn_seg> *hsa_allocp_inst_seg;
-static object_allocator<hsa_insn_cmp> *hsa_allocp_inst_cmp;
-static object_allocator<hsa_insn_br> *hsa_allocp_inst_br;
-static object_allocator<hsa_insn_sbr> *hsa_allocp_inst_sbr;
-static object_allocator<hsa_insn_call> *hsa_allocp_inst_call;
-static object_allocator<hsa_insn_arg_block> *hsa_allocp_inst_arg_block;
-static object_allocator<hsa_insn_comment> *hsa_allocp_inst_comment;
-static object_allocator<hsa_insn_queue> *hsa_allocp_inst_queue;
-static object_allocator<hsa_insn_srctype> *hsa_allocp_inst_srctype;
-static object_allocator<hsa_insn_packed> *hsa_allocp_inst_packed;
-static object_allocator<hsa_insn_cvt> *hsa_allocp_inst_cvt;
-static object_allocator<hsa_insn_alloca> *hsa_allocp_inst_alloca;
-static object_allocator<hsa_bb> *hsa_allocp_bb;
+static struct obstack hsa_obstack;
 
 /* List of pointers to all instructions that come from an object allocator.  */
 static vec <hsa_insn_basic *> hsa_instructions;
@@ -486,52 +461,7 @@ static void
 hsa_init_data_for_cfun ()
 {
   hsa_init_compilation_unit_data ();
-  hsa_allocp_operand_address
-    = new object_allocator<hsa_op_address> ("HSA address operands");
-  hsa_allocp_operand_immed
-    = new object_allocator<hsa_op_immed> ("HSA immediate operands");
-  hsa_allocp_operand_reg
-    = new object_allocator<hsa_op_reg> ("HSA register operands");
-  hsa_allocp_operand_code_list
-    = new object_allocator<hsa_op_code_list> ("HSA code list operands");
-  hsa_allocp_operand_operand_list
-    = new object_allocator<hsa_op_operand_list> ("HSA operand list operands");
-  hsa_allocp_inst_basic
-    = new object_allocator<hsa_insn_basic> ("HSA basic instructions");
-  hsa_allocp_inst_phi
-    = new object_allocator<hsa_insn_phi> ("HSA phi operands");
-  hsa_allocp_inst_mem
-    = new object_allocator<hsa_insn_mem> ("HSA memory instructions");
-  hsa_allocp_inst_atomic
-    = new object_allocator<hsa_insn_atomic> ("HSA atomic instructions");
-  hsa_allocp_inst_signal
-    = new object_allocator<hsa_insn_signal> ("HSA signal instructions");
-  hsa_allocp_inst_seg
-    = new object_allocator<hsa_insn_seg> ("HSA segment conversion "
-					  "instructions");
-  hsa_allocp_inst_cmp
-    = new object_allocator<hsa_insn_cmp> ("HSA comparison instructions");
-  hsa_allocp_inst_br
-    = new object_allocator<hsa_insn_br> ("HSA branching instructions");
-  hsa_allocp_inst_sbr
-    = new object_allocator<hsa_insn_sbr> ("HSA switch branching instructions");
-  hsa_allocp_inst_call
-    = new object_allocator<hsa_insn_call> ("HSA call instructions");
-  hsa_allocp_inst_arg_block
-    = new object_allocator<hsa_insn_arg_block> ("HSA arg block instructions");
-  hsa_allocp_inst_comment
-    = new object_allocator<hsa_insn_comment> ("HSA comment instructions");
-  hsa_allocp_inst_queue
-    = new object_allocator<hsa_insn_queue> ("HSA queue instructions");
-  hsa_allocp_inst_srctype
-    = new object_allocator<hsa_insn_srctype> ("HSA source type instructions");
-  hsa_allocp_inst_packed
-    = new object_allocator<hsa_insn_packed> ("HSA packed instructions");
-  hsa_allocp_inst_cvt
-    = new object_allocator<hsa_insn_cvt> ("HSA convert instructions");
-  hsa_allocp_inst_alloca
-    = new object_allocator<hsa_insn_alloca> ("HSA alloca instructions");
-  hsa_allocp_bb = new object_allocator<hsa_bb> ("HSA basic blocks");
+  gcc_obstack_init (&hsa_obstack);
 }
 
 /* Deinitialize HSA subsystem and free all allocated memory.  */
@@ -565,29 +495,7 @@ hsa_deinit_data_for_cfun (void)
       omp_simple_builtins = NULL;
     }
 
-  delete hsa_allocp_operand_address;
-  delete hsa_allocp_operand_immed;
-  delete hsa_allocp_operand_reg;
-  delete hsa_allocp_operand_code_list;
-  delete hsa_allocp_operand_operand_list;
-  delete hsa_allocp_inst_basic;
-  delete hsa_allocp_inst_phi;
-  delete hsa_allocp_inst_atomic;
-  delete hsa_allocp_inst_mem;
-  delete hsa_allocp_inst_signal;
-  delete hsa_allocp_inst_seg;
-  delete hsa_allocp_inst_cmp;
-  delete hsa_allocp_inst_br;
-  delete hsa_allocp_inst_sbr;
-  delete hsa_allocp_inst_call;
-  delete hsa_allocp_inst_arg_block;
-  delete hsa_allocp_inst_comment;
-  delete hsa_allocp_inst_queue;
-  delete hsa_allocp_inst_srctype;
-  delete hsa_allocp_inst_packed;
-  delete hsa_allocp_inst_cvt;
-  delete hsa_allocp_inst_alloca;
-  delete hsa_allocp_bb;
+  obstack_free (&hsa_obstack, NULL);
   delete hsa_cfun;
 }
 
@@ -873,6 +781,49 @@ hsa_needs_cvt (BrigType16_t dtype, BrigType16_t stype)
   return false;
 }
 
+/* Return declaration name if it exists or create one from UID if it does not.
+   If DECL is a local variable, make UID part of its name.  */
+
+const char *
+hsa_get_declaration_name (tree decl)
+{
+  if (!DECL_NAME (decl))
+    {
+      char buf[64];
+      snprintf (buf, 64, "__hsa_anon_%u", DECL_UID (decl));
+      size_t len = strlen (buf);
+      char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1);
+      memcpy (copy, buf, len + 1);
+      return copy;
+    }
+
+  tree name_tree;
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      || (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)))
+    name_tree = DECL_ASSEMBLER_NAME (decl);
+  else
+    name_tree = DECL_NAME (decl);
+
+  const char *name = IDENTIFIER_POINTER (name_tree);
+  /* User-defined assembly names have prepended asterisk symbol.  */
+  if (name[0] == '*')
+    name++;
+
+  if ((TREE_CODE (decl) == VAR_DECL)
+      && decl_function_context (decl))
+    {
+      size_t len = strlen (name);
+      char *buf = (char *) alloca (len + 32);
+      snprintf (buf, len + 32, "%s_%u", name, DECL_UID (decl));
+      len = strlen (buf);
+      char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1);
+      memcpy (copy, buf, len + 1);
+      return copy;
+    }
+  else
+    return name;
+}
+
 /* Lookup or create the associated hsa_symbol structure with a given VAR_DECL
    or lookup the hsa_structure corresponding to a PARM_DECL.  */
 
@@ -884,11 +835,13 @@ get_symbol_for_decl (tree decl)
 
   gcc_assert (TREE_CODE (decl) == PARM_DECL
 	      || TREE_CODE (decl) == RESULT_DECL
-	      || VAR_P (decl));
+	      || TREE_CODE (decl) == VAR_DECL
+	      || TREE_CODE (decl) == CONST_DECL);
 
   dummy.m_decl = decl;
 
-  bool is_in_global_vars = VAR_P (decl) && is_global_var (decl);
+  bool is_in_global_vars = ((TREE_CODE (decl) == VAR_DECL)
+			    && !decl_function_context (decl));
 
   if (is_in_global_vars)
     slot = hsa_global_variable_symbols->find_slot (&dummy, INSERT);
@@ -925,11 +878,14 @@ get_symbol_for_decl (tree decl)
   else
     {
       hsa_symbol *sym;
-      gcc_assert (VAR_P (decl));
+      /* PARM_DECLs and RESULT_DECL should be already in m_local_symbols.  */
+      gcc_assert (TREE_CODE (decl) == VAR_DECL
+		  || TREE_CODE (decl) == CONST_DECL);
       BrigAlignment8_t align = hsa_object_alignment (decl);
 
       if (is_in_global_vars)
 	{
+	  gcc_checking_assert (TREE_CODE (decl) != CONST_DECL);
 	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_GLOBAL,
 				BRIG_LINKAGE_PROGRAM, true,
 				BRIG_ALLOCATION_PROGRAM, align);
@@ -951,12 +907,25 @@ get_symbol_for_decl (tree decl)
 	  if (AGGREGATE_TYPE_P (TREE_TYPE (decl)))
 	    align = MAX ((BrigAlignment8_t) BRIG_ALIGNMENT_8, align);
 
-	  /* PARM_DECL and RESULT_DECL should be already in m_local_symbols.  */
-	  gcc_assert (VAR_P (decl));
+	  BrigAllocation allocation = BRIG_ALLOCATION_AUTOMATIC;
+	  BrigSegment8_t segment;
+	  if (TREE_CODE (decl) == CONST_DECL)
+	    {
+	      segment = BRIG_SEGMENT_READONLY;
+	      allocation = BRIG_ALLOCATION_AGENT;
+	    }
+	  else if (lookup_attribute ("hsa_group_segment",
+				     DECL_ATTRIBUTES (decl)))
+	    segment = BRIG_SEGMENT_GROUP;
+	  else if (TREE_STATIC (decl)
+		   || lookup_attribute ("hsa_global_segment",
+					DECL_ATTRIBUTES (decl)))
+	    segment = BRIG_SEGMENT_GLOBAL;
+	  else
+	    segment = BRIG_SEGMENT_PRIVATE;
 
-	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_PRIVATE,
-				BRIG_LINKAGE_FUNCTION);
-	  sym->m_align = align;
+	  sym = new hsa_symbol (BRIG_TYPE_NONE, segment, BRIG_LINKAGE_FUNCTION,
+				false, allocation, align);
 	  sym->fillup_for_decl (decl);
 	  hsa_cfun->m_private_variables.safe_push (sym);
 	}
@@ -978,7 +947,7 @@ hsa_get_host_function (tree decl)
   gcc_assert (s->m_kind != HSA_NONE);
   gcc_assert (s->m_gpu_implementation_p);
 
-  return s->m_binded_function->decl;
+  return s->m_bound_function ? s->m_bound_function->decl : NULL;
 }
 
 /* Return true if function DECL has a host equivalent function.  */
@@ -989,8 +958,10 @@ get_brig_function_name (tree decl)
   tree d = decl;
 
   hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (d));
-  if (s->m_kind != HSA_NONE && s->m_gpu_implementation_p)
-    d = s->m_binded_function->decl;
+  if (s->m_kind != HSA_NONE
+      && s->m_gpu_implementation_p
+      && s->m_bound_function)
+    d = s->m_bound_function->decl;
 
   /* IPA split can create a function that has no host equivalent.  */
   if (d == NULL)
@@ -1066,6 +1037,14 @@ hsa_op_with_type::get_in_type (BrigType16_t dtype, hsa_bb *hbb)
       dest = new hsa_op_reg (dtype);
       hbb->append_insn (new hsa_insn_cvt (dest, this));
     }
+  else if (is_a <hsa_op_reg *> (this))
+    {
+      /* In the end, HSA registers do not really have types, only sizes, so if
+	 the sizes match, we can use the register directly.  */
+      gcc_checking_assert (hsa_type_bit_size (dtype)
+			   == hsa_type_bit_size (m_type));
+      return this;
+    }
   else
     {
       dest = new hsa_op_reg (m_type);
@@ -1128,12 +1107,12 @@ hsa_op_immed::hsa_op_immed ()
 {
 }
 
-/* New operator to allocate immediate operands from pool alloc.  */
+/* New operator to allocate immediate operands from obstack.  */
 
 void *
-hsa_op_immed::operator new (size_t)
+hsa_op_immed::operator new (size_t size)
 {
-  return hsa_allocp_operand_immed->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Destructor.  */
@@ -1160,12 +1139,12 @@ hsa_op_reg::hsa_op_reg (BrigType16_t t)
 {
 }
 
-/* New operator to allocate a register from pool alloc.  */
+/* New operator to allocate a register from obstack.  */
 
 void *
-hsa_op_reg::operator new (size_t)
+hsa_op_reg::operator new (size_t size)
 {
-  return hsa_allocp_operand_reg->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Verify register operand.  */
@@ -1244,12 +1223,12 @@ hsa_op_address::hsa_op_address (hsa_op_reg *r, HOST_WIDE_INT offset)
 {
 }
 
-/* New operator to allocate address operands from pool alloc.  */
+/* New operator to allocate address operands from obstack.  */
 
 void *
-hsa_op_address::operator new (size_t)
+hsa_op_address::operator new (size_t size)
 {
-  return hsa_allocp_operand_address->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Constructor of an operand referring to HSAIL code.  */
@@ -1269,12 +1248,12 @@ hsa_op_code_list::hsa_op_code_list (unsigned elements)
   m_offsets.safe_grow_cleared (elements);
 }
 
-/* New operator to allocate code list operands from pool alloc.  */
+/* New operator to allocate code list operands from obstack.  */
 
 void *
-hsa_op_code_list::operator new (size_t)
+hsa_op_code_list::operator new (size_t size)
 {
-  return hsa_allocp_operand_code_list->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Constructor of an operand representing an operand list.
@@ -1287,12 +1266,12 @@ hsa_op_operand_list::hsa_op_operand_list (unsigned elements)
   m_offsets.safe_grow (elements);
 }
 
-/* New operator to allocate operand list operands from pool alloc.  */
+/* New operator to allocate operand list operands from obstack.  */
 
 void *
-hsa_op_operand_list::operator new (size_t)
+hsa_op_operand_list::operator new (size_t size)
 {
-  return hsa_allocp_operand_operand_list->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 hsa_op_operand_list::~hsa_op_operand_list ()
@@ -1437,12 +1416,12 @@ hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc, BrigType16_t t,
   hsa_instructions.safe_push (this);
 }
 
-/* New operator to allocate basic instruction from pool alloc.  */
+/* New operator to allocate basic instruction from obstack.  */
 
 void *
-hsa_insn_basic::operator new (size_t)
+hsa_insn_basic::operator new (size_t size)
 {
-  return hsa_allocp_inst_basic->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Verify the instruction.  */
@@ -1495,32 +1474,27 @@ hsa_insn_phi::hsa_insn_phi (unsigned nops, hsa_op_reg *dst)
   dst->set_definition (this);
 }
 
-/* New operator to allocate PHI instruction from pool alloc.  */
+/* Constructor of class representing instructions for control flow and
+   sychronization,   */
 
-void *
-hsa_insn_phi::operator new (size_t)
+hsa_insn_br::hsa_insn_br (unsigned nops, int opc, BrigType16_t t,
+			  BrigWidth8_t width, hsa_op_base *arg0,
+			  hsa_op_base *arg1, hsa_op_base *arg2,
+			  hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
+    m_width (width)
 {
-  return hsa_allocp_inst_phi->allocate_raw ();
 }
 
 /* Constructor of class representing instruction for conditional jump, CTRL is
    the control register determining whether the jump will be carried out, the
    new instruction is automatically added to its uses list.  */
 
-hsa_insn_br::hsa_insn_br (hsa_op_reg *ctrl)
-  : hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
-    m_width (BRIG_WIDTH_1)
+hsa_insn_cbr::hsa_insn_cbr (hsa_op_reg *ctrl)
+  : hsa_insn_br (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, BRIG_WIDTH_1, ctrl)
 {
 }
 
-/* New operator to allocate branch instruction from pool alloc.  */
-
-void *
-hsa_insn_br::operator new (size_t)
-{
-  return hsa_allocp_inst_br->allocate_raw ();
-}
-
 /* Constructor of class representing instruction for switch jump, CTRL is
    the index register.  */
 
@@ -1531,14 +1505,6 @@ hsa_insn_sbr::hsa_insn_sbr (hsa_op_reg *index, unsigned jump_count)
 {
 }
 
-/* New operator to allocate switch branch instruction from pool alloc.  */
-
-void *
-hsa_insn_sbr::operator new (size_t)
-{
-  return hsa_allocp_inst_sbr->allocate_raw ();
-}
-
 /* Replace all occurrences of OLD_BB with NEW_BB in the statements
    jump table.  */
 
@@ -1565,14 +1531,6 @@ hsa_insn_cmp::hsa_insn_cmp (BrigCompareOperation8_t cmp, BrigType16_t t,
 {
 }
 
-/* New operator to allocate compare instruction from pool alloc.  */
-
-void *
-hsa_insn_cmp::operator new (size_t)
-{
-  return hsa_allocp_inst_cmp->allocate_raw ();
-}
-
 /* Constructor of classes representing memory accesses.  OPC is the opcode (must
    be BRIG_OPCODE_ST or BRIG_OPCODE_LD) and T is the type.  The instruction
    operands are provided as ARG0 and ARG1.  */
@@ -1598,18 +1556,9 @@ hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, BrigType16_t t,
 {
 }
 
-/* New operator to allocate memory instruction from pool alloc.  */
-
-void *
-hsa_insn_mem::operator new (size_t)
-{
-  return hsa_allocp_inst_mem->allocate_raw ();
-}
-
-/* Constructor of class representing atomic instructions and signals.  OPC is
-   the principal opcode, aop is the specific atomic operation opcode.  T is the
-   type of the instruction.  The instruction operands
-   are provided as ARG[0-3].  */
+/* Constructor of class representing atomic instructions.  OPC is the principal
+   opcode, AOP is the specific atomic operation opcode.  T is the type of the
+   instruction.  The instruction operands are provided as ARG[0-3].  */
 
 hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
 				  enum BrigAtomicOperation aop,
@@ -1627,34 +1576,18 @@ hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
 		       opc == BRIG_OPCODE_SIGNALNORET);
 }
 
-/* New operator to allocate signal instruction from pool alloc.  */
-
-void *
-hsa_insn_atomic::operator new (size_t)
-{
-  return hsa_allocp_inst_atomic->allocate_raw ();
-}
-
 /* Constructor of class representing signal instructions.  OPC is the prinicpal
-   opcode, sop is the specific signal operation opcode.  T is the type of the
+   opcode, SOP is the specific signal operation opcode.  T is the type of the
    instruction.  The instruction operands are provided as ARG[0-3].  */
 
 hsa_insn_signal::hsa_insn_signal (int nops, int opc,
 				  enum BrigAtomicOperation sop,
-				  BrigType16_t t, hsa_op_base *arg0,
-				  hsa_op_base *arg1, hsa_op_base *arg2,
-				  hsa_op_base *arg3)
-  : hsa_insn_atomic (nops, opc, sop, t, BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE,
-		     arg0, arg1, arg2, arg3)
-{
-}
-
-/* New operator to allocate signal instruction from pool alloc.  */
-
-void *
-hsa_insn_signal::operator new (size_t)
+				  BrigType16_t t, BrigMemoryOrder memorder,
+				  hsa_op_base *arg0, hsa_op_base *arg1,
+				  hsa_op_base *arg2, hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
+    m_memory_order (memorder), m_signalop (sop)
 {
-  return hsa_allocp_inst_signal->allocate_raw ();
 }
 
 /* Constructor of class representing segment conversion instructions.  OPC is
@@ -1672,14 +1605,6 @@ hsa_insn_seg::hsa_insn_seg (int opc, BrigType16_t dest, BrigType16_t srct,
   gcc_checking_assert (opc == BRIG_OPCODE_STOF || opc == BRIG_OPCODE_FTOS);
 }
 
-/* New operator to allocate address conversion instruction from pool alloc.  */
-
-void *
-hsa_insn_seg::operator new (size_t)
-{
-  return hsa_allocp_inst_seg->allocate_raw ();
-}
-
 /* Constructor of class representing a call instruction.  CALLEE is the tree
    representation of the function being called.  */
 
@@ -1696,14 +1621,6 @@ hsa_insn_call::hsa_insn_call (hsa_internal_fn *fn)
 {
 }
 
-/* New operator to allocate call instruction from pool alloc.  */
-
-void *
-hsa_insn_call::operator new (size_t)
-{
-  return hsa_allocp_inst_call->allocate_raw ();
-}
-
 hsa_insn_call::~hsa_insn_call ()
 {
   for (unsigned i = 0; i < m_input_args.length (); i++)
@@ -1724,14 +1641,6 @@ hsa_insn_arg_block::hsa_insn_arg_block (BrigKind brig_kind,
 {
 }
 
-/* New operator to allocate argument block instruction from pool alloc.  */
-
-void *
-hsa_insn_arg_block::operator new (size_t)
-{
-  return hsa_allocp_inst_arg_block->allocate_raw ();
-}
-
 hsa_insn_comment::hsa_insn_comment (const char *s)
   : hsa_insn_basic (0, BRIG_KIND_DIRECTIVE_COMMENT)
 {
@@ -1743,14 +1652,6 @@ hsa_insn_comment::hsa_insn_comment (const char *s)
   m_comment = buf;
 }
 
-/* New operator to allocate comment instruction from pool alloc.  */
-
-void *
-hsa_insn_comment::operator new (size_t)
-{
-  return hsa_allocp_inst_comment->allocate_raw ();
-}
-
 hsa_insn_comment::~hsa_insn_comment ()
 {
   gcc_checking_assert (m_comment);
@@ -1759,17 +1660,14 @@ hsa_insn_comment::~hsa_insn_comment ()
 }
 
 /* Constructor of class representing the queue instruction in HSAIL.  */
-hsa_insn_queue::hsa_insn_queue (int nops, BrigOpcode opcode)
-  : hsa_insn_basic (nops, opcode, BRIG_TYPE_U64)
-{
-}
 
-/* New operator to allocate source type instruction from pool alloc.  */
-
-void *
-hsa_insn_srctype::operator new (size_t)
+hsa_insn_queue::hsa_insn_queue (int nops, int opcode, BrigSegment segment,
+				BrigMemoryOrder memory_order,
+				hsa_op_base *arg0, hsa_op_base *arg1,
+				hsa_op_base *arg2, hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opcode, BRIG_TYPE_U64, arg0, arg1, arg2, arg3),
+    m_segment (segment), m_memory_order (memory_order)
 {
-  return hsa_allocp_inst_srctype->allocate_raw ();
 }
 
 /* Constructor of class representing the source type instruction in HSAIL.  */
@@ -1782,14 +1680,6 @@ hsa_insn_srctype::hsa_insn_srctype (int nops, BrigOpcode opcode,
     m_source_type (srct)
 {}
 
-/* New operator to allocate packed instruction from pool alloc.  */
-
-void *
-hsa_insn_packed::operator new (size_t)
-{
-  return hsa_allocp_inst_packed->allocate_raw ();
-}
-
 /* Constructor of class representing the packed instruction in HSAIL.  */
 
 hsa_insn_packed::hsa_insn_packed (int nops, BrigOpcode opcode,
@@ -1801,14 +1691,6 @@ hsa_insn_packed::hsa_insn_packed (int nops, BrigOpcode opcode,
   m_operand_list = new hsa_op_operand_list (nops - 1);
 }
 
-/* New operator to allocate convert instruction from pool alloc.  */
-
-void *
-hsa_insn_cvt::operator new (size_t)
-{
-  return hsa_allocp_inst_cvt->allocate_raw ();
-}
-
 /* Constructor of class representing the convert instruction in HSAIL.  */
 
 hsa_insn_cvt::hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src)
@@ -1816,14 +1698,6 @@ hsa_insn_cvt::hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src)
 {
 }
 
-/* New operator to allocate alloca from pool alloc.  */
-
-void *
-hsa_insn_alloca::operator new (size_t)
-{
-  return hsa_allocp_inst_alloca->allocate_raw ();
-}
-
 /* Constructor of class representing the alloca in HSAIL.  */
 
 hsa_insn_alloca::hsa_insn_alloca (hsa_op_with_type *dest,
@@ -1854,6 +1728,20 @@ hsa_bb::append_insn (hsa_insn_basic *insn)
     m_first_insn = insn;
 }
 
+void
+hsa_bb::append_phi (hsa_insn_phi *hphi)
+{
+  hphi->m_bb = m_bb;
+
+  hphi->m_prev = m_last_phi;
+  hphi->m_next = NULL;
+  if (m_last_phi)
+    m_last_phi->m_next = hphi;
+  m_last_phi = hphi;
+  if (!m_first_phi)
+    m_first_phi = hphi;
+}
+
 /* Insert HSA instruction NEW_INSN immediately before an existing instruction
    OLD_INSN.  */
 
@@ -2078,6 +1966,7 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOST_WIDE_INT *output_bitsize = NULL,
     case PARM_DECL:
     case VAR_DECL:
     case RESULT_DECL:
+    case CONST_DECL:
       gcc_assert (!symbol);
       symbol = get_symbol_for_decl (ref);
       addrtype = hsa_get_segment_addr_type (symbol->m_segment);
@@ -2295,6 +2184,34 @@ gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb)
     val = TREE_OPERAND (val, 0);
   addr = gen_hsa_addr (val, hbb);
 
+  if (TREE_CODE (val) == CONST_DECL
+      && is_gimple_reg_type (TREE_TYPE (val)))
+    {
+      gcc_assert (addr->m_symbol
+		  && addr->m_symbol->m_segment == BRIG_SEGMENT_READONLY);
+      /* CONST_DECLs are in readonly segment which however does not have
+	 addresses convertible to flat segments.  So copy it to a private one
+	 and take address of that.  */
+      BrigType16_t csttype
+	= mem_type_for_type (hsa_type_for_scalar_tree_type (TREE_TYPE (val),
+							    false));
+      hsa_op_reg *r = new hsa_op_reg (csttype);
+      hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_LD, csttype, r,
+					  new hsa_op_address (addr->m_symbol)));
+      hsa_symbol *copysym = hsa_cfun->create_hsa_temporary (csttype);
+      hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_ST, csttype, r,
+					  new hsa_op_address (copysym)));
+      addr->m_symbol = copysym;
+    }
+  else if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_READONLY)
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (val), "support for HSA does "
+		     "not implement taking addresses of complex "
+		     "CONST_DECLs such as %E", val);
+      return;
+    }
+
+
   convert_addr_to_flat_segment (addr, dest, hbb);
 }
 
@@ -2324,8 +2241,10 @@ hsa_reg_or_immed_for_gimple_op (tree op, hsa_bb *hbb)
 void
 hsa_build_append_simple_mov (hsa_op_reg *dest, hsa_op_base *src, hsa_bb *hbb)
 {
-  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
-					     dest, src);
+  /* Moves of packed data between registers need to adhere to the same type
+     rules like when dealing with memory.  */
+  BrigType16_t tp = mem_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, tp, dest, src);
   if (hsa_op_reg *sreg = dyn_cast <hsa_op_reg *> (src))
     gcc_assert (hsa_type_bit_size (dest->m_type)
 		== hsa_type_bit_size (sreg->m_type));
@@ -3054,8 +2973,12 @@ gen_hsa_unary_operation (BrigOpcode opcode, hsa_op_reg *dest,
   if (opcode == BRIG_OPCODE_MOV && hsa_needs_cvt (dest->m_type, op1->m_type))
     insn = new hsa_insn_cvt (dest, op1);
   else if (opcode == BRIG_OPCODE_FIRSTBIT || opcode == BRIG_OPCODE_LASTBIT)
-    insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, op1->m_type, NULL,
-				 op1);
+    {
+      BrigType16_t srctype = hsa_type_integer_p (op1->m_type) ? op1->m_type
+	: hsa_unsigned_type_for_type (op1->m_type);
+      insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, srctype, NULL,
+				   op1);
+    }
   else
     {
       insn = new hsa_insn_basic (2, opcode, dest->m_type, dest, op1);
@@ -3169,6 +3092,23 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
     case NEGATE_EXPR:
       opcode = BRIG_OPCODE_NEG;
       break;
+    case FMA_EXPR:
+      /* There is a native HSA instruction for scalar FMAs but not for vector
+	 ones.  */
+      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
+	{
+	  hsa_op_reg *dest
+	    = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
+	  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+	  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+	  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+	  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
+	  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
+	  return;
+	}
+      opcode = BRIG_OPCODE_MAD;
+      break;
     case MIN_EXPR:
       opcode = BRIG_OPCODE_MIN;
       break;
@@ -3368,14 +3308,18 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
   switch (rhs_class)
     {
     case GIMPLE_TERNARY_RHS:
-      gcc_unreachable ();
+      {
+	hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+	hsa_insn_basic *insn = new hsa_insn_basic (4, opcode, dest->m_type, dest,
+						   op1, op2, op3);
+	hbb->append_insn (insn);
+      }
       return;
 
-      /* Fall through */
     case GIMPLE_BINARY_RHS:
       gen_hsa_binary_operation (opcode, dest, op1, op2, hbb);
       break;
-      /* Fall through */
+
     case GIMPLE_UNARY_RHS:
       gen_hsa_unary_operation (opcode, dest, op1, hbb);
       break;
@@ -3392,14 +3336,14 @@ static void
 gen_hsa_insns_for_cond_stmt (gimple *cond, hsa_bb *hbb)
 {
   hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1);
-  hsa_insn_br *cbr;
+  hsa_insn_cbr *cbr;
 
   gen_hsa_cmp_insn_from_gimple (gimple_cond_code (cond),
 				gimple_cond_lhs (cond),
 				gimple_cond_rhs (cond),
 				ctrl, hbb);
 
-  cbr = new hsa_insn_br (ctrl);
+  cbr = new hsa_insn_cbr (ctrl);
   hbb->append_insn (cbr);
 }
 
@@ -3476,7 +3420,7 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
   hbb->append_insn (new hsa_insn_basic (3, BRIG_OPCODE_AND, cmp_reg->m_type,
 					cmp_reg, cmp1_reg, cmp2_reg));
 
-  hbb->append_insn (new hsa_insn_br (cmp_reg));
+  hbb->append_insn (new hsa_insn_cbr (cmp_reg));
 
   tree default_label = gimple_switch_default_label (s);
   basic_block default_label_bb = label_to_block_fn (func,
@@ -3537,13 +3481,14 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
 static void
 verify_function_arguments (tree decl)
 {
+  tree type = TREE_TYPE (decl);
   if (DECL_STATIC_CHAIN (decl))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
 		     "HSA does not support nested functions: %D", decl);
       return;
     }
-  else if (!TYPE_ARG_TYPES (TREE_TYPE (decl)))
+  else if (!TYPE_ARG_TYPES (type) || stdarg_p (type))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
 		     "HSA does not support functions with variadic arguments "
@@ -3839,33 +3784,58 @@ hsa_insn_basic::set_output_in_type (hsa_op_reg *dest, unsigned op_index,
    HBB.  */
 
 static void
-query_hsa_grid (hsa_op_reg *dest, BrigType16_t opcode, int dimension,
-		hsa_bb *hbb)
+query_hsa_grid_dim (hsa_op_reg *dest, int opcode, hsa_op_immed *dimension,
+		    hsa_bb *hbb)
 {
-  /* We're using just one-dimensional kernels, so hard-coded
-     dimension X.  */
-  hsa_op_immed *imm
-    = new hsa_op_immed (dimension, (BrigKind16_t) BRIG_TYPE_U32);
   hsa_insn_basic *insn = new hsa_insn_basic (2, opcode, BRIG_TYPE_U32, NULL,
-					     imm);
+					     dimension);
   hbb->append_insn (insn);
   insn->set_output_in_type (dest, 0, hbb);
 }
 
-/* Generate a special HSA-related instruction for gimple STMT.
-   Instructions are appended to basic block HBB.  */
+/* Generate instruction OPCODE to query a property of HSA grid along the given
+   dimension which is an immediate in first argument of STMT.  Store result
+   into the register corresponding to LHS of STMT and append the instruction to
+   HBB.  */
 
 static void
-query_hsa_grid (gimple *stmt, BrigOpcode16_t opcode, int dimension,
-		hsa_bb *hbb)
+query_hsa_grid_dim (gimple *stmt, int opcode, hsa_bb *hbb)
 {
   tree lhs = gimple_call_lhs (dyn_cast <gcall *> (stmt));
   if (lhs == NULL_TREE)
     return;
 
+  tree arg = gimple_call_arg (stmt, 0);
+  unsigned HOST_WIDE_INT dim = 5;
+  if (tree_fits_uhwi_p (arg))
+    dim = tree_to_uhwi (arg);
+  if (dim > 2)
+    {
+      HSA_SORRY_AT (gimple_location (stmt),
+		    "HSA grid query dimension must be immediate constant 0, 1 "
+		    "or 2");
+      return;
+    }
+
+  hsa_op_immed *hdim = new hsa_op_immed (dim, (BrigKind16_t) BRIG_TYPE_U32);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  query_hsa_grid_dim (dest, opcode, hdim, hbb);
+}
+
+/* Generate instruction OPCODE to query a property of HSA grid that is
+   independent of any dimension.  Store result into the register corresponding
+   to LHS of STMT and append the instruction to HBB.  */
 
-  query_hsa_grid (dest, opcode, dimension, hbb);
+static void
+query_hsa_grid_nodim (gimple *stmt, BrigOpcode16_t opcode, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (dyn_cast <gcall *> (stmt));
+  if (lhs == NULL_TREE)
+    return;
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  BrigType16_t brig_type = hsa_unsigned_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (1, opcode, brig_type, dest);
+  hbb->append_insn (insn);
 }
 
 /* Emit instructions that set hsa_num_threads according to provided VALUE.
@@ -4012,6 +3982,44 @@ gen_num_threads_for_dispatch (hsa_bb *hbb)
   return as_a <hsa_op_reg *> (dest);
 }
 
+/* Build OPCODE query for all three hsa dimensions, multiply them and store the
+   result into DEST.  */
+
+static void
+multiply_grid_dim_characteristics (hsa_op_reg *dest, int opcode, hsa_bb *hbb)
+{
+  hsa_op_reg *dimx = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimx, opcode,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *dimy = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimy, opcode,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *dimz = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimz, opcode,
+		      new hsa_op_immed (2, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp,
+			    dimx->get_in_type (dest->m_type, hbb),
+			    dimy->get_in_type (dest->m_type, hbb), hbb);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, dest, tmp,
+			    dimz->get_in_type (dest->m_type, hbb), hbb);
+}
+
+/* Emit instructions that assign number of threads to lhs of gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_get_num_threads (gimple *stmt, hsa_bb *hbb)
+{
+  if (gimple_call_lhs (stmt) == NULL_TREE)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_num_threads"));
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  multiply_grid_dim_characteristics (dest, BRIG_OPCODE_CURRENTWORKGROUPSIZE,
+				     hbb);
+}
 
 /* Emit instructions that assign number of teams to lhs of gimple STMT.
    Instructions are appended to basic block HBB.  */
@@ -4023,15 +4031,9 @@ gen_get_num_teams (gimple *stmt, hsa_bb *hbb)
     return;
 
   hbb->append_insn (new hsa_insn_comment ("omp_get_num_teams"));
-
   tree lhs = gimple_call_lhs (stmt);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-  hsa_op_immed *one = new hsa_op_immed (1, dest->m_type);
-
-  hsa_insn_basic *basic
-    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, one);
-
-  hbb->append_insn (basic);
+  multiply_grid_dim_characteristics (dest, BRIG_OPCODE_GRIDGROUPS, hbb);
 }
 
 /* Emit instructions that assign a team number to lhs of gimple STMT.
@@ -4044,15 +4046,42 @@ gen_get_team_num (gimple *stmt, hsa_bb *hbb)
     return;
 
   hbb->append_insn (new hsa_insn_comment ("omp_get_team_num"));
-
   tree lhs = gimple_call_lhs (stmt);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-  hsa_op_immed *zero = new hsa_op_immed (0, dest->m_type);
 
-  hsa_insn_basic *basic
-    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, zero);
-
-  hbb->append_insn (basic);
+  hsa_op_reg *gnum_x = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gnum_x, BRIG_OPCODE_GRIDGROUPS,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *gnum_y = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gnum_y, BRIG_OPCODE_GRIDGROUPS,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+
+  hsa_op_reg *gno_z = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_z, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (2, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+
+  hsa_op_reg *tmp1 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp1,
+			    gnum_x->get_in_type (dest->m_type, hbb),
+			    gnum_y->get_in_type (dest->m_type, hbb), hbb);
+  hsa_op_reg *tmp2 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp2, tmp1,
+			    gno_z->get_in_type (dest->m_type, hbb), hbb);
+
+  hsa_op_reg *gno_y = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_y, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *tmp3 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp3,
+			    gnum_x->get_in_type (dest->m_type, hbb),
+			    gno_y->get_in_type (dest->m_type, hbb), hbb);
+  hsa_op_reg *tmp4 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_ADD, tmp4, tmp3, tmp2, hbb);
+  hsa_op_reg *gno_x = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_x, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp4,
+			    gno_x->get_in_type (dest->m_type, hbb), hbb);
 }
 
 /* Emit instructions that get levels-var ICV to lhs of gimple STMT.
@@ -4263,12 +4292,13 @@ gen_hsa_popcount_to_dest (hsa_op_reg *dest, hsa_op_with_type *arg, hsa_bb *hbb)
   if (hsa_type_bit_size (arg->m_type) < 32)
     arg = arg->get_in_type (BRIG_TYPE_B32, hbb);
 
+  BrigType16_t srctype = hsa_bittype_for_type (arg->m_type);
   if (!hsa_btype_p (arg->m_type))
-    arg = arg->get_in_type (hsa_bittype_for_type (arg->m_type), hbb);
+    arg = arg->get_in_type (srctype, hbb);
 
   hsa_insn_srctype *popcount
     = new hsa_insn_srctype (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32,
-			    arg->m_type, NULL, arg);
+			    srctype, NULL, arg);
   hbb->append_insn (popcount);
   popcount->set_output_in_type (dest, 0, hbb);
 }
@@ -4339,11 +4369,11 @@ omp_simple_builtin::generate (gimple *stmt, hsa_bb *hbb)
   if (m_sorry)
     {
       if (m_warning_message)
-	HSA_SORRY_AT (gimple_location (stmt), m_warning_message)
+	HSA_SORRY_AT (gimple_location (stmt), m_warning_message);
       else
 	HSA_SORRY_ATV (gimple_location (stmt),
 		       "Support for HSA does not implement calls to %s\n",
-		       m_name)
+		       m_name);
     }
   else if (m_warning_message != NULL)
     warning_at (gimple_location (stmt), OPT_Whsa, m_warning_message);
@@ -4398,12 +4428,12 @@ gen_hsa_insns_for_known_library_call (gimple *stmt, hsa_bb *hbb)
       else if (strcmp (name, "omp_get_thread_num") == 0)
 	{
 	  hbb->append_insn (new hsa_insn_comment (name));
-	  query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	  query_hsa_grid_nodim (stmt, BRIG_OPCODE_WORKITEMFLATABSID, hbb);
 	}
       else if (strcmp (name, "omp_get_num_threads") == 0)
 	{
 	  hbb->append_insn (new hsa_insn_comment (name));
-	  query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	  gen_get_num_threads (stmt, hbb);
 	}
       else if (strcmp (name, "omp_get_num_teams") == 0)
 	gen_get_num_teams (stmt, hbb);
@@ -4589,7 +4619,7 @@ expand_string_operation_builtin (gimple *stmt, hsa_bb *hbb,
 {
   edge e = split_block (hbb->m_bb, stmt);
   basic_block condition_bb = e->src;
-  hbb->append_insn (new hsa_insn_br (misaligned_flag));
+  hbb->append_insn (new hsa_insn_cbr (misaligned_flag));
 
   /* Prepare the control flow.  */
   edge condition_edge = EDGE_SUCC (condition_bb, 0);
@@ -4718,95 +4748,86 @@ expand_memory_set (gimple *stmt, unsigned HOST_WIDE_INT n,
   expand_lhs_of_string_op (stmt, n, merge_bb, builtin);
 }
 
-/* Return string for MEMMODEL.  */
+/* Store into MEMORDER the memory order specified by tree T, which must be an
+   integer constant representing a C++ memory order.  If it isn't, issue an HSA
+   sorry message using LOC and return true, otherwise return false and store
+   the name of the requested order to *MNAME.  */
 
-static const char *
-get_memory_order_name (unsigned memmodel)
+static bool
+hsa_memorder_from_tree (tree t, BrigMemoryOrder *memorder, const char **mname,
+			location_t loc)
 {
-  switch (memmodel & MEMMODEL_BASE_MASK)
+  if (!tree_fits_uhwi_p (t))
     {
-    case MEMMODEL_RELAXED:
-      return "relaxed";
-    case MEMMODEL_CONSUME:
-      return "consume";
-    case MEMMODEL_ACQUIRE:
-      return "acquire";
-    case MEMMODEL_RELEASE:
-      return "release";
-    case MEMMODEL_ACQ_REL:
-      return "acq_rel";
-    case MEMMODEL_SEQ_CST:
-      return "seq_cst";
-    default:
-      return NULL;
+      HSA_SORRY_ATV (loc, "support for HSA does not implement memory model %E",
+		     t);
+      return true;
     }
-}
-
-/* Return memory order according to predefined __atomic memory model
-   constants.  LOCATION is provided to locate the problematic statement.  */
 
-static BrigMemoryOrder
-get_memory_order (unsigned memmodel, location_t location)
-{
-  switch (memmodel & MEMMODEL_BASE_MASK)
+  unsigned HOST_WIDE_INT mm = tree_to_uhwi (t);
+  switch (mm & MEMMODEL_BASE_MASK)
     {
     case MEMMODEL_RELAXED:
-      return BRIG_MEMORY_ORDER_RELAXED;
+      *memorder = BRIG_MEMORY_ORDER_RELAXED;
+      *mname = "relaxed";
+      break;
     case MEMMODEL_CONSUME:
       /* HSA does not have an equivalent, but we can use the slightly stronger
 	 ACQUIRE.  */
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *mname = "consume";
+      break;
     case MEMMODEL_ACQUIRE:
-      return BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *mname = "acquire";
+      break;
     case MEMMODEL_RELEASE:
-      return BRIG_MEMORY_ORDER_SC_RELEASE;
+      *memorder = BRIG_MEMORY_ORDER_SC_RELEASE;
+      *mname = "release";
+      break;
     case MEMMODEL_ACQ_REL:
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *mname = "acq_rel";
+      break;
     case MEMMODEL_SEQ_CST:
       /* Callers implementing a simple load or store need to remove the release
 	 or acquire part respectively.  */
-      return BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *mname = "seq_cst";
+      break;
     default:
       {
-	const char *mmname = get_memory_order_name (memmodel);
-	HSA_SORRY_ATV (location,
-		       "support for HSA does not implement the specified "
-		       " memory model%s %s",
-		       mmname ? ": " : "", mmname ? mmname : "");
-	return BRIG_MEMORY_ORDER_NONE;
+	HSA_SORRY_AT (loc, "support for HSA does not implement the specified "
+		      "memory model");
+	return true;
       }
     }
+  return false;
 }
 
-/* Helper function to create an HSA atomic binary operation instruction out of
-   calls to atomic builtins.  RET_ORIG is true if the built-in is the variant
-   that return s the value before applying operation, and false if it should
-   return the value after applying the operation (if it returns value at all).
-   ACODE is the atomic operation code, STMT is a gimple call to a builtin.  HBB
-   is the HSA BB to which the instruction should be added.  */
+/* Helper function to create an HSA atomic operation instruction out of calls
+   to atomic builtins.  RET_ORIG is true if the built-in is the variant that
+   return s the value before applying operation, and false if it should return
+   the value after applying the operation (if it returns value at all).  ACODE
+   is the atomic operation code, STMT is a gimple call to a builtin.  HBB is
+   the HSA BB to which the instruction should be added.  If SIGNAL is true, the
+   created operation will work on HSA signals rather than atomic variables.  */
 
 static void
-gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
- 				    enum BrigAtomicOperation acode,
-				    gimple *stmt,
-				    hsa_bb *hbb)
+gen_hsa_atomic_for_builtin (bool ret_orig, enum BrigAtomicOperation acode,
+			    gimple *stmt, hsa_bb *hbb, bool signal)
 {
   tree lhs = gimple_call_lhs (stmt);
 
   tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
   BrigType16_t hsa_type = hsa_type_for_scalar_tree_type (type, false);
   BrigType16_t mtype = mem_type_for_type (hsa_type);
-  tree model = gimple_call_arg (stmt, 2);
+  BrigMemoryOrder memorder;
+  const char *mmname;
 
-  if (!tree_fits_uhwi_p (model))
-    {
-      HSA_SORRY_ATV (gimple_location (stmt),
-		     "support for HSA does not implement memory model %E",
-		     model);
-      return;
-    }
-
-  unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
-
-  BrigMemoryOrder memorder = get_memory_order (mmodel, gimple_location (stmt));
+  if (hsa_memorder_from_tree (gimple_call_arg (stmt, 2), &memorder, &mmname,
+			      gimple_location (stmt)))
+    return;
 
   /* Certain atomic insns must have Bx memory types.  */
   switch (acode)
@@ -4831,13 +4852,13 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 	dest = hsa_cfun->reg_for_gimple_ssa (lhs);
       else
 	dest = new hsa_op_reg (hsa_type);
-      opcode = BRIG_OPCODE_ATOMIC;
+      opcode = signal ? BRIG_OPCODE_SIGNAL : BRIG_OPCODE_ATOMIC;
       nops = 3;
     }
   else
     {
       dest = NULL;
-      opcode = BRIG_OPCODE_ATOMICNORET;
+      opcode = signal ? BRIG_OPCODE_SIGNALNORET : BRIG_OPCODE_ATOMICNORET;
       nops = 2;
     }
 
@@ -4852,35 +4873,44 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 	{
 	  HSA_SORRY_ATV (gimple_location (stmt),
 			 "support for HSA does not implement memory model for "
-			 "ATOMIC_ST: %s", get_memory_order_name (mmodel));
+			 "ATOMIC_ST: %s", mmname);
 	  return;
 	}
     }
 
-  hsa_insn_atomic *atominsn = new hsa_insn_atomic (nops, opcode, acode, mtype,
-						   memorder);
-
-  hsa_op_address *addr;
-  addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
-  if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE)
+  hsa_insn_basic *atominsn;
+  hsa_op_base *tgt;
+  if (signal)
     {
-      HSA_SORRY_AT (gimple_location (stmt),
-		    "HSA does not implement atomic operations in private "
-		    "segment");
-      return;
+      atominsn = new hsa_insn_signal (nops, opcode, acode, mtype, memorder);
+      tgt = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0), hbb);
     }
+  else
+    {
+      atominsn = new hsa_insn_atomic (nops, opcode, acode, mtype, memorder);
+      hsa_op_address *addr;
+      addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+      if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE)
+	{
+	  HSA_SORRY_AT (gimple_location (stmt),
+			"HSA does not implement atomic operations in private "
+			"segment");
+	  return;
+	}
+      tgt = addr;
+    }
+
   hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1),
 						    hbb);
-
   if (lhs)
     {
       atominsn->set_op (0, dest);
-      atominsn->set_op (1, addr);
+      atominsn->set_op (1, tgt);
       atominsn->set_op (2, op);
     }
   else
     {
-      atominsn->set_op (0, addr);
+      atominsn->set_op (0, tgt);
       atominsn->set_op (1, op);
     }
 
@@ -4950,6 +4980,10 @@ gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
       break;
 
+    case IFN_RSQRT:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NRSQRT, stmt, hbb);
+      break;
+
     case IFN_TRUNC:
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
       break;
@@ -5068,6 +5102,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (DECL_BUILT_IN_CLASS (function_decl)
+	  && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
+	return;
+
       if (function_decl == NULL_TREE)
 	{
 	  HSA_SORRY_AT (gimple_location (stmt),
@@ -5185,21 +5225,14 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_LOAD_16:
       {
 	BrigType16_t mtype;
-	hsa_op_address *addr;
-	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
-	tree model = gimple_call_arg (stmt, 1);
-	if (!tree_fits_uhwi_p (model))
-	  {
-	    HSA_SORRY_ATV (gimple_location (stmt),
-			   "support for HSA does not implement "
-			   "memory model: %E",
-			   model);
-	    return;
-	  }
+	hsa_op_base *src;
+	src = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
 
-	unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
-	BrigMemoryOrder memorder = get_memory_order (mmodel,
-						     gimple_location (stmt));
+	BrigMemoryOrder memorder;
+	const char *mmname;
+	if (hsa_memorder_from_tree (gimple_call_arg (stmt, 1), &memorder,
+				    &mmname, gimple_location (stmt)))
+	  return;
 
 	if (memorder == BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE)
 	  memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
@@ -5210,8 +5243,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  {
 	    HSA_SORRY_ATV (gimple_location (stmt),
 			   "support for HSA does not implement "
-			   "memory model for ATOMIC_LD: %s",
-			   get_memory_order_name (mmodel));
+			   "memory model for atomic loads: %s", mmname);
 	    return;
 	  }
 
@@ -5229,9 +5261,9 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	    dest = new hsa_op_reg (mtype);
 	  }
 
-	hsa_insn_atomic *atominsn
-	  = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_LD, mtype,
-				 memorder, dest, addr);
+	hsa_insn_basic *atominsn;
+	atominsn = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_LD,
+					mtype, memorder, dest, src);
 
 	hbb->append_insn (atominsn);
 	break;
@@ -5242,7 +5274,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_EXCHANGE_4:
     case BUILT_IN_ATOMIC_EXCHANGE_8:
     case BUILT_IN_ATOMIC_EXCHANGE_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_EXCH, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_EXCH, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_ADD_1:
@@ -5250,7 +5283,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_ADD_4:
     case BUILT_IN_ATOMIC_FETCH_ADD_8:
     case BUILT_IN_ATOMIC_FETCH_ADD_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ADD, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_ADD, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_SUB_1:
@@ -5258,7 +5292,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_SUB_4:
     case BUILT_IN_ATOMIC_FETCH_SUB_8:
     case BUILT_IN_ATOMIC_FETCH_SUB_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_SUB, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_SUB, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_AND_1:
@@ -5266,7 +5301,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_AND_4:
     case BUILT_IN_ATOMIC_FETCH_AND_8:
     case BUILT_IN_ATOMIC_FETCH_AND_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_AND, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_AND, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_XOR_1:
@@ -5274,7 +5310,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_XOR_4:
     case BUILT_IN_ATOMIC_FETCH_XOR_8:
     case BUILT_IN_ATOMIC_FETCH_XOR_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_XOR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_XOR, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_OR_1:
@@ -5282,7 +5319,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_OR_4:
     case BUILT_IN_ATOMIC_FETCH_OR_8:
     case BUILT_IN_ATOMIC_FETCH_OR_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_OR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_OR, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_STORE_1:
@@ -5291,7 +5329,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_STORE_8:
     case BUILT_IN_ATOMIC_STORE_16:
       /* Since there cannot be any LHS, the first parameter is meaningless.  */
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ST, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_ST, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_ADD_FETCH_1:
@@ -5299,7 +5338,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_ADD_FETCH_4:
     case BUILT_IN_ATOMIC_ADD_FETCH_8:
     case BUILT_IN_ATOMIC_ADD_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_ADD, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_ADD, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_SUB_FETCH_1:
@@ -5307,7 +5346,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_SUB_FETCH_4:
     case BUILT_IN_ATOMIC_SUB_FETCH_8:
     case BUILT_IN_ATOMIC_SUB_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_SUB, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_SUB, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_AND_FETCH_1:
@@ -5315,7 +5354,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_AND_FETCH_4:
     case BUILT_IN_ATOMIC_AND_FETCH_8:
     case BUILT_IN_ATOMIC_AND_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_AND, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_AND, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_XOR_FETCH_1:
@@ -5323,7 +5362,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_XOR_FETCH_4:
     case BUILT_IN_ATOMIC_XOR_FETCH_8:
     case BUILT_IN_ATOMIC_XOR_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_XOR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_XOR, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_OR_FETCH_1:
@@ -5331,7 +5370,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_OR_FETCH_4:
     case BUILT_IN_ATOMIC_OR_FETCH_8:
     case BUILT_IN_ATOMIC_OR_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_OR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_OR, stmt, hbb, false);
       break;
 
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1:
@@ -5340,27 +5379,23 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8:
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16:
       {
-	/* TODO: Use the appropriate memory model for now.  */
 	tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
-
 	BrigType16_t atype
 	  = hsa_bittype_for_type (hsa_type_for_scalar_tree_type (type, false));
-
-	hsa_insn_atomic *atominsn
-	  = new hsa_insn_atomic (4, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_CAS, atype,
-				 BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE);
-	hsa_op_address *addr;
-	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+	BrigMemoryOrder memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+	hsa_insn_basic *atominsn;
+	hsa_op_base *tgt;
+	atominsn = new hsa_insn_atomic (4, BRIG_OPCODE_ATOMIC,
+					BRIG_ATOMIC_CAS, atype, memorder);
+	tgt = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
 
 	if (lhs != NULL)
 	  dest = hsa_cfun->reg_for_gimple_ssa (lhs);
 	else
 	  dest = new hsa_op_reg (atype);
 
-	/* Should check what the memory scope is.  */
-	atominsn->m_memoryscope = BRIG_MEMORY_SCOPE_WORKGROUP;
 	atominsn->set_op (0, dest);
-	atominsn->set_op (1, addr);
+	atominsn->set_op (1, tgt);
 
 	hsa_op_with_type *op
 	  = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb);
@@ -5371,20 +5406,42 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	hbb->append_insn (atominsn);
 	break;
       }
+
+    case BUILT_IN_HSA_WORKGROUPID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKGROUPID, hbb);
+      break;
+    case BUILT_IN_HSA_WORKITEMID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMID, hbb);
+      break;
+    case BUILT_IN_HSA_WORKITEMABSID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMABSID, hbb);
+      break;
+    case BUILT_IN_HSA_GRIDSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_GRIDSIZE, hbb);
+      break;
+    case BUILT_IN_HSA_CURRENTWORKGROUPSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_CURRENTWORKGROUPSIZE, hbb);
+      break;
+
+    case BUILT_IN_GOMP_BARRIER:
+      hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER, BRIG_TYPE_NONE,
+					 BRIG_WIDTH_ALL));
+      break;
     case BUILT_IN_GOMP_PARALLEL:
       HSA_SORRY_AT (gimple_location (stmt),
 		    "support for HSA does not implement non-gridified "
 		    "OpenMP parallel constructs.");
       break;
+
     case BUILT_IN_OMP_GET_THREAD_NUM:
       {
-	query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	query_hsa_grid_nodim (stmt, BRIG_OPCODE_WORKITEMFLATABSID, hbb);
 	break;
       }
 
     case BUILT_IN_OMP_GET_NUM_THREADS:
       {
-	query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	gen_get_num_threads (stmt, hbb);
 	break;
       }
     case BUILT_IN_GOMP_TEAMS:
@@ -5469,9 +5526,19 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	gen_hsa_alloca (call, hbb);
 	break;
       }
+    case BUILT_IN_PREFETCH:
+      break;
     default:
       {
-	gen_hsa_insns_for_direct_call (stmt, hbb);
+	tree name_tree = DECL_NAME (fndecl);
+	const char *s = IDENTIFIER_POINTER (name_tree);
+	size_t len = strlen (s);
+	if (len > 4 && (strncmp (s, "__builtin_GOMP_", 15) == 0))
+	  HSA_SORRY_ATV (gimple_location (stmt),
+			 "support for HSA does not implement GOMP function %s",
+			 s);
+	else
+	  gen_hsa_insns_for_direct_call (stmt, hbb);
 	return;
       }
     }
@@ -5601,13 +5668,7 @@ gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
 	}
     }
 
-  hphi->m_prev = hbb->m_last_phi;
-  hphi->m_next = NULL;
-  if (hbb->m_last_phi)
-    hbb->m_last_phi->m_next = hphi;
-  hbb->m_last_phi = hphi;
-  if (!hbb->m_first_phi)
-    hbb->m_first_phi = hphi;
+  hbb->append_phi (hphi);
 }
 
 /* Constructor of class containing HSA-specific information about a basic
@@ -5650,7 +5711,8 @@ hsa_bb::~hsa_bb ()
 hsa_bb *
 hsa_init_new_bb (basic_block bb)
 {
-  return new (*hsa_allocp_bb) hsa_bb (bb);
+  void *m = obstack_alloc (&hsa_obstack, sizeof (hsa_bb));
+  return new (m) hsa_bb (bb);
 }
 
 /* Initialize OMP in an HSA basic block PROLOGUE.  */
diff --git a/gcc/hsa.c b/gcc/hsa.c
index 168cfe3..f881e78 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -170,6 +170,7 @@ hsa_insn_basic::op_output_p (unsigned opnum)
     case BRIG_OPCODE_SBR:
     case BRIG_OPCODE_ST:
     case BRIG_OPCODE_SIGNALNORET:
+    case BRIG_OPCODE_DEBUGTRAP:
       /* FIXME: There are probably missing cases here, double check.  */
       return false;
     case BRIG_OPCODE_EXPAND:
@@ -605,8 +606,8 @@ hsa_destroy_insn (hsa_insn_basic *insn)
 {
   if (hsa_insn_phi *phi = dyn_cast <hsa_insn_phi *> (insn))
     phi->~hsa_insn_phi ();
-  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
-    br->~hsa_insn_br ();
+  else if (hsa_insn_cbr *br = dyn_cast <hsa_insn_cbr *> (insn))
+    br->~hsa_insn_cbr ();
   else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
     cmp->~hsa_insn_cmp ();
   else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
@@ -621,6 +622,8 @@ hsa_destroy_insn (hsa_insn_basic *insn)
     block->~hsa_insn_arg_block ();
   else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
     sbr->~hsa_insn_sbr ();
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    br->~hsa_insn_br ();
   else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
     comment->~hsa_insn_comment ();
   else
@@ -783,32 +786,22 @@ hsa_brig_function_name (const char *p)
   return buf;
 }
 
-/* Return declaration name if exists.  */
+/* Add a flatten attribute and disable vectorization for gpu implementation
+   function decl GDECL.  */
 
-const char *
-hsa_get_declaration_name (tree decl)
+void hsa_summary_t::process_gpu_implementation_attributes (tree gdecl)
 {
-  if (!DECL_NAME (decl))
-    {
-      char buf[64];
-      snprintf (buf, 64, "__hsa_anonymous_%i", DECL_UID (decl));
-      const char *ggc_str = ggc_strdup (buf);
-      return ggc_str;
-    }
-
-  tree name_tree;
-  if (TREE_CODE (decl) == FUNCTION_DECL
-      || (VAR_P (decl) && is_global_var (decl)))
-    name_tree = DECL_ASSEMBLER_NAME (decl);
-  else
-    name_tree = DECL_NAME (decl);
-
-  const char *name = IDENTIFIER_POINTER (name_tree);
-  /* User-defined assembly names have prepended asterisk symbol.  */
-  if (name[0] == '*')
-    name++;
+  DECL_ATTRIBUTES (gdecl)
+    = tree_cons (get_identifier ("flatten"), NULL_TREE,
+		 DECL_ATTRIBUTES (gdecl));
 
-  return name;
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
+  if (fn_opts == NULL_TREE)
+    fn_opts = optimization_default_node;
+  fn_opts = copy_node (fn_opts);
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
 }
 
 void
@@ -827,21 +820,10 @@ hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
   gpu_summary->m_gridified_kernel_p = gridified_kernel_p;
   host_summary->m_gridified_kernel_p = gridified_kernel_p;
 
-  gpu_summary->m_binded_function = host;
-  host_summary->m_binded_function = gpu;
-
-  tree gdecl = gpu->decl;
-  DECL_ATTRIBUTES (gdecl)
-    = tree_cons (get_identifier ("flatten"), NULL_TREE,
-		 DECL_ATTRIBUTES (gdecl));
+  gpu_summary->m_bound_function = host;
+  host_summary->m_bound_function = gpu;
 
-  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
-  if (fn_opts == NULL_TREE)
-    fn_opts = optimization_default_node;
-  fn_opts = copy_node (fn_opts);
-  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
-  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
-  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+  process_gpu_implementation_attributes (gpu->decl);
 
   /* Create reference between a kernel and a corresponding host implementation
      to quarantee LTO streaming to a same LTRANS.  */
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 1b57a3c..c00ffd5 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -50,7 +50,6 @@ class hsa_insn_basic;
 class hsa_op_address;
 class hsa_op_reg;
 class hsa_bb;
-typedef hsa_insn_basic *hsa_insn_basic_p;
 
 /* Class representing an input argument, output argument (result) or a
    variable, that will eventually end up being a symbol directive.  */
@@ -72,7 +71,8 @@ struct hsa_symbol
   void fillup_for_decl (tree decl);
 
   /* Pointer to the original tree, which is PARM_DECL for input parameters and
-     RESULT_DECL for the output parameters.  */
+     RESULT_DECL for the output parameters.  Also can be CONST_DECL for Fortran
+     constants which need to be put into readonly segment.  */
   tree m_decl;
 
   /* Name of the symbol, that will be written into output and dumps.  Can be
@@ -259,11 +259,9 @@ private:
   /* Set definition where the register is defined.  */
   void set_definition (hsa_insn_basic *insn);
   /* Uses of the value while still in SSA.  */
-  auto_vec <hsa_insn_basic_p> m_uses;
+  auto_vec <hsa_insn_basic *> m_uses;
 };
 
-typedef class hsa_op_reg *hsa_op_reg_p;
-
 /* Report whether or not P is a register operand.  */
 
 template <>
@@ -490,17 +488,12 @@ class hsa_insn_phi : public hsa_insn_basic
 public:
   hsa_insn_phi (unsigned nops, hsa_op_reg *dst);
 
-  void *operator new (size_t);
-
   /* Destination.  */
   hsa_op_reg *m_dest;
 
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_phi () : hsa_insn_basic (1, HSA_OPCODE_PHI) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a PHI node.  */
@@ -513,35 +506,56 @@ is_a_helper <hsa_insn_phi *>::test (hsa_insn_basic *p)
   return p->m_opcode == HSA_OPCODE_PHI;
 }
 
-/* HSA instruction for branches.  Currently we explicitely represent only
-   conditional branches.  */
-
+/* HSA instruction for  */
 class hsa_insn_br : public hsa_insn_basic
 {
 public:
-  hsa_insn_br (hsa_op_reg *ctrl);
-
-  void *operator new (size_t);
+  hsa_insn_br (unsigned nops, int opc, BrigType16_t t, BrigWidth8_t width,
+	       hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+	       hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
-  /* Width as described in HSA documentation.  */
+  /* Number of work-items affected in the same way by the instruction.  */
   BrigWidth8_t m_width;
+
 private:
   /* Make the default constructor inaccessible.  */
-  hsa_insn_br () : hsa_insn_basic (1, BRIG_OPCODE_CBR) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
+  hsa_insn_br () : hsa_insn_basic (0, BRIG_OPCODE_BR) {}
 };
 
-/* Report whether P is a branching instruction.  */
+/* Return true if P is a branching/synchronization instruction.  */
 
 template <>
 template <>
 inline bool
 is_a_helper <hsa_insn_br *>::test (hsa_insn_basic *p)
 {
-  return p->m_opcode == BRIG_OPCODE_BR
-    || p->m_opcode == BRIG_OPCODE_CBR;
+  return p->m_opcode == BRIG_OPCODE_BARRIER
+    || p->m_opcode == BRIG_OPCODE_BR;
+}
+
+/* HSA instruction for conditional branches.  Structurally the same as
+   hsa_insn_br but we represent it specially because of inherent control
+   flow it represents.  */
+
+class hsa_insn_cbr : public hsa_insn_br
+{
+public:
+  hsa_insn_cbr (hsa_op_reg *ctrl);
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_cbr () : hsa_insn_br (0, BRIG_OPCODE_CBR, BRIG_TYPE_B1,
+				 BRIG_WIDTH_1) {}
+};
+
+/* Report whether P is a contitional branching instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_cbr *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == BRIG_OPCODE_CBR;
 }
 
 /* HSA instruction for switch branches.  */
@@ -554,8 +568,6 @@ public:
   /* Default destructor.  */
   ~hsa_insn_sbr ();
 
-  void *operator new (size_t);
-
   void replace_all_labels (basic_block old_bb, basic_block new_bb);
 
   /* Width as described in HSA documentation.  */
@@ -570,9 +582,6 @@ public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_sbr () : hsa_insn_basic (1, BRIG_OPCODE_SBR) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether P is a switch branching instruction.  */
@@ -594,8 +603,6 @@ public:
 		hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		hsa_op_base *arg2 = NULL);
 
-  void *operator new (size_t);
-
   /* Source type should be derived from operand types.  */
 
   /* The comparison operation.  */
@@ -606,9 +613,6 @@ public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_cmp () : hsa_insn_basic (1, BRIG_OPCODE_CMP) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a comparison instruction.  */
@@ -628,8 +632,6 @@ class hsa_insn_mem : public hsa_insn_basic
 public:
   hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0, hsa_op_base *arg1);
 
-  void *operator new (size_t);
-
   /* Set alignment to VALUE.  */
 
   void set_align (BrigAlignment8_t value);
@@ -652,9 +654,6 @@ protected:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_mem () : hsa_insn_basic (1, BRIG_OPCODE_LD) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a memory instruction.  */
@@ -677,7 +676,6 @@ public:
 		   BrigType16_t t, BrigMemoryOrder memorder,
 		   hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
-  void *operator new (size_t);
 
   /* The operation itself.  */
   enum BrigAtomicOperation m_atomicop;
@@ -691,9 +689,6 @@ public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_atomic () : hsa_insn_mem (1, BRIG_KIND_NONE, BRIG_TYPE_NONE) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is an atomic instruction.  */
@@ -709,20 +704,19 @@ is_a_helper <hsa_insn_atomic *>::test (hsa_insn_basic *p)
 
 /* HSA instruction for signal operations.  */
 
-class hsa_insn_signal : public hsa_insn_atomic
+class hsa_insn_signal : public hsa_insn_basic
 {
 public:
   hsa_insn_signal (int nops, int opc, enum BrigAtomicOperation sop,
-		   BrigType16_t t, hsa_op_base *arg0 = NULL,
-		   hsa_op_base *arg1 = NULL,
+		   BrigType16_t t, BrigMemoryOrder memorder,
+		   hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
-  void *operator new (size_t);
+  /* Things like acquire/release/aligned.  */
+  enum BrigMemoryOrder m_memory_order;
 
-private:
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
+  /* The operation itself.  */
+  enum BrigAtomicOperation m_signalop;
 };
 
 /* Report whether or not P is a signal instruction.  */
@@ -744,8 +738,6 @@ public:
   hsa_insn_seg (int opc, BrigType16_t destt, BrigType16_t srct,
 		BrigSegment8_t seg, hsa_op_base *arg0, hsa_op_base *arg1);
 
-  void *operator new (size_t);
-
   /* Source type.  Depends on the source addressing/segment.  */
   BrigType16_t m_src_type;
   /* The segment we are converting from or to.  */
@@ -753,9 +745,6 @@ public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_seg () : hsa_insn_basic (1, BRIG_OPCODE_STOF) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a segment conversion instruction.  */
@@ -812,8 +801,6 @@ public:
   /* Default destructor.  */
   ~hsa_insn_call ();
 
-  void *operator new (size_t);
-
   /* Called function.  */
   tree m_called_function;
 
@@ -840,9 +827,6 @@ public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_call () : hsa_insn_basic (0, BRIG_OPCODE_CALL) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a call instruction.  */
@@ -866,17 +850,11 @@ class hsa_insn_arg_block : public hsa_insn_basic
 public:
   hsa_insn_arg_block (BrigKind brig_kind, hsa_insn_call * call);
 
-  void *operator new (size_t);
-
   /* Kind of argument block.  */
   BrigKind m_kind;
 
   /* Call instruction.  */
   hsa_insn_call *m_call_insn;
-private:
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a call block instruction.  */
@@ -900,8 +878,6 @@ public:
   /* Default destructor.  */
   ~hsa_insn_comment ();
 
-  void *operator new (size_t);
-
   char *m_comment;
 };
 
@@ -920,10 +896,18 @@ is_a_helper <hsa_insn_comment *>::test (hsa_insn_basic *p)
 class hsa_insn_queue: public hsa_insn_basic
 {
 public:
-  hsa_insn_queue (int nops, BrigOpcode opcode);
+  hsa_insn_queue (int nops, int opcode, BrigSegment segment,
+		  BrigMemoryOrder memory_order,
+		  hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+		  hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
   /* Destructor.  */
   ~hsa_insn_queue ();
+
+  /* Segment used to refer to the queue.  Must be global or flat.  */
+  BrigSegment m_segment;
+  /* Memory order used to specify synchronization.  */
+  BrigMemoryOrder m_memory_order;
 };
 
 /* Report whether or not P is a queue instruction.  */
@@ -933,7 +917,12 @@ template <>
 inline bool
 is_a_helper <hsa_insn_queue *>::test (hsa_insn_basic *p)
 {
-  return (p->m_opcode == BRIG_OPCODE_ADDQUEUEWRITEINDEX);
+  return (p->m_opcode == BRIG_OPCODE_ADDQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_CASQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_LDQUEUEREADINDEX
+	  || p->m_opcode == BRIG_OPCODE_LDQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_STQUEUEREADINDEX
+	  || p->m_opcode == BRIG_OPCODE_STQUEUEWRITEINDEX);
 }
 
 /* HSA source type instruction.  */
@@ -945,9 +934,6 @@ public:
 		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
 		   hsa_op_base *arg2);
 
-  /* Pool allocator.  */
-  void *operator new (size_t);
-
   /* Source type.  */
   BrigType16_t m_source_type;
 
@@ -976,9 +962,6 @@ public:
 		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
 		   hsa_op_base *arg2);
 
-  /* Pool allocator.  */
-  void *operator new (size_t);
-
   /* Operand list for an operand of the instruction.  */
   hsa_op_operand_list *m_operand_list;
 
@@ -1003,9 +986,6 @@ class hsa_insn_cvt: public hsa_insn_basic
 {
 public:
   hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src);
-
-  /* Pool allocator.  */
-  void *operator new (size_t);
 };
 
 /* Report whether or not P is a convert instruction.  */
@@ -1028,9 +1008,6 @@ public:
 
   /* Required alignment of the allocation.  */
   BrigAlignment8_t m_align;
-
-  /* Pool allocator.  */
-  void *operator new (size_t);
 };
 
 /* Report whether or not P is an alloca instruction.  */
@@ -1055,6 +1032,9 @@ public:
   /* Append an instruction INSN into the basic block.  */
   void append_insn (hsa_insn_basic *insn);
 
+  /* Add a PHI instruction.  */
+  void append_phi (hsa_insn_phi *phi);
+
   /* The real CFG BB that this HBB belongs to.  */
   basic_block m_bb;
 
@@ -1217,7 +1197,7 @@ public:
   unsigned m_temp_symbol_count;
 
   /* SSA names mapping.  */
-  vec <hsa_op_reg_p> m_ssa_map;
+  vec <hsa_op_reg *> m_ssa_map;
 
   /* Flag whether a function needs update of dominators before RA.  */
   bool m_modified_cfg;
@@ -1239,9 +1219,9 @@ struct hsa_function_summary
   hsa_function_kind m_kind;
 
   /* Pointer to a cgraph node which is a HSA implementation of the function.
-     In case of the function is a HSA function, the binded function points
+     In case of the function is a HSA function, the bound function points
      to the host function.  */
-  cgraph_node *m_binded_function;
+  cgraph_node *m_bound_function;
 
   /* Identifies if the function is an HSA function or a host function.  */
   bool m_gpu_implementation_p;
@@ -1252,7 +1232,7 @@ struct hsa_function_summary
 
 inline
 hsa_function_summary::hsa_function_summary (): m_kind (HSA_NONE),
-  m_binded_function (NULL), m_gpu_implementation_p (false)
+  m_bound_function (NULL), m_gpu_implementation_p (false)
 {
 }
 
@@ -1270,6 +1250,9 @@ public:
 
   void link_functions (cgraph_node *gpu, cgraph_node *host,
 		       hsa_function_kind kind, bool gridified_kernel_p);
+
+private:
+  void process_gpu_implementation_attributes (tree gdecl);
 };
 
 /* OMP simple builtin describes behavior that should be done for
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 769657f..0fbe2e2 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -79,7 +79,7 @@ process_hsa_functions (void)
       hsa_function_summary *s = hsa_summaries->get (node);
 
       /* A linked function is skipped.  */
-      if (s->m_binded_function != NULL)
+      if (s->m_bound_function != NULL)
 	continue;
 
       if (s->m_kind != HSA_NONE)
@@ -90,6 +90,7 @@ process_hsa_functions (void)
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
 					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+	  clone->externally_visible = node->externally_visible;
 
 	  clone->force_output = true;
 	  hsa_summaries->link_functions (clone, node, s->m_kind, false);
@@ -107,6 +108,7 @@ process_hsa_functions (void)
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
 					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+	  clone->externally_visible = node->externally_visible;
 
 	  if (!cgraph_local_p (node))
 	    clone->force_output = true;
@@ -131,7 +133,7 @@ process_hsa_functions (void)
 	      hsa_function_summary *dst = hsa_summaries->get (e->callee);
 	      if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
 		{
-		  e->redirect_callee (dst->m_binded_function);
+		  e->redirect_callee (dst->m_bound_function);
 		  if (dump_file)
 		    fprintf (dump_file,
 			     "Redirecting edge to HSA function: %s->%s\n",
@@ -193,10 +195,10 @@ ipa_hsa_write_summary (void)
 	  bp = bitpack_create (ob->main_stream);
 	  bp_pack_value (&bp, s->m_kind, 2);
 	  bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
-	  bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+	  bp_pack_value (&bp, s->m_bound_function != NULL, 1);
 	  streamer_write_bitpack (&bp);
-	  if (s->m_binded_function)
-	    stream_write_tree (ob, s->m_binded_function->decl, true);
+	  if (s->m_bound_function)
+	    stream_write_tree (ob, s->m_bound_function->decl, true);
 	}
     }
 
@@ -249,7 +251,7 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
       if (has_tree)
 	{
 	  tree decl = stream_read_tree (&ib_main, data_in);
-	  s->m_binded_function = cgraph_node::get_create (decl);
+	  s->m_bound_function = cgraph_node::get_create (decl);
 	}
     }
   lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
diff --git a/libgomp/testsuite/libgomp.hsa.c/bits-insns.c b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
new file mode 100644
index 0000000..21cac72
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
@@ -0,0 +1,73 @@
+#include <math.h>
+
+#define N 12
+
+int main()
+{
+  unsigned int arguments[N] = {0u, 1u, 2u, 3u, 111u, 333u, 444u, 0x80000000u, 0x0000ffffu, 0xf0000000u, 0xff000000u, 0xffffffffu};
+  int clrsb[N] = {};
+  int clz[N] = {};
+  int ctz[N] = {};
+  int ffs[N] = {};
+  int parity[N] = {};
+  int popcount[N] = {};
+
+  int ref_clrsb[N] = {};
+  int ref_clz[N] = {};
+  int ref_ctz[N] = {};
+  int ref_ffs[N] = {};
+  int ref_parity[N] = {};
+  int ref_popcount[N] = {};
+
+  for (unsigned i = 0; i < N; i++)
+    {
+      ref_clrsb[i] = __builtin_clrsb (arguments[i]);
+      ref_clz[i] = __builtin_clz (arguments[i]);
+      ref_ctz[i] = __builtin_ctz (arguments[i]);
+      ref_ffs[i] = __builtin_ffs (arguments[i]);
+      ref_parity[i] = __builtin_parity (arguments[i]);
+      ref_popcount[i] = __builtin_popcount (arguments[i]);
+    }
+
+  #pragma omp target map(from:clz, ctz, ffs, parity, popcount)
+  {
+    for (unsigned i = 0; i < N; i++)
+    {
+      clrsb[i] = __builtin_clrsb (arguments[i]);
+      clz[i] = __builtin_clz (arguments[i]);
+      ctz[i] = __builtin_ctz (arguments[i]);
+      ffs[i] = __builtin_ffs (arguments[i]);
+      parity[i] = __builtin_parity (arguments[i]);
+      popcount[i] = __builtin_popcount (arguments[i]);
+    }
+  }
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_clrsb[i] != clrsb[i])
+      __builtin_abort ();
+
+  /* CLZ of zero is undefined for zero.  */
+  for (unsigned i = 1; i < N; i++)
+    if (ref_clz[i] != clz[i])
+      __builtin_abort ();
+
+  /* Likewise for ctz */
+  for (unsigned i = 1; i < N; i++)
+    if (ref_ctz[i] != ctz[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_ffs[i] != ffs[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_parity[i] != parity[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_popcount[i] != popcount[i])
+      __builtin_abort ();
+
+  return 0;
+}
+
-- 
2.10.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/4] Remove build dependence on HSA run-time
  2016-11-13 23:20 [PATCH 0/4] Merge from HSA branch to trunk Martin Jambor
  2016-11-13 23:20 ` [PATCH 4/4] Back-end and IPA bits of hsa branch merge Martin Jambor
@ 2016-11-13 23:20 ` Martin Jambor
  2016-11-18 10:23   ` Jakub Jelinek
  2016-11-13 23:20 ` [PATCH 2/4] HSA specific built-ins Martin Jambor
  2016-11-13 23:20 ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
  3 siblings, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-11-13 23:20 UTC (permalink / raw)
  To: GCC Patches

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 79526 bytes --]

Hi,

over the last year there have been only two changes to the HSA libgomp
plugin, both are in the following patch.  The first change allows
running kernels in HSA grid with multiple dimensions.  The second one
changes the way the plugin calls hsa run-time to dynamic shared object
loading, which has the benefit that the run-time does not need to be a
build dependence, which should make life considerably easier for
people allowing HSA offloading in packaged gccs.  We actually carry a
very similar patch in openSUSE Tumbleweed gcc to achieve just that.

I'm not sure whether I can approve this change as the HSA maintainer
or not but since Richi has seen the patch he put into the SUSE
package, I hope it is not controversial.  The patch has passed
bootstrap and checking on x86_64-linux.  OK for trunk?

Thanks


Martin


2016-11-11  Martin Liska  <mliska@suse.cz>
             Martin Jambor  <mjambor@suse.cz>

gcc/
	* doc/install.texi: Remove entry about --with-hsa-kmt-lib.

libgomp/
	* config.h.in: Introduce HSA_RUNTIME_LIB.
	* configure: Regerenated.
	* plugin/hsa.h: New file.
	* plugin/hsa_ext_finalize.h: New file.
	* plugin/configfrag.ac: Remove hsa-kmt-lib test.
	* plugin/plugin-hsa.c: Include config.h, inttypes.h and stdbool.h.
	(struct hsa_runtime_fn_info): New structure.
	(hsa_runtime_fn_info hsa_fns): New variable.
	(hsa_runtime_lib): Likewise.
	(support_cpu_devices): Likewise.
	(init_enviroment_variables): Load newly introduced ENV
	variables.
	(hsa_warn): Call hsa run-time functions via hsa_fns structure.
	(hsa_fatal): Likewise.
	(DLSYM_FN): New macro.
	(init_hsa_runtime_functions): New function.
	(suitable_hsa_agent_p): Call hsa run-time functions via hsa_fns
	structure.  Depending on environment, also allow CPU devices.
	(init_hsa_context): Call hsa run-time functions via hsa_fns structure.
	(get_kernarg_memory_region): Likewise.
	(GOMP_OFFLOAD_init_device): Likewise.
	(destroy_hsa_program): Likewise.
	(init_basic_kernel_info): New function.
	(GOMP_OFFLOAD_load_image): Use it.
	(create_and_finalize_hsa_program): Call hsa run-time functions via
	hsa_fns structure.
	(create_single_kernel_dispatch): Likewise.
	(release_kernel_dispatch): Likewise.
	(init_single_kernel): Likewise.
	(parse_target_attributes): Allow up multiple HSA grid dimensions.
	(get_group_size): New function.
	(run_kernel): Likewise.
	(GOMP_OFFLOAD_run): Outline most functionality to run_kernel.
	(GOMP_OFFLOAD_fini_device): Call hsa run-time functions via hsa_fns
	structure.
	* testsuite/lib/libgomp.exp: Remove hsa_kmt_lib support.
	* testsuite/libgomp-test-support.exp.in: Likewise.
---
 gcc/doc/install.texi                          |   6 -
 libgomp/config.h.in                           |   3 +
 libgomp/configure                             |  56 +--
 libgomp/plugin/configfrag.ac                  |  32 +-
 libgomp/plugin/hsa.h                          | 630 ++++++++++++++++++++++++++
 libgomp/plugin/hsa_ext_finalize.h             | 265 +++++++++++
 libgomp/plugin/plugin-hsa.c                   | 471 ++++++++++++++-----
 libgomp/testsuite/lib/libgomp.exp             |   4 -
 libgomp/testsuite/libgomp-test-support.exp.in |   1 -
 9 files changed, 1281 insertions(+), 187 deletions(-)
 create mode 100644 libgomp/plugin/hsa.h
 create mode 100644 libgomp/plugin/hsa_ext_finalize.h

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index e4c686e..eef7aab 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2021,12 +2021,6 @@ explicitly specify the directory where they are installed.  The
 shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
-
-@item --with-hsa-kmt-lib=@var{pathname}
-
-If you configure GCC with HSA offloading but do not have the HSA
-KMT library installed in a standard location then you can
-explicitly specify the directory where it resides.
 @end table
 
 @subheading Cross-Compiler-Specific Options
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 226ac53..4483a84 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -125,6 +125,9 @@
 /* Define to 1 if the HSA plugin is built, 0 if not. */
 #undef PLUGIN_HSA
 
+/* Define path to HSA runtime.  */
+#undef HSA_RUNTIME_LIB
+
 /* Define to 1 if the NVIDIA plugin is built, 0 if not. */
 #undef PLUGIN_NVPTX
 
diff --git a/libgomp/configure b/libgomp/configure
index 8d03eb6..6b3e639 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -637,7 +637,6 @@ PLUGIN_HSA_LIBS
 PLUGIN_HSA_LDFLAGS
 PLUGIN_HSA_CPPFLAGS
 PLUGIN_HSA
-HSA_KMT_LIB
 HSA_RUNTIME_LIB
 HSA_RUNTIME_INCLUDE
 PLUGIN_NVPTX_LIBS
@@ -794,7 +793,6 @@ with_cuda_driver_lib
 with_hsa_runtime
 with_hsa_runtime_include
 with_hsa_runtime_lib
-with_hsa_kmt_lib
 enable_linux_futex
 enable_tls
 enable_symvers
@@ -1476,7 +1474,6 @@ Optional Packages:
   --with-hsa-runtime-lib=PATH
                           specify directory for the installed HSA run-time
                           library
-  --with-hsa-kmt-lib=PATH specify directory for installed HSA KMT library.
 
 Some influential environment variables:
   CC          C compiler command
@@ -11145,7 +11142,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11148 "configure"
+#line 11145 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11251,7 +11248,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11254 "configure"
+#line 11251 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15293,22 +15290,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
-HSA_KMT_LIB=
-
-HSA_KMT_LDFLAGS=
-
-# Check whether --with-hsa-kmt-lib was given.
-if test "${with_hsa_kmt_lib+set}" = set; then :
-  withval=$with_hsa_kmt_lib;
-fi
-
-if test "x$with_hsa_kmt_lib" != x; then
-  HSA_KMT_LIB=$with_hsa_kmt_lib
-fi
-if test "x$HSA_KMT_LIB" != x; then
-  HSA_KMT_LDFLAGS=-L$HSA_KMT_LIB
-fi
-
 PLUGIN_HSA=0
 PLUGIN_HSA_CPPFLAGS=
 PLUGIN_HSA_LDFLAGS=
@@ -15318,8 +15299,6 @@ PLUGIN_HSA_LIBS=
 
 
 
-
-
 # Get offload targets and path to install tree of offloading compiler.
 offload_additional_options=
 offload_additional_lib_paths=
@@ -15384,8 +15363,8 @@ rm -f core conftest.err conftest.$ac_objext \
 	        tgt_name=hsa
 	        PLUGIN_HSA=$tgt
 	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
-	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
+	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
+	        PLUGIN_HSA_LIBS="-ldl"
 
 	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
 	        CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
@@ -15394,22 +15373,7 @@ rm -f core conftest.err conftest.$ac_objext \
 	        PLUGIN_HSA_save_LIBS=$LIBS
 	        LIBS="$PLUGIN_HSA_LIBS $LIBS"
 
-	        cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-#include "hsa.h"
-int
-main ()
-{
-hsa_status_t status = hsa_init ()
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  PLUGIN_HSA=1
-fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
+	        PLUGIN_HSA=1
 	        CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
 	        LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
 	        LIBS=$PLUGIN_HSA_save_LIBS
@@ -15484,6 +15448,16 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+if test "$HSA_RUNTIME_LIB" != ""; then
+  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
+fi
+
+
+cat >>confdefs.h <<_ACEOF
+#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
+_ACEOF
+
+
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 88b4156..292829f 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -118,19 +118,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
-HSA_KMT_LIB=
-AC_SUBST(HSA_KMT_LIB)
-HSA_KMT_LDFLAGS=
-AC_ARG_WITH(hsa-kmt-lib,
-	[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
-		[specify directory for installed HSA KMT library.])])
-if test "x$with_hsa_kmt_lib" != x; then
-  HSA_KMT_LIB=$with_hsa_kmt_lib
-fi
-if test "x$HSA_KMT_LIB" != x; then
-  HSA_KMT_LDFLAGS=-L$HSA_KMT_LIB
-fi
-
 PLUGIN_HSA=0
 PLUGIN_HSA_CPPFLAGS=
 PLUGIN_HSA_LDFLAGS=
@@ -140,8 +127,6 @@ AC_SUBST(PLUGIN_HSA_CPPFLAGS)
 AC_SUBST(PLUGIN_HSA_LDFLAGS)
 AC_SUBST(PLUGIN_HSA_LIBS)
 
-
-
 # Get offload targets and path to install tree of offloading compiler.
 offload_additional_options=
 offload_additional_lib_paths=
@@ -195,8 +180,8 @@ if test x"$enable_offload_targets" != x; then
 	        tgt_name=hsa
 	        PLUGIN_HSA=$tgt
 	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
-	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
+	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
+	        PLUGIN_HSA_LIBS="-ldl"
 
 	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
 	        CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
@@ -205,11 +190,7 @@ if test x"$enable_offload_targets" != x; then
 	        PLUGIN_HSA_save_LIBS=$LIBS
 	        LIBS="$PLUGIN_HSA_LIBS $LIBS"
 
-	        AC_LINK_IFELSE(
-	          [AC_LANG_PROGRAM(
-	            [#include "hsa.h"],
-	              [hsa_status_t status = hsa_init ()])],
-	          [PLUGIN_HSA=1])
+	        PLUGIN_HSA=1
 	        CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
 	        LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
 	        LIBS=$PLUGIN_HSA_save_LIBS
@@ -260,3 +241,10 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
+
+if test "$HSA_RUNTIME_LIB" != ""; then
+  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
+fi
+
+AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
+  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/hsa.h b/libgomp/plugin/hsa.h
new file mode 100644
index 0000000..6765751
--- /dev/null
+++ b/libgomp/plugin/hsa.h
@@ -0,0 +1,630 @@
+/* HSA runtime API 1.0.1 representation description.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+The contents of the file was created by extracting data structures, enum,
+typedef and other definitions from HSA Runtime Programmer’s Reference Manual
+Version 1.0 (http://www.hsafoundation.com/standards/).
+
+HTML version is provided on the following link:
+http://www.hsafoundation.com/html/Content/Runtime/Topics/Runtime_title_page.htm
+*/
+
+#ifndef _HSA_H
+#define _HSA_H 1
+
+#define HSA_LARGE_MODEL 1
+
+typedef struct hsa_signal_s { uint64_t handle; } hsa_signal_t;
+typedef enum {
+  HSA_QUEUE_TYPE_MULTI = 0,
+  HSA_QUEUE_TYPE_SINGLE = 1
+} hsa_queue_type_t;
+
+typedef enum { HSA_PROFILE_BASE = 0, HSA_PROFILE_FULL = 1 } hsa_profile_t;
+typedef struct hsa_region_s { uint64_t handle; } hsa_region_t;
+typedef enum {
+  HSA_EXECUTABLE_SYMBOL_INFO_TYPE = 0,
+  HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH = 1,
+  HSA_EXECUTABLE_SYMBOL_INFO_NAME = 2,
+  HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3,
+  HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME = 4,
+  HSA_EXECUTABLE_SYMBOL_INFO_AGENT = 20,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS = 21,
+  HSA_EXECUTABLE_SYMBOL_INFO_LINKAGE = 5,
+  HSA_EXECUTABLE_SYMBOL_INFO_IS_DEFINITION = 17,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SEGMENT = 7,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE = 9,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_IS_CONST = 10,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT = 22,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15,
+  HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_OBJECT = 23,
+  HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16
+} hsa_executable_symbol_info_t;
+typedef enum {
+  HSA_REGION_GLOBAL_FLAG_KERNARG = 1,
+  HSA_REGION_GLOBAL_FLAG_FINE_GRAINED = 2,
+  HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED = 4
+} hsa_region_global_flag_t;
+typedef struct hsa_code_object_s { uint64_t handle; } hsa_code_object_t;
+typedef enum {
+  HSA_KERNEL_DISPATCH_PACKET_SETUP_WIDTH_DIMENSIONS = 2
+} hsa_kernel_dispatch_packet_setup_width_t;
+typedef enum {
+  HSA_DEVICE_TYPE_CPU = 0,
+  HSA_DEVICE_TYPE_GPU = 1,
+  HSA_DEVICE_TYPE_DSP = 2
+} hsa_device_type_t;
+typedef enum {
+  HSA_STATUS_SUCCESS = 0x0,
+  HSA_STATUS_INFO_BREAK = 0x1,
+  HSA_STATUS_ERROR = 0x1000,
+  HSA_STATUS_ERROR_INVALID_ARGUMENT = 0x1001,
+  HSA_STATUS_ERROR_INVALID_QUEUE_CREATION = 0x1002,
+  HSA_STATUS_ERROR_INVALID_ALLOCATION = 0x1003,
+  HSA_STATUS_ERROR_INVALID_AGENT = 0x1004,
+  HSA_STATUS_ERROR_INVALID_REGION = 0x1005,
+  HSA_STATUS_ERROR_INVALID_SIGNAL = 0x1006,
+  HSA_STATUS_ERROR_INVALID_QUEUE = 0x1007,
+  HSA_STATUS_ERROR_OUT_OF_RESOURCES = 0x1008,
+  HSA_STATUS_ERROR_INVALID_PACKET_FORMAT = 0x1009,
+  HSA_STATUS_ERROR_RESOURCE_FREE = 0x100A,
+  HSA_STATUS_ERROR_NOT_INITIALIZED = 0x100B,
+  HSA_STATUS_ERROR_REFCOUNT_OVERFLOW = 0x100C,
+  HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS = 0x100D,
+  HSA_STATUS_ERROR_INVALID_INDEX = 0x100E,
+  HSA_STATUS_ERROR_INVALID_ISA = 0x100F,
+  HSA_STATUS_ERROR_INVALID_ISA_NAME = 0x1017,
+  HSA_STATUS_ERROR_INVALID_CODE_OBJECT = 0x1010,
+  HSA_STATUS_ERROR_INVALID_EXECUTABLE = 0x1011,
+  HSA_STATUS_ERROR_FROZEN_EXECUTABLE = 0x1012,
+  HSA_STATUS_ERROR_INVALID_SYMBOL_NAME = 0x1013,
+  HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED = 0x1014,
+  HSA_STATUS_ERROR_VARIABLE_UNDEFINED = 0x1015,
+  HSA_STATUS_ERROR_EXCEPTION = 0x1016
+} hsa_status_t;
+typedef enum {
+  HSA_EXTENSION_FINALIZER = 0,
+  HSA_EXTENSION_IMAGES = 1
+} hsa_extension_t;
+typedef struct hsa_queue_s {
+  hsa_queue_type_t type;
+  uint32_t features;
+
+#ifdef HSA_LARGE_MODEL
+  void *base_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *base_address;
+  uint32_t reserved0;
+#else
+  uint32_t reserved0;
+  void *base_address;
+#endif
+
+  hsa_signal_t doorbell_signal;
+  uint32_t size;
+  uint32_t reserved1;
+  uint64_t id;
+} hsa_queue_t;
+typedef struct hsa_agent_dispatch_packet_s {
+  uint16_t header;
+  uint16_t type;
+  uint32_t reserved0;
+
+#ifdef HSA_LARGE_MODEL
+  void *return_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *return_address;
+  uint32_t reserved1;
+#else
+  uint32_t reserved1;
+  void *return_address;
+#endif
+  uint64_t arg[4];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_agent_dispatch_packet_t;
+typedef enum {
+  HSA_CODE_SYMBOL_INFO_TYPE = 0,
+  HSA_CODE_SYMBOL_INFO_NAME_LENGTH = 1,
+  HSA_CODE_SYMBOL_INFO_NAME = 2,
+  HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3,
+  HSA_CODE_SYMBOL_INFO_MODULE_NAME = 4,
+  HSA_CODE_SYMBOL_INFO_LINKAGE = 5,
+  HSA_CODE_SYMBOL_INFO_IS_DEFINITION = 17,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT = 7,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE = 9,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST = 10,
+  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11,
+  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12,
+  HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13,
+  HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14,
+  HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15,
+  HSA_CODE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16
+} hsa_code_symbol_info_t;
+typedef enum {
+  HSA_QUEUE_FEATURE_KERNEL_DISPATCH = 1,
+  HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2
+} hsa_queue_feature_t;
+typedef enum {
+  HSA_VARIABLE_ALLOCATION_AGENT = 0,
+  HSA_VARIABLE_ALLOCATION_PROGRAM = 1
+} hsa_variable_allocation_t;
+typedef enum {
+  HSA_FENCE_SCOPE_NONE = 0,
+  HSA_FENCE_SCOPE_AGENT = 1,
+  HSA_FENCE_SCOPE_SYSTEM = 2
+} hsa_fence_scope_t;
+typedef struct hsa_agent_s { uint64_t handle; } hsa_agent_t;
+typedef enum { HSA_CODE_OBJECT_TYPE_PROGRAM = 0 } hsa_code_object_type_t;
+typedef enum {
+  HSA_SIGNAL_CONDITION_EQ = 0,
+  HSA_SIGNAL_CONDITION_NE = 1,
+  HSA_SIGNAL_CONDITION_LT = 2,
+  HSA_SIGNAL_CONDITION_GTE = 3
+} hsa_signal_condition_t;
+typedef enum {
+  HSA_EXECUTABLE_STATE_UNFROZEN = 0,
+  HSA_EXECUTABLE_STATE_FROZEN = 1
+} hsa_executable_state_t;
+typedef enum {
+  HSA_ENDIANNESS_LITTLE = 0,
+  HSA_ENDIANNESS_BIG = 1
+} hsa_endianness_t;
+typedef enum {
+  HSA_MACHINE_MODEL_SMALL = 0,
+  HSA_MACHINE_MODEL_LARGE = 1
+} hsa_machine_model_t;
+typedef enum {
+  HSA_AGENT_INFO_NAME = 0,
+  HSA_AGENT_INFO_VENDOR_NAME = 1,
+  HSA_AGENT_INFO_FEATURE = 2,
+  HSA_AGENT_INFO_MACHINE_MODEL = 3,
+  HSA_AGENT_INFO_PROFILE = 4,
+  HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5,
+  HSA_AGENT_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES = 23,
+  HSA_AGENT_INFO_FAST_F16_OPERATION = 24,
+  HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
+  HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,
+  HSA_AGENT_INFO_WORKGROUP_MAX_SIZE = 8,
+  HSA_AGENT_INFO_GRID_MAX_DIM = 9,
+  HSA_AGENT_INFO_GRID_MAX_SIZE = 10,
+  HSA_AGENT_INFO_FBARRIER_MAX_SIZE = 11,
+  HSA_AGENT_INFO_QUEUES_MAX = 12,
+  HSA_AGENT_INFO_QUEUE_MIN_SIZE = 13,
+  HSA_AGENT_INFO_QUEUE_MAX_SIZE = 14,
+  HSA_AGENT_INFO_QUEUE_TYPE = 15,
+  HSA_AGENT_INFO_NODE = 16,
+  HSA_AGENT_INFO_DEVICE = 17,
+  HSA_AGENT_INFO_CACHE_SIZE = 18,
+  HSA_AGENT_INFO_ISA = 19,
+  HSA_AGENT_INFO_EXTENSIONS = 20,
+  HSA_AGENT_INFO_VERSION_MAJOR = 21,
+  HSA_AGENT_INFO_VERSION_MINOR = 22
+} hsa_agent_info_t;
+typedef struct hsa_barrier_and_packet_s {
+  uint16_t header;
+  uint16_t reserved0;
+  uint32_t reserved1;
+  hsa_signal_t dep_signal[5];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_barrier_and_packet_t;
+typedef struct hsa_dim3_s {
+  uint32_t x;
+  uint32_t y;
+  uint32_t z;
+} hsa_dim3_t;
+typedef enum {
+  HSA_ACCESS_PERMISSION_RO = 1,
+  HSA_ACCESS_PERMISSION_WO = 2,
+  HSA_ACCESS_PERMISSION_RW = 3
+} hsa_access_permission_t;
+typedef enum {
+  HSA_AGENT_FEATURE_KERNEL_DISPATCH = 1,
+  HSA_AGENT_FEATURE_AGENT_DISPATCH = 2
+} hsa_agent_feature_t;
+typedef enum {
+  HSA_WAIT_STATE_BLOCKED = 0,
+  HSA_WAIT_STATE_ACTIVE = 1
+} hsa_wait_state_t;
+typedef struct hsa_executable_s { uint64_t handle; } hsa_executable_t;
+typedef enum {
+  HSA_REGION_SEGMENT_GLOBAL = 0,
+  HSA_REGION_SEGMENT_READONLY = 1,
+  HSA_REGION_SEGMENT_PRIVATE = 2,
+  HSA_REGION_SEGMENT_GROUP = 3
+} hsa_region_segment_t;
+typedef enum {
+  HSA_REGION_INFO_SEGMENT = 0,
+  HSA_REGION_INFO_GLOBAL_FLAGS = 1,
+  HSA_REGION_INFO_SIZE = 2,
+  HSA_REGION_INFO_ALLOC_MAX_SIZE = 4,
+  HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED = 5,
+  HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE = 6,
+  HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT = 7
+} hsa_region_info_t;
+typedef enum {
+  HSA_ISA_INFO_NAME_LENGTH = 0,
+  HSA_ISA_INFO_NAME = 1,
+  HSA_ISA_INFO_CALL_CONVENTION_COUNT = 2,
+  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONT_SIZE = 3,
+  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONTS_PER_COMPUTE_UNIT = 4
+} hsa_isa_info_t;
+typedef enum {
+  HSA_VARIABLE_SEGMENT_GLOBAL = 0,
+  HSA_VARIABLE_SEGMENT_READONLY = 1
+} hsa_variable_segment_t;
+typedef struct hsa_callback_data_s { uint64_t handle; } hsa_callback_data_t;
+typedef enum {
+  HSA_SYMBOL_KIND_VARIABLE = 0,
+  HSA_SYMBOL_KIND_KERNEL = 1,
+  HSA_SYMBOL_KIND_INDIRECT_FUNCTION = 2
+} hsa_symbol_kind_t;
+typedef struct hsa_kernel_dispatch_packet_s {
+  uint16_t header;
+  uint16_t setup;
+  uint16_t workgroup_size_x;
+  uint16_t workgroup_size_y;
+  uint16_t workgroup_size_z;
+  uint16_t reserved0;
+  uint32_t grid_size_x;
+  uint32_t grid_size_y;
+  uint32_t grid_size_z;
+  uint32_t private_segment_size;
+  uint32_t group_segment_size;
+  uint64_t kernel_object;
+
+#ifdef HSA_LARGE_MODEL
+  void *kernarg_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *kernarg_address;
+  uint32_t reserved1;
+#else
+  uint32_t reserved1;
+  void *kernarg_address;
+#endif
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_kernel_dispatch_packet_t;
+typedef enum {
+  HSA_PACKET_TYPE_VENDOR_SPECIFIC = 0,
+  HSA_PACKET_TYPE_INVALID = 1,
+  HSA_PACKET_TYPE_KERNEL_DISPATCH = 2,
+  HSA_PACKET_TYPE_BARRIER_AND = 3,
+  HSA_PACKET_TYPE_AGENT_DISPATCH = 4,
+  HSA_PACKET_TYPE_BARRIER_OR = 5
+} hsa_packet_type_t;
+typedef enum {
+  HSA_PACKET_HEADER_TYPE = 0,
+  HSA_PACKET_HEADER_BARRIER = 8,
+  HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE = 9,
+  HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE = 11
+} hsa_packet_header_t;
+typedef struct hsa_isa_s { uint64_t handle; } hsa_isa_t;
+typedef enum {
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT = 0,
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO = 1,
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR = 2
+} hsa_default_float_rounding_mode_t;
+typedef struct hsa_code_symbol_s { uint64_t handle; } hsa_code_symbol_t;
+typedef struct hsa_executable_symbol_s {
+  uint64_t handle;
+} hsa_executable_symbol_t;
+#ifdef HSA_LARGE_MODEL
+typedef int64_t hsa_signal_value_t;
+#else
+typedef int32_t hsa_signal_value_t;
+#endif
+typedef enum {
+  HSA_EXCEPTION_POLICY_BREAK = 1,
+  HSA_EXCEPTION_POLICY_DETECT = 2
+} hsa_exception_policy_t;
+typedef enum {
+  HSA_SYSTEM_INFO_VERSION_MAJOR = 0,
+  HSA_SYSTEM_INFO_VERSION_MINOR = 1,
+  HSA_SYSTEM_INFO_TIMESTAMP = 2,
+  HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY = 3,
+  HSA_SYSTEM_INFO_SIGNAL_MAX_WAIT = 4,
+  HSA_SYSTEM_INFO_ENDIANNESS = 5,
+  HSA_SYSTEM_INFO_MACHINE_MODEL = 6,
+  HSA_SYSTEM_INFO_EXTENSIONS = 7
+} hsa_system_info_t;
+typedef enum {
+  HSA_EXECUTABLE_INFO_PROFILE = 1,
+  HSA_EXECUTABLE_INFO_STATE = 2
+} hsa_executable_info_t;
+typedef enum {
+  HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS = 0
+} hsa_kernel_dispatch_packet_setup_t;
+typedef enum {
+  HSA_PACKET_HEADER_WIDTH_TYPE = 8,
+  HSA_PACKET_HEADER_WIDTH_BARRIER = 1,
+  HSA_PACKET_HEADER_WIDTH_ACQUIRE_FENCE_SCOPE = 2,
+  HSA_PACKET_HEADER_WIDTH_RELEASE_FENCE_SCOPE = 2
+} hsa_packet_header_width_t;
+typedef enum {
+  HSA_CODE_OBJECT_INFO_VERSION = 0,
+  HSA_CODE_OBJECT_INFO_TYPE = 1,
+  HSA_CODE_OBJECT_INFO_ISA = 2,
+  HSA_CODE_OBJECT_INFO_MACHINE_MODEL = 3,
+  HSA_CODE_OBJECT_INFO_PROFILE = 4,
+  HSA_CODE_OBJECT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5
+} hsa_code_object_info_t;
+typedef struct hsa_barrier_or_packet_s {
+  uint16_t header;
+  uint16_t reserved0;
+  uint32_t reserved1;
+  hsa_signal_t dep_signal[5];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_barrier_or_packet_t;
+typedef enum {
+  HSA_SYMBOL_KIND_LINKAGE_MODULE = 0,
+  HSA_SYMBOL_KIND_LINKAGE_PROGRAM = 1,
+} hsa_symbol_kind_linkage_t;
+hsa_status_t hsa_executable_validate(hsa_executable_t executable,
+                                     uint32_t *result);
+uint64_t hsa_queue_add_write_index_acq_rel(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_acquire(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_relaxed(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_release(const hsa_queue_t *queue,
+                                           uint64_t value);
+hsa_status_t hsa_shut_down();
+void hsa_signal_add_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_executable_readonly_variable_define(
+    hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
+    void *address);
+hsa_status_t hsa_agent_extension_supported(uint16_t extension,
+                                           hsa_agent_t agent,
+                                           uint16_t version_major,
+                                           uint16_t version_minor,
+                                           bool *result);
+hsa_signal_value_t hsa_signal_load_acquire(hsa_signal_t signal);
+
+hsa_signal_value_t hsa_signal_load_relaxed(hsa_signal_t signal);
+hsa_status_t hsa_executable_get_info(hsa_executable_t executable,
+                                     hsa_executable_info_t attribute,
+                                     void *value);
+hsa_status_t hsa_iterate_agents(hsa_status_t (*callback)(hsa_agent_t agent,
+                                                         void *data),
+                                void *data);
+void hsa_signal_subtract_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t
+hsa_executable_symbol_get_info(hsa_executable_symbol_t executable_symbol,
+                               hsa_executable_symbol_info_t attribute,
+                               void *value);
+void hsa_signal_xor_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_code_object_get_info(hsa_code_object_t code_object,
+                                      hsa_code_object_info_t attribute,
+                                      void *value);
+hsa_status_t hsa_code_object_deserialize(void *serialized_code_object,
+                                         size_t serialized_code_object_size,
+                                         const char *options,
+                                         hsa_code_object_t *code_object);
+hsa_status_t hsa_status_string(hsa_status_t status, const char **status_string);
+hsa_status_t hsa_code_object_get_symbol(hsa_code_object_t code_object,
+                                        const char *symbol_name,
+                                        hsa_code_symbol_t *symbol);
+void hsa_signal_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_store_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_signal_destroy(hsa_signal_t signal);
+hsa_status_t hsa_system_get_extension_table(uint16_t extension,
+                                            uint16_t version_major,
+                                            uint16_t version_minor,
+                                            void *table);
+hsa_status_t hsa_agent_iterate_regions(
+    hsa_agent_t agent,
+    hsa_status_t (*callback)(hsa_region_t region, void *data), void *data);
+hsa_status_t hsa_executable_agent_global_variable_define(
+    hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
+    void *address);
+hsa_status_t hsa_queue_create(hsa_agent_t agent, uint32_t size,
+                              hsa_queue_type_t type,
+                              void (*callback)(hsa_status_t status,
+                                               hsa_queue_t *source, void *data),
+                              void *data, uint32_t private_segment_size,
+                              uint32_t group_segment_size, hsa_queue_t **queue);
+hsa_status_t hsa_isa_compatible(hsa_isa_t code_object_isa, hsa_isa_t agent_isa,
+                                bool *result);
+hsa_status_t hsa_code_object_serialize(
+    hsa_code_object_t code_object,
+    hsa_status_t (*alloc_callback)(size_t size, hsa_callback_data_t data,
+                                   void **address),
+    hsa_callback_data_t callback_data, const char *options,
+    void **serialized_code_object, size_t *serialized_code_object_size);
+hsa_status_t hsa_region_get_info(hsa_region_t region,
+                                 hsa_region_info_t attribute, void *value);
+hsa_status_t hsa_executable_freeze(hsa_extension_t executable,
+                                   const char *options);
+hsa_status_t hsa_system_extension_supported(uint16_t extension,
+                                            uint16_t version_major,
+                                            uint16_t version_minor,
+                                            bool *result);
+hsa_signal_value_t hsa_signal_wait_acquire(hsa_signal_t signal,
+                                           hsa_signal_condition_t condition,
+                                           hsa_signal_value_t compare_value,
+                                           uint64_t timeout_hint,
+                                           hsa_wait_state_t wait_state_hint);
+
+hsa_signal_value_t hsa_signal_wait_relaxed(hsa_signal_t signal,
+                                           hsa_signal_condition_t condition,
+                                           hsa_signal_value_t compare_value,
+                                           uint64_t timeout_hint,
+                                           hsa_wait_state_t wait_state_hint);
+hsa_status_t hsa_memory_copy(void *dst, const void *src, size_t size);
+hsa_status_t hsa_memory_free(void *ptr);
+hsa_status_t hsa_queue_destroy(hsa_queue_t *queue);
+hsa_status_t hsa_isa_from_name(const char *name, hsa_isa_t *isa);
+hsa_status_t hsa_isa_get_info(hsa_isa_t isa, hsa_isa_info_t attribute,
+                              uint32_t index, void *value);
+hsa_status_t hsa_signal_create(hsa_signal_value_t initial_value,
+                               uint32_t num_consumers,
+                               const hsa_agent_t *consumers,
+                               hsa_signal_t *signal);
+hsa_status_t hsa_code_symbol_get_info(hsa_code_symbol_t code_symbol,
+                                      hsa_code_symbol_info_t attribute,
+                                      void *value);
+hsa_signal_value_t hsa_signal_cas_acq_rel(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_acquire(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_relaxed(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_release(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+hsa_status_t hsa_code_object_iterate_symbols(
+    hsa_code_object_t code_object,
+    hsa_status_t (*callback)(hsa_code_object_t code_object,
+                             hsa_code_symbol_t symbol, void *data),
+    void *data);
+void hsa_queue_store_read_index_relaxed(const hsa_queue_t *queue,
+                                        uint64_t value);
+
+void hsa_queue_store_read_index_release(const hsa_queue_t *queue,
+                                        uint64_t value);
+hsa_status_t hsa_memory_assign_agent(void *ptr, hsa_agent_t agent,
+                                     hsa_access_permission_t access);
+hsa_status_t hsa_queue_inactivate(hsa_queue_t *queue);
+hsa_status_t hsa_executable_get_symbol(hsa_executable_t executable,
+                                       const char *module_name,
+                                       const char *symbol_name,
+                                       hsa_agent_t agent,
+                                       int32_t call_convention,
+                                       hsa_executable_symbol_t *symbol);
+uint64_t hsa_queue_cas_write_index_acq_rel(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_acquire(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_relaxed(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_release(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+void hsa_signal_and_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_release(hsa_signal_t signal, hsa_signal_value_t value);
+uint64_t hsa_queue_load_read_index_acquire(const hsa_queue_t *queue);
+
+uint64_t hsa_queue_load_read_index_relaxed(const hsa_queue_t *queue);
+hsa_status_t hsa_executable_load_code_object(hsa_executable_t executable,
+                                             hsa_agent_t agent,
+                                             hsa_code_object_t code_object,
+                                             const char *options);
+uint64_t hsa_queue_load_write_index_acquire(const hsa_queue_t *queue);
+
+uint64_t hsa_queue_load_write_index_relaxed(const hsa_queue_t *queue);
+hsa_status_t hsa_agent_get_exception_policies(hsa_agent_t agent,
+                                              hsa_profile_t profile,
+                                              uint16_t *mask);
+hsa_status_t hsa_memory_deregister(void *ptr, size_t size);
+void hsa_signal_or_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_soft_queue_create(hsa_region_t region, uint32_t size,
+                                   hsa_queue_type_t type, uint32_t features,
+                                   hsa_signal_t doorbell_signal,
+                                   hsa_queue_t **queue);
+hsa_status_t hsa_executable_iterate_symbols(
+    hsa_executable_t executable,
+    hsa_status_t (*callback)(hsa_executable_t executable,
+                             hsa_executable_symbol_t symbol, void *data),
+    void *data);
+hsa_status_t hsa_memory_register(void *ptr, size_t size);
+void hsa_queue_store_write_index_relaxed(const hsa_queue_t *queue,
+                                         uint64_t value);
+
+void hsa_queue_store_write_index_release(const hsa_queue_t *queue,
+                                         uint64_t value);
+hsa_status_t hsa_executable_global_variable_define(hsa_executable_t executable,
+                                                   const char *variable_name,
+                                                   void *address);
+hsa_status_t hsa_executable_destroy(hsa_executable_t executable);
+hsa_status_t hsa_code_object_destroy(hsa_code_object_t code_object);
+hsa_status_t hsa_memory_allocate(hsa_region_t region, size_t size, void **ptr);
+hsa_signal_value_t hsa_signal_exchange_acq_rel(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_acquire(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_relaxed(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_release(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+hsa_status_t hsa_agent_get_info(hsa_agent_t agent, hsa_agent_info_t attribute,
+                                void *value);
+hsa_status_t hsa_init();
+hsa_status_t hsa_system_get_info(hsa_system_info_t attribute, void *value);
+hsa_status_t hsa_executable_create(hsa_profile_t profile,
+                                   hsa_executable_state_t executable_state,
+                                   const char *options,
+                                   hsa_executable_t *executable);
+
+#endif /* _HSA_H */
diff --git a/libgomp/plugin/hsa_ext_finalize.h b/libgomp/plugin/hsa_ext_finalize.h
new file mode 100644
index 0000000..f159add
--- /dev/null
+++ b/libgomp/plugin/hsa_ext_finalize.h
@@ -0,0 +1,265 @@
+/* HSA Extensions API 1.0.1 representation description.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+The contents of the file was created by extracting data structures, enum,
+typedef and other definitions from HSA Runtime Programmer’s Reference Manual
+Version 1.0 (http://www.hsafoundation.com/standards/).
+
+HTML version is provided on the following link:
+http://www.hsafoundation.com/html/Content/Runtime/Topics/Runtime_title_page.htm
+*/
+
+
+#ifndef _HSA_EXT_FINALIZE_H
+#define _HSA_EXT_FINALIZE_H 1
+
+struct BrigModuleHeader;
+typedef struct BrigModuleHeader *BrigModule_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_GEOMETRY_1D = 0,
+  HSA_EXT_IMAGE_GEOMETRY_2D = 1,
+  HSA_EXT_IMAGE_GEOMETRY_3D = 2,
+  HSA_EXT_IMAGE_GEOMETRY_1DA = 3,
+  HSA_EXT_IMAGE_GEOMETRY_2DA = 4,
+  HSA_EXT_IMAGE_GEOMETRY_1DB = 5,
+  HSA_EXT_IMAGE_GEOMETRY_2DDEPTH = 6,
+  HSA_EXT_IMAGE_GEOMETRY_2DADEPTH = 7
+} hsa_ext_image_geometry_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 = 7,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT = 15
+} hsa_ext_image_channel_type_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_CHANNEL_ORDER_A = 0,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_R = 1,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RX = 2,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RG = 3,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGX = 4,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RA = 5,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGB = 6,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX = 7,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA = 8,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA = 9,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB = 10,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR = 11,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB = 12,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX = 13,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA = 14,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA = 15,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY = 16,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE = 17,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH = 18,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19
+} hsa_ext_image_channel_order_t;
+
+typedef struct hsa_ext_image_format_s
+{
+  hsa_ext_image_channel_type_t channel_type;
+  hsa_ext_image_channel_order_t channel_order;
+} hsa_ext_image_format_t;
+
+typedef struct hsa_ext_sampler_s
+{
+  uint64_t handle;
+} hsa_ext_sampler_t;
+typedef struct hsa_ext_image_data_info_s
+{
+  size_t size;
+  size_t alignment;
+} hsa_ext_image_data_info_t;
+typedef enum {
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED = 0,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE = 1,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER = 2,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT = 3,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT = 4
+} hsa_ext_sampler_addressing_mode_t;
+typedef struct hsa_ext_image_s
+{
+  uint64_t handle;
+} hsa_ext_image_t;
+typedef enum {
+  HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED = 0x0,
+  HSA_EXT_IMAGE_CAPABILITY_READ_ONLY = 0x1,
+  HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY = 0x2,
+  HSA_EXT_IMAGE_CAPABILITY_READ_WRITE = 0x4,
+  HSA_EXT_IMAGE_CAPABILITY_READ_MODIFY_WRITE = 0x8,
+  HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT = 0x10
+} hsa_ext_image_capability_t;
+typedef struct hsa_ext_control_directives_s
+{
+  uint64_t control_directives_mask;
+  uint16_t break_exceptions_mask;
+  uint16_t detect_exceptions_mask;
+  uint32_t max_dynamic_group_size;
+  uint64_t max_flat_grid_size;
+  uint32_t max_flat_workgroup_size;
+  uint32_t reserved1;
+  uint64_t required_grid_size[3];
+  hsa_dim3_t required_workgroup_size;
+  uint8_t required_dim;
+  uint8_t reserved2[75];
+} hsa_ext_control_directives_t;
+typedef enum {
+  HSA_EXT_SAMPLER_FILTER_MODE_NEAREST = 0,
+  HSA_EXT_SAMPLER_FILTER_MODE_LINEAR = 1
+} hsa_ext_sampler_filter_mode_t;
+
+typedef enum {
+  HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED = 0,
+  HSA_EXT_SAMPLER_COORDINATE_MODE_NORMALIZED = 1
+} hsa_ext_sampler_coordinate_mode_t;
+typedef enum {
+  HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO = -1
+} hsa_ext_finalizer_call_convention_t;
+typedef struct hsa_ext_program_s
+{
+  uint64_t handle;
+} hsa_ext_program_t;
+typedef struct hsa_ext_image_descriptor_s
+{
+  hsa_ext_image_geometry_t geometry;
+  size_t width;
+  size_t height;
+  size_t depth;
+  size_t array_size;
+  hsa_ext_image_format_t format;
+} hsa_ext_image_descriptor_t;
+typedef enum {
+  HSA_EXT_PROGRAM_INFO_MACHINE_MODEL = 0,
+  HSA_EXT_PROGRAM_INFO_PROFILE = 1,
+  HSA_EXT_PROGRAM_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 2
+} hsa_ext_program_info_t;
+typedef BrigModule_t hsa_ext_module_t;
+typedef struct hsa_ext_sampler_descriptor_s
+{
+  hsa_ext_sampler_coordinate_mode_t coordinate_mode;
+  hsa_ext_sampler_filter_mode_t filter_mode;
+  hsa_ext_sampler_addressing_mode_t address_mode;
+} hsa_ext_sampler_descriptor_t;
+
+typedef struct hsa_ext_image_region_s
+{
+  hsa_dim3_t offset;
+  hsa_dim3_t range;
+} hsa_ext_image_region_t;
+hsa_status_t hsa_ext_image_export (hsa_agent_t agent, hsa_ext_image_t src_image,
+				   void *dst_memory, size_t dst_row_pitch,
+				   size_t dst_slice_pitch,
+				   const hsa_ext_image_region_t *image_region);
+hsa_status_t hsa_ext_program_add_module (hsa_ext_program_t program,
+					 hsa_ext_module_t module);
+hsa_status_t hsa_ext_program_iterate_modules (
+  hsa_ext_program_t program,
+  hsa_status_t (*callback) (hsa_ext_program_t program, hsa_ext_module_t module,
+			    void *data),
+  void *data);
+hsa_status_t hsa_ext_program_create (
+  hsa_machine_model_t machine_model, hsa_profile_t profile,
+  hsa_default_float_rounding_mode_t default_float_rounding_mode,
+  const char *options, hsa_ext_program_t *program);
+hsa_status_t
+hsa_ext_image_data_get_info (hsa_agent_t agent,
+			     const hsa_ext_image_descriptor_t *image_descriptor,
+			     hsa_access_permission_t access_permission,
+			     hsa_ext_image_data_info_t *image_data_info);
+
+hsa_status_t hsa_ext_image_import (hsa_agent_t agent, const void *src_memory,
+				   size_t src_row_pitch, size_t src_slice_pitch,
+				   hsa_ext_image_t dst_image,
+				   const hsa_ext_image_region_t *image_region);
+hsa_status_t hsa_ext_program_get_info (hsa_ext_program_t program,
+				       hsa_ext_program_info_t attribute,
+				       void *value);
+enum
+{
+  HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED = 0x3000,
+  HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED = 0x3001
+};
+hsa_status_t hsa_ext_image_destroy (hsa_agent_t agent, hsa_ext_image_t image);
+hsa_status_t hsa_ext_image_get_capability (
+  hsa_agent_t agent, hsa_ext_image_geometry_t geometry,
+  const hsa_ext_image_format_t *image_format, uint32_t *capability_mask);
+enum
+{
+  HSA_EXT_STATUS_ERROR_INVALID_PROGRAM = 0x2000,
+  HSA_EXT_STATUS_ERROR_INVALID_MODULE = 0x2001,
+  HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE = 0x2002,
+  HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED = 0x2003,
+  HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH = 0x2004,
+  HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED = 0x2005,
+  HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH = 0x2006
+};
+hsa_status_t hsa_ext_sampler_destroy (hsa_agent_t agent,
+				      hsa_ext_sampler_t sampler);
+hsa_status_t hsa_ext_program_finalize (
+  hsa_ext_program_t program, hsa_isa_t isa, int32_t call_convention,
+  hsa_ext_control_directives_t control_directives, const char *options,
+  hsa_code_object_type_t code_object_type, hsa_code_object_t *code_object);
+hsa_status_t hsa_ext_image_create (
+  hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor,
+  const void *image_data, hsa_access_permission_t access_permission,
+  hsa_ext_image_t *image);
+hsa_status_t hsa_ext_program_destroy (hsa_ext_program_t program);
+hsa_status_t hsa_ext_image_copy (hsa_agent_t agent, hsa_ext_image_t src_image,
+				 const hsa_dim3_t *src_offset,
+				 hsa_ext_image_t dst_image,
+				 const hsa_dim3_t *dst_offset,
+				 const hsa_dim3_t *range);
+hsa_status_t hsa_ext_image_clear (hsa_agent_t agent, hsa_ext_image_t image,
+				  const void *data,
+				  const hsa_ext_image_region_t *image_region);
+enum
+{
+  HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS = 0x3000,
+  HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS = 0x3001,
+  HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS = 0x3002,
+  HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS = 0x3003,
+  HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS = 0x3004,
+  HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS = 0x3005,
+  HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS = 0x3006,
+  HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS = 0x3007,
+  HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS = 0x3008,
+  HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES = 0x3009,
+  HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES = 0x300A,
+  HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS = 0x300B
+};
+hsa_status_t
+hsa_ext_sampler_create (hsa_agent_t agent,
+			const hsa_ext_sampler_descriptor_t *sampler_descriptor,
+			hsa_ext_sampler_t *sampler);
+
+#endif /* _HSA_EXT_FINALIZE_H */
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index bed8555..ef7a202 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -27,16 +27,103 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "config.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <pthread.h>
+#include <inttypes.h>
+#include <stdbool.h>
 #include <hsa.h>
 #include <hsa_ext_finalize.h>
 #include <dlfcn.h>
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
 
+/* As an HSA runtime is dlopened, following structure defines function
+   pointers utilized by the HSA plug-in.  */
+
+struct hsa_runtime_fn_info
+{
+  /* HSA runtime.  */
+  hsa_status_t (*hsa_status_string_fn) (hsa_status_t status,
+					const char **status_string);
+  hsa_status_t (*hsa_agent_get_info_fn) (hsa_agent_t agent,
+					 hsa_agent_info_t attribute,
+					 void *value);
+  hsa_status_t (*hsa_init_fn) (void);
+  hsa_status_t (*hsa_iterate_agents_fn)
+    (hsa_status_t (*callback)(hsa_agent_t agent, void *data), void *data);
+  hsa_status_t (*hsa_region_get_info_fn) (hsa_region_t region,
+					  hsa_region_info_t attribute,
+					  void *value);
+  hsa_status_t (*hsa_queue_create_fn)
+    (hsa_agent_t agent, uint32_t size, hsa_queue_type_t type,
+     void (*callback)(hsa_status_t status, hsa_queue_t *source, void *data),
+     void *data, uint32_t private_segment_size,
+     uint32_t group_segment_size, hsa_queue_t **queue);
+  hsa_status_t (*hsa_agent_iterate_regions_fn)
+    (hsa_agent_t agent,
+     hsa_status_t (*callback)(hsa_region_t region, void *data), void *data);
+  hsa_status_t (*hsa_executable_destroy_fn) (hsa_executable_t executable);
+  hsa_status_t (*hsa_executable_create_fn)
+    (hsa_profile_t profile, hsa_executable_state_t executable_state,
+     const char *options, hsa_executable_t *executable);
+  hsa_status_t (*hsa_executable_global_variable_define_fn)
+    (hsa_executable_t executable, const char *variable_name, void *address);
+  hsa_status_t (*hsa_executable_load_code_object_fn)
+    (hsa_executable_t executable, hsa_agent_t agent,
+     hsa_code_object_t code_object, const char *options);
+  hsa_status_t (*hsa_executable_freeze_fn)(hsa_executable_t executable,
+					   const char *options);
+  hsa_status_t (*hsa_signal_create_fn) (hsa_signal_value_t initial_value,
+					uint32_t num_consumers,
+					const hsa_agent_t *consumers,
+					hsa_signal_t *signal);
+  hsa_status_t (*hsa_memory_allocate_fn) (hsa_region_t region, size_t size,
+					  void **ptr);
+  hsa_status_t (*hsa_memory_free_fn) (void *ptr);
+  hsa_status_t (*hsa_signal_destroy_fn) (hsa_signal_t signal);
+  hsa_status_t (*hsa_executable_get_symbol_fn)
+    (hsa_executable_t executable, const char *module_name,
+     const char *symbol_name, hsa_agent_t agent, int32_t call_convention,
+     hsa_executable_symbol_t *symbol);
+  hsa_status_t (*hsa_executable_symbol_get_info_fn)
+    (hsa_executable_symbol_t executable_symbol,
+     hsa_executable_symbol_info_t attribute, void *value);
+  uint64_t (*hsa_queue_add_write_index_release_fn) (const hsa_queue_t *queue,
+						    uint64_t value);
+  uint64_t (*hsa_queue_load_read_index_acquire_fn) (const hsa_queue_t *queue);
+  void (*hsa_signal_store_relaxed_fn) (hsa_signal_t signal,
+				       hsa_signal_value_t value);
+  void (*hsa_signal_store_release_fn) (hsa_signal_t signal,
+				       hsa_signal_value_t value);
+  hsa_signal_value_t (*hsa_signal_wait_acquire_fn)
+    (hsa_signal_t signal, hsa_signal_condition_t condition,
+     hsa_signal_value_t compare_value, uint64_t timeout_hint,
+     hsa_wait_state_t wait_state_hint);
+  hsa_signal_value_t (*hsa_signal_load_acquire_fn) (hsa_signal_t signal);
+  hsa_status_t (*hsa_queue_destroy_fn) (hsa_queue_t *queue);
+
+  /* HSA finalizer.  */
+  hsa_status_t (*hsa_ext_program_add_module_fn) (hsa_ext_program_t program,
+						 hsa_ext_module_t module);
+  hsa_status_t (*hsa_ext_program_create_fn)
+    (hsa_machine_model_t machine_model, hsa_profile_t profile,
+     hsa_default_float_rounding_mode_t default_float_rounding_mode,
+     const char *options, hsa_ext_program_t *program);
+  hsa_status_t (*hsa_ext_program_destroy_fn) (hsa_ext_program_t program);
+  hsa_status_t (*hsa_ext_program_finalize_fn)
+    (hsa_ext_program_t program,hsa_isa_t isa,
+     int32_t call_convention, hsa_ext_control_directives_t control_directives,
+     const char *options, hsa_code_object_type_t code_object_type,
+     hsa_code_object_t *code_object);
+};
+
+/* HSA runtime functions that are initialized in init_hsa_context.  */
+
+static struct hsa_runtime_fn_info hsa_fns;
+
 /* Keep the following GOMP prefixed structures in sync with respective parts of
    the compiler.  */
 
@@ -129,6 +216,16 @@ static bool debug;
 
 static bool suppress_host_fallback;
 
+/* Flag to locate HSA runtime shared library that is dlopened
+   by this plug-in.  */
+
+static const char *hsa_runtime_lib;
+
+/* Flag to decide if the runtime should support also CPU devices (can be
+   a simulator).  */
+
+static bool support_cpu_devices;
+
 /* Initialize debug and suppress_host_fallback according to the environment.  */
 
 static void
@@ -143,6 +240,12 @@ init_enviroment_variables (void)
     suppress_host_fallback = true;
   else
     suppress_host_fallback = false;
+
+  hsa_runtime_lib = getenv ("HSA_RUNTIME_LIB");
+  if (hsa_runtime_lib == NULL)
+    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+
+  support_cpu_devices = getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
 
 /* Print a logging message with PREFIX to stderr if HSA_DEBUG value
@@ -176,7 +279,7 @@ hsa_warn (const char *str, hsa_status_t status)
     return;
 
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
 
   fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error_msg);
 }
@@ -188,7 +291,7 @@ static void
 hsa_fatal (const char *str, hsa_status_t status)
 {
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
   GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
 		     hsa_error_msg);
 }
@@ -200,7 +303,7 @@ static bool
 hsa_error (const char *str, hsa_status_t status)
 {
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
   GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str,
 		     hsa_error_msg);
   return false;
@@ -359,6 +462,50 @@ struct hsa_context_info
 
 static struct hsa_context_info hsa_context;
 
+#define DLSYM_FN(function) \
+  hsa_fns.function##_fn = dlsym (handle, #function); \
+  if (hsa_fns.function##_fn == NULL) \
+    return false;
+
+static bool
+init_hsa_runtime_functions (void)
+{
+  void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
+  if (handle == NULL)
+    return false;
+
+  DLSYM_FN (hsa_status_string)
+  DLSYM_FN (hsa_agent_get_info)
+  DLSYM_FN (hsa_init)
+  DLSYM_FN (hsa_iterate_agents)
+  DLSYM_FN (hsa_region_get_info)
+  DLSYM_FN (hsa_queue_create)
+  DLSYM_FN (hsa_agent_iterate_regions)
+  DLSYM_FN (hsa_executable_destroy)
+  DLSYM_FN (hsa_executable_create)
+  DLSYM_FN (hsa_executable_global_variable_define)
+  DLSYM_FN (hsa_executable_load_code_object)
+  DLSYM_FN (hsa_executable_freeze)
+  DLSYM_FN (hsa_signal_create)
+  DLSYM_FN (hsa_memory_allocate)
+  DLSYM_FN (hsa_memory_free)
+  DLSYM_FN (hsa_signal_destroy)
+  DLSYM_FN (hsa_executable_get_symbol)
+  DLSYM_FN (hsa_executable_symbol_get_info)
+  DLSYM_FN (hsa_queue_add_write_index_release)
+  DLSYM_FN (hsa_queue_load_read_index_acquire)
+  DLSYM_FN (hsa_signal_wait_acquire)
+  DLSYM_FN (hsa_signal_store_relaxed)
+  DLSYM_FN (hsa_signal_store_release)
+  DLSYM_FN (hsa_signal_load_acquire)
+  DLSYM_FN (hsa_queue_destroy)
+  DLSYM_FN (hsa_ext_program_add_module)
+  DLSYM_FN (hsa_ext_program_create)
+  DLSYM_FN (hsa_ext_program_destroy)
+  DLSYM_FN (hsa_ext_program_finalize)
+  return true;
+}
+
 /* Find kernel for an AGENT by name provided in KERNEL_NAME.  */
 
 static struct kernel_info *
@@ -386,17 +533,32 @@ suitable_hsa_agent_p (hsa_agent_t agent)
 {
   hsa_device_type_t device_type;
   hsa_status_t status
-    = hsa_agent_get_info (agent, HSA_AGENT_INFO_DEVICE, &device_type);
-  if (status != HSA_STATUS_SUCCESS || device_type != HSA_DEVICE_TYPE_GPU)
+    = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_DEVICE,
+				     &device_type);
+  if (status != HSA_STATUS_SUCCESS)
     return false;
 
+  switch (device_type)
+    {
+    case HSA_DEVICE_TYPE_GPU:
+      break;
+    case HSA_DEVICE_TYPE_CPU:
+      if (!support_cpu_devices)
+	return false;
+      break;
+    default:
+      return false;
+    }
+
   uint32_t features = 0;
-  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_FEATURE, &features);
+  status = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_FEATURE,
+					  &features);
   if (status != HSA_STATUS_SUCCESS
       || !(features & HSA_AGENT_FEATURE_KERNEL_DISPATCH))
     return false;
   hsa_queue_type_t queue_type;
-  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_QUEUE_TYPE, &queue_type);
+  status = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_QUEUE_TYPE,
+					  &queue_type);
   if (status != HSA_STATUS_SUCCESS
       || (queue_type != HSA_QUEUE_TYPE_MULTI))
     return false;
@@ -443,11 +605,16 @@ init_hsa_context (void)
   if (hsa_context.initialized)
     return true;
   init_enviroment_variables ();
-  status = hsa_init ();
+  if (!init_hsa_runtime_functions ())
+    {
+      HSA_DEBUG ("Run-time could not be dynamically opened\n");
+      return false;
+    }
+  status = hsa_fns.hsa_init_fn ();
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Run-time could not be initialized", status);
   HSA_DEBUG ("HSA run-time initialized\n");
-  status = hsa_iterate_agents (count_gpu_agents, NULL);
+  status = hsa_fns.hsa_iterate_agents_fn (count_gpu_agents, NULL);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("HSA GPU devices could not be enumerated", status);
   HSA_DEBUG ("There are %i HSA GPU devices.\n", hsa_context.agent_count);
@@ -455,7 +622,7 @@ init_hsa_context (void)
   hsa_context.agents
     = GOMP_PLUGIN_malloc_cleared (hsa_context.agent_count
 				  * sizeof (struct agent_info));
-  status = hsa_iterate_agents (assign_agent_ids, &agent_index);
+  status = hsa_fns.hsa_iterate_agents_fn (assign_agent_ids, &agent_index);
   if (agent_index != hsa_context.agent_count)
     {
       GOMP_PLUGIN_error ("Failed to assign IDs to all HSA agents");
@@ -485,14 +652,16 @@ get_kernarg_memory_region (hsa_region_t region, void *data)
   hsa_status_t status;
   hsa_region_segment_t segment;
 
-  status = hsa_region_get_info (region, HSA_REGION_INFO_SEGMENT, &segment);
+  status = hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SEGMENT,
+					   &segment);
   if (status != HSA_STATUS_SUCCESS)
     return status;
   if (segment != HSA_REGION_SEGMENT_GLOBAL)
     return HSA_STATUS_SUCCESS;
 
   uint32_t flags;
-  status = hsa_region_get_info (region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags);
+  status = hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_GLOBAL_FLAGS,
+					   &flags);
   if (status != HSA_STATUS_SUCCESS)
     return status;
   if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG)
@@ -546,29 +715,36 @@ GOMP_OFFLOAD_init_device (int n)
 
   uint32_t queue_size;
   hsa_status_t status;
-  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_QUEUE_MAX_SIZE,
-			       &queue_size);
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id,
+					  HSA_AGENT_INFO_QUEUE_MAX_SIZE,
+					  &queue_size);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error requesting maximum queue size of the HSA agent",
-		      status);
-  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_ISA, &agent->isa);
+    	   	      status);
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_ISA,
+					  &agent->isa);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error querying the ISA of the agent", status);
-  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
-			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
-			     &agent->command_q);
+  status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
+					HSA_QUEUE_TYPE_MULTI,
+					queue_callback, NULL, UINT32_MAX,
+					UINT32_MAX,
+					&agent->command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error creating command queue", status);
 
-  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
-			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
-			     &agent->kernel_dispatch_command_q);
+  status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
+					HSA_QUEUE_TYPE_MULTI,
+					queue_callback, NULL, UINT32_MAX,
+					UINT32_MAX,
+					&agent->kernel_dispatch_command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error creating kernel dispatch command queue", status);
 
   agent->kernarg_region.handle = (uint64_t) -1;
-  status = hsa_agent_iterate_regions (agent->id, get_kernarg_memory_region,
-				      &agent->kernarg_region);
+  status = hsa_fns.hsa_agent_iterate_regions_fn (agent->id,
+						 get_kernarg_memory_region,
+						 &agent->kernarg_region);
   if (agent->kernarg_region.handle == (uint64_t) -1)
     {
       GOMP_PLUGIN_error ("Could not find suitable memory region for kernel "
@@ -646,7 +822,7 @@ destroy_hsa_program (struct agent_info *agent)
 
   HSA_DEBUG ("Destroying the current HSA program.\n");
 
-  status = hsa_executable_destroy (agent->executable);
+  status = hsa_fns.hsa_executable_destroy_fn (agent->executable);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Could not destroy HSA executable", status);
 
@@ -661,6 +837,29 @@ destroy_hsa_program (struct agent_info *agent)
   return true;
 }
 
+/* Initialize KERNEL from D and other parameters.  Return true on success. */
+
+static bool
+init_basic_kernel_info (struct kernel_info *kernel,
+			struct hsa_kernel_description *d,
+			struct agent_info *agent,
+			struct module_info *module)
+{
+  kernel->agent = agent;
+  kernel->module = module;
+  kernel->name = d->name;
+  kernel->omp_data_size = d->omp_data_size;
+  kernel->gridified_kernel_p = d->gridified_kernel_p;
+  kernel->dependencies_count = d->kernel_dependencies_count;
+  kernel->dependencies = d->kernel_dependencies;
+  if (pthread_mutex_init (&kernel->init_mutex, NULL))
+    {
+      GOMP_PLUGIN_error ("Failed to initialize an HSA kernel mutex");
+      return false;
+    }
+  return true;
+}
+
 /* Part of the libgomp plugin interface.  Load BRIG module described by struct
    brig_image_desc in TARGET_DATA and return references to kernel descriptors
    in TARGET_TABLE.  */
@@ -715,19 +914,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, void *target_data,
       pair->end = (uintptr_t) (kernel + 1);
 
       struct hsa_kernel_description *d = &image_desc->kernel_infos[i];
-      kernel->agent = agent;
-      kernel->module = module;
-      kernel->name = d->name;
-      kernel->omp_data_size = d->omp_data_size;
-      kernel->gridified_kernel_p = d->gridified_kernel_p;
-      kernel->dependencies_count = d->kernel_dependencies_count;
-      kernel->dependencies = d->kernel_dependencies;
-      if (pthread_mutex_init (&kernel->init_mutex, NULL))
-	{
-	  GOMP_PLUGIN_error ("Failed to initialize an HSA kernel mutex");
-	  return -1;
-	}
-
+      if (!init_basic_kernel_info (kernel, d, agent, module))
+	return -1;
       kernel++;
       pair++;
     }
@@ -799,9 +987,10 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   if (agent->prog_finalized)
     goto final;
 
-  status = hsa_ext_program_create (HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL,
-				   HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT,
-				   NULL, &prog_handle);
+  status = hsa_fns.hsa_ext_program_create_fn
+    (HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL,
+     HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT,
+     NULL, &prog_handle);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not create an HSA program", status);
 
@@ -810,8 +999,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   struct module_info *module = agent->first_module;
   while (module)
     {
-      status = hsa_ext_program_add_module (prog_handle,
-					   module->image_desc->brig_module);
+      status = hsa_fns.hsa_ext_program_add_module_fn
+	(prog_handle, module->image_desc->brig_module);
       if (status != HSA_STATUS_SUCCESS)
 	hsa_fatal ("Could not add a module to the HSA program", status);
       module = module->next;
@@ -837,7 +1026,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 	  continue;
 	}
 
-      status = hsa_ext_program_add_module (prog_handle, library->image);
+      status = hsa_fns.hsa_ext_program_add_module_fn (prog_handle,
+						      library->image);
       if (status != HSA_STATUS_SUCCESS)
 	hsa_warn ("Could not add a shared BRIG library the HSA program",
 		  status);
@@ -849,11 +1039,9 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   hsa_ext_control_directives_t control_directives;
   memset (&control_directives, 0, sizeof (control_directives));
   hsa_code_object_t code_object;
-  status = hsa_ext_program_finalize (prog_handle, agent->isa,
-				     HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO,
-				     control_directives, "",
-				     HSA_CODE_OBJECT_TYPE_PROGRAM,
-				     &code_object);
+  status = hsa_fns.hsa_ext_program_finalize_fn
+    (prog_handle, agent->isa,HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO,
+     control_directives, "", HSA_CODE_OBJECT_TYPE_PROGRAM, &code_object);
   if (status != HSA_STATUS_SUCCESS)
     {
       hsa_warn ("Finalization of the HSA program failed", status);
@@ -861,11 +1049,12 @@ create_and_finalize_hsa_program (struct agent_info *agent)
     }
 
   HSA_DEBUG ("Finalization done\n");
-  hsa_ext_program_destroy (prog_handle);
+  hsa_fns.hsa_ext_program_destroy_fn (prog_handle);
 
   status
-    = hsa_executable_create (HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN,
-			     "", &agent->executable);
+    = hsa_fns.hsa_executable_create_fn (HSA_PROFILE_FULL,
+					HSA_EXECUTABLE_STATE_UNFROZEN,
+					"", &agent->executable);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not create HSA executable", status);
 
@@ -877,9 +1066,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 	{
 	  struct global_var_info *var;
 	  var = &module->image_desc->global_variables[i];
-	  status
-	    = hsa_executable_global_variable_define (agent->executable,
-						     var->name, var->address);
+	  status = hsa_fns.hsa_executable_global_variable_define_fn
+	    (agent->executable, var->name, var->address);
 
 	  HSA_DEBUG ("Defining global variable: %s, address: %p\n", var->name,
 		     var->address);
@@ -892,11 +1080,12 @@ create_and_finalize_hsa_program (struct agent_info *agent)
       module = module->next;
     }
 
-  status = hsa_executable_load_code_object (agent->executable, agent->id,
-					    code_object, "");
+  status = hsa_fns.hsa_executable_load_code_object_fn (agent->executable,
+						       agent->id,
+						       code_object, "");
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not add a code object to the HSA executable", status);
-  status = hsa_executable_freeze (agent->executable, "");
+  status = hsa_fns.hsa_executable_freeze_fn (agent->executable, "");
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not freeze the HSA executable", status);
 
@@ -937,7 +1126,7 @@ create_single_kernel_dispatch (struct kernel_info *kernel,
   shadow->object = kernel->object;
 
   hsa_signal_t sync_signal;
-  hsa_status_t status = hsa_signal_create (1, 0, NULL, &sync_signal);
+  hsa_status_t status = hsa_fns.hsa_signal_create_fn (1, 0, NULL, &sync_signal);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Error creating the HSA sync signal", status);
 
@@ -946,8 +1135,9 @@ create_single_kernel_dispatch (struct kernel_info *kernel,
   shadow->group_segment_size = kernel->group_segment_size;
 
   status
-    = hsa_memory_allocate (agent->kernarg_region, kernel->kernarg_segment_size,
-			   &shadow->kernarg_address);
+    = hsa_fns.hsa_memory_allocate_fn (agent->kernarg_region,
+				      kernel->kernarg_segment_size,
+				      &shadow->kernarg_address);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not allocate memory for HSA kernel arguments", status);
 
@@ -962,11 +1152,11 @@ release_kernel_dispatch (struct GOMP_hsa_kernel_dispatch *shadow)
   HSA_DEBUG ("Released kernel dispatch: %p has value: %lu (%p)\n", shadow,
 	     shadow->debug, (void *) shadow->debug);
 
-  hsa_memory_free (shadow->kernarg_address);
+  hsa_fns.hsa_memory_free_fn (shadow->kernarg_address);
 
   hsa_signal_t s;
   s.handle = shadow->signal;
-  hsa_signal_destroy (s);
+  hsa_fns.hsa_signal_destroy_fn (s);
 
   free (shadow->omp_data_memory);
 
@@ -986,31 +1176,30 @@ init_single_kernel (struct kernel_info *kernel, unsigned *max_omp_data_size)
   hsa_status_t status;
   struct agent_info *agent = kernel->agent;
   hsa_executable_symbol_t kernel_symbol;
-  status = hsa_executable_get_symbol (agent->executable, NULL, kernel->name,
-				      agent->id, 0, &kernel_symbol);
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 kernel->name, agent->id,
+						 0, &kernel_symbol);
   if (status != HSA_STATUS_SUCCESS)
     {
       hsa_warn ("Could not find symbol for kernel in the code object", status);
       goto failure;
     }
   HSA_DEBUG ("Located kernel %s\n", kernel->name);
-  status
-    = hsa_executable_symbol_get_info (kernel_symbol,
-				      HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
-				      &kernel->object);
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
+    (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel->object);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not extract a kernel object from its symbol", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE,
      &kernel->kernarg_segment_size);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not get info about kernel argument size", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE,
      &kernel->group_segment_size);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not get info about kernel group segment size", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE,
      &kernel->private_segment_size);
   if (status != HSA_STATUS_SUCCESS)
@@ -1209,18 +1398,43 @@ parse_target_attributes (void **input,
   struct GOMP_kernel_launch_attributes *kla;
   kla = (struct GOMP_kernel_launch_attributes *) *input;
   *result = kla;
-  if (kla->ndim != 1)
-    GOMP_PLUGIN_fatal ("HSA does not yet support number of dimensions "
-		       "different from one.");
-  if (kla->gdims[0] == 0)
-    return false;
-
-  HSA_DEBUG ("GOMP_OFFLOAD_run called with grid size %u and group size %u\n",
-	     kla->gdims[0], kla->wdims[0]);
+  if (kla->ndim == 0 || kla->ndim > 3)
+    GOMP_PLUGIN_fatal ("Invalid number of dimensions (%u)", kla->ndim);
 
+  HSA_DEBUG ("GOMP_OFFLOAD_run called with %u dimensions:\n", kla->ndim);
+  unsigned i;
+  for (i = 0; i < kla->ndim; i++)
+    {
+      HSA_DEBUG ("  Dimension %u: grid size %u and group size %u\n", i,
+		 kla->gdims[i], kla->wdims[i]);
+      if (kla->gdims[i] == 0)
+	return false;
+    }
   return true;
 }
 
+/* Return the group size given the requested GROUP size, GRID size and number
+   of grid dimensions NDIM.  */
+
+static uint32_t
+get_group_size (uint32_t ndim, uint32_t grid, uint32_t group)
+{
+  if (group == 0)
+    {
+      /* TODO: Provide a default via environment or device characteristics.  */
+      if (ndim == 1)
+	group = 64;
+      else if (ndim == 2)
+	group = 8;
+      else
+	group = 4;
+    }
+
+  if (group > grid)
+    group = grid;
+  return group;
+}
+
 /* Return true if the HSA runtime can run function FN_PTR.  */
 
 bool
@@ -1254,22 +1468,14 @@ packet_store_release (uint32_t* packet, uint16_t header, uint16_t rest)
   __atomic_store_n (packet, header | (rest << 16), __ATOMIC_RELEASE);
 }
 
-/* Part of the libgomp plugin interface.  Run a kernel on device N and pass it
-   an array of pointers in VARS as a parameter.  The kernel is identified by
-   FN_PTR which must point to a kernel_info structure.  */
+/* Run KERNEL on its agent, pass VARS to it as arguments and take
+   launchattributes from KLA.  */
 
 void
-GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
+run_kernel (struct kernel_info *kernel, void *vars,
+	    struct GOMP_kernel_launch_attributes *kla)
 {
-  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
   struct agent_info *agent = kernel->agent;
-  struct GOMP_kernel_launch_attributes def;
-  struct GOMP_kernel_launch_attributes *kla;
-  if (!parse_target_attributes (args, &def, &kla))
-    {
-      HSA_DEBUG ("Will not run HSA kernel because the grid size is zero\n");
-      return;
-    }
   if (pthread_rwlock_rdlock (&agent->modules_rwlock))
     GOMP_PLUGIN_fatal ("Unable to read-lock an HSA agent rwlock");
 
@@ -1288,11 +1494,12 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
       print_kernel_dispatch (shadow, 2);
     }
 
-  uint64_t index = hsa_queue_add_write_index_release (agent->command_q, 1);
+  uint64_t index
+    = hsa_fns.hsa_queue_add_write_index_release_fn (agent->command_q, 1);
   HSA_DEBUG ("Got AQL index %llu\n", (long long int) index);
 
   /* Wait until the queue is not full before writing the packet.   */
-  while (index - hsa_queue_load_read_index_acquire (agent->command_q)
+  while (index - hsa_fns.hsa_queue_load_read_index_acquire_fn (agent->command_q)
 	 >= agent->command_q->size)
     ;
 
@@ -1302,17 +1509,33 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
 
   memset (((uint8_t *) packet) + 4, 0, sizeof (*packet) - 4);
   packet->grid_size_x = kla->gdims[0];
-  uint32_t wgs = kla->wdims[0];
-  if (wgs == 0)
-    /* TODO: Provide a default via environment.  */
-    wgs = 64;
-  else if (wgs > kla->gdims[0])
-    wgs = kla->gdims[0];
-  packet->workgroup_size_x = wgs;
-  packet->grid_size_y = 1;
-  packet->workgroup_size_y = 1;
-  packet->grid_size_z = 1;
-  packet->workgroup_size_z = 1;
+  packet->workgroup_size_x = get_group_size (kla->ndim, kla->gdims[0],
+					     kla->wdims[0]);
+
+  if (kla->ndim >= 2)
+    {
+      packet->grid_size_y = kla->gdims[1];
+      packet->workgroup_size_y = get_group_size (kla->ndim, kla->gdims[1],
+						 kla->wdims[1]);
+    }
+  else
+    {
+      packet->grid_size_y = 1;
+      packet->workgroup_size_y = 1;
+    }
+
+  if (kla->ndim == 3)
+    {
+      packet->grid_size_z = kla->gdims[2];
+      packet->workgroup_size_z = get_group_size (kla->ndim, kla->gdims[2],
+					     kla->wdims[2]);
+    }
+  else
+    {
+      packet->grid_size_z = 1;
+      packet->workgroup_size_z = 1;
+    }
+
   packet->private_segment_size = kernel->private_segment_size;
   packet->group_segment_size = kernel->group_segment_size;
   packet->kernel_object = kernel->object;
@@ -1320,7 +1543,7 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   hsa_signal_t s;
   s.handle = shadow->signal;
   packet->completion_signal = s;
-  hsa_signal_store_relaxed (s, 1);
+  hsa_fns.hsa_signal_store_relaxed_fn (s, 1);
   memcpy (shadow->kernarg_address, &vars, sizeof (vars));
 
   /* PR hsa/70337.  */
@@ -1344,9 +1567,10 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   HSA_DEBUG ("Going to dispatch kernel %s\n", kernel->name);
 
   packet_store_release ((uint32_t *) packet, header,
-			1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
+			(uint16_t) kla->ndim << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
 
-  hsa_signal_store_release (agent->command_q->doorbell_signal, index);
+  hsa_fns.hsa_signal_store_release_fn (agent->command_q->doorbell_signal,
+				       index);
 
   /* TODO: GPU agents in Carrizo APUs cannot properly update L2 cache for
      signal wait and signal load operations on their own and we need to
@@ -1357,8 +1581,9 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   HSA_DEBUG ("Kernel dispatched, waiting for completion\n");
 
   /* Root signal waits with 1ms timeout.  */
-  while (hsa_signal_wait_acquire (s, HSA_SIGNAL_CONDITION_LT, 1, 1000 * 1000,
-				  HSA_WAIT_STATE_BLOCKED) != 0)
+  while (hsa_fns.hsa_signal_wait_acquire_fn (s, HSA_SIGNAL_CONDITION_LT, 1,
+					     1000 * 1000,
+					     HSA_WAIT_STATE_BLOCKED) != 0)
     for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
       {
 	hsa_signal_t child_s;
@@ -1366,7 +1591,7 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
 
 	HSA_DEBUG ("Waiting for children completion signal: %lu\n",
 		   shadow->children_dispatches[i]->signal);
-	hsa_signal_load_acquire (child_s);
+	hsa_fns.hsa_signal_load_acquire_fn (child_s);
       }
 
   release_kernel_dispatch (shadow);
@@ -1375,6 +1600,26 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
     GOMP_PLUGIN_fatal ("Unable to unlock an HSA agent rwlock");
 }
 
+/* Part of the libgomp plugin interface.  Run a kernel on device N (the number
+   is actually ignored, we assume the FN_PTR has been mapped using the correct
+   device) and pass it an array of pointers in VARS as a parameter.  The kernel
+   is identified by FN_PTR which must point to a kernel_info structure.  */
+
+void
+GOMP_OFFLOAD_run (int n __attribute__((unused)),
+		  void *fn_ptr, void *vars, void **args)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+  struct GOMP_kernel_launch_attributes def;
+  struct GOMP_kernel_launch_attributes *kla;
+  if (!parse_target_attributes (args, &def, &kla))
+    {
+      HSA_DEBUG ("Will not run HSA kernel because the grid size is zero\n");
+      return;
+    }
+  run_kernel (kernel, vars, kla);
+}
+
 /* Information to be passed to a thread running a kernel asycnronously.  */
 
 struct async_run_info
@@ -1534,10 +1779,10 @@ GOMP_OFFLOAD_fini_device (int n)
 
   release_agent_shared_libraries (agent);
 
-  hsa_status_t status = hsa_queue_destroy (agent->command_q);
+  hsa_status_t status = hsa_fns.hsa_queue_destroy_fn (agent->command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error destroying command queue", status);
-  status = hsa_queue_destroy (agent->kernel_dispatch_command_q);
+  status = hsa_fns.hsa_queue_destroy_fn (agent->kernel_dispatch_command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error destroying kernel dispatch command queue", status);
   if (pthread_mutex_destroy (&agent->prog_mutex))
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 1cb4991..50ec8a7 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -205,13 +205,9 @@ proc libgomp_init { args } {
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
 	global hsa_runtime_lib
-	global hsa_kmt_lib
 	if { $hsa_runtime_lib != "" } {
 	    append always_ld_library_path ":$hsa_runtime_lib"
 	}
-	if { $hsa_kmt_lib != "" } {
-	    append always_ld_library_path ":$hsa_kmt_lib"
-	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 5a724fb..a5250a8 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
 set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
-set hsa_kmt_lib "@HSA_KMT_LIB@"
 
 set offload_targets "@offload_targets@"
-- 
2.10.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2016-11-13 23:20 ` [PATCH 1/4] Remove build dependence on HSA run-time Martin Jambor
@ 2016-11-18 10:23   ` Jakub Jelinek
  2016-11-22 13:27     ` Martin Jambor
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2016-11-18 10:23 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Sun, Nov 13, 2016 at 08:02:41PM +0100, Martin Jambor wrote:
> @@ -143,6 +240,12 @@ init_enviroment_variables (void)
>      suppress_host_fallback = true;
>    else
>      suppress_host_fallback = false;
> +
> +  hsa_runtime_lib = getenv ("HSA_RUNTIME_LIB");
> +  if (hsa_runtime_lib == NULL)
> +    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";

libgomp is very much env var driven, but the above one is IMHO just
too dangerous in suid/sgid apps, allowing one to select a library
of their own choice to dlopen is an instant exploit possibility,
so such env var should be only considered in non-priviledged processes.
It is possible to try dlopen (hsa_runtime_lib) and if that fails, try
dlopen ("libhsa-runtime64.so"), where it would search the library only
in the system paths (note, the dynamic linker handles LD_LIBRARY_PATH,
LD_PRELOAD etc. safely in priviledges processes).

So I'd recommend to use secure_getenv instead.  E.g. see how libgfortran
checks for it in configure and even provides a fallback version for it.
In the HSA plugin case, I think the fallback should be static function
in the plugin.
Otherwise it looks reasonable, thanks for working on that.

	Jakub

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] HSA specific built-ins
  2016-11-13 23:20 ` [PATCH 2/4] HSA specific built-ins Martin Jambor
@ 2016-11-18 10:27   ` Jakub Jelinek
  2016-11-22 13:30     ` Martin Jambor
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2016-11-18 10:27 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Sun, Nov 13, 2016 at 08:39:35PM +0100, Martin Jambor wrote:
> Hello,
> 
> this patch adds a small file hsa-builtins.def which defines a few
> builtins that I then use in OpenMP lowering and expansion.
> 
> After we split gridification stuff in omp-low.c to a separate file, we
> should be able to only conditionally include the file and remove the
> weird conditional ifdef.
> 
> OK for trunk?

Does this work well even with lto and jit FEs?  Ok for trunk if it does.

> 2016-11-11  Martin Jambor  <mjambor@suse.cz>
> 
> gcc/
> 	* hsa-builtins.def: New file.
> 	* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def dependency.
> 	* builtins.def: Include hsa-builtins.def.
> 	(DEF_HSA_BUILTIN): New macro.
> 
> fortran/
> 	* f95-lang.c (DEF_HSA_BUILTIN): New macro.

	Jakub

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/4] OpenMP lowering changes from the hsa branch
  2016-11-13 23:20 ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
@ 2016-11-18 10:39   ` Jakub Jelinek
  2016-03-16 14:13     ` [omp] Create openmp -fopt-info optimization group Martin Jambor
  2016-11-22 13:43     ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
  0 siblings, 2 replies; 36+ messages in thread
From: Jakub Jelinek @ 2016-11-18 10:39 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches

On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> +  size_t collapse = gimple_omp_for_collapse (for_stmt);
> +  struct omp_for_data_loop *loops
> +    = (struct omp_for_data_loop *)
> +    alloca (gimple_omp_for_collapse (for_stmt)
> +	    * sizeof (struct omp_for_data_loop));

Use
  struct omp_for_data_loop *loops
    = XALLOCAVEC (struct omp_for_data_loop,
		  gimple_omp_for_collapse (for_stmt));
instead?

> @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
>  {
>    GIMPLE_PASS, /* type */
>    "ompexp", /* name */
> -  OPTGROUP_NONE, /* optinfo_flags */
> +  OPTGROUP_OPENMP, /* optinfo_flags */
>    TV_NONE, /* tv_id */
>    PROP_gimple_any, /* properties_required */
>    PROP_gimple_eomp, /* properties_provided */

What about the simdclone, omptargetlink, diagnose_omp_blocks passes?  What about
openacc specific passes (oaccdevlow)?  And Alex is hopefully going to add
ompdevlow pass soon.

Otherwise LGTM.

	Jakub

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2016-11-18 10:23   ` Jakub Jelinek
@ 2016-11-22 13:27     ` Martin Jambor
  2016-11-22 14:13       ` Jakub Jelinek
  2021-01-14 14:50       ` Thomas Schwinge
  0 siblings, 2 replies; 36+ messages in thread
From: Martin Jambor @ 2016-11-22 13:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Martin Liska

Hi,

On Fri, Nov 18, 2016 at 11:23:10AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 08:02:41PM +0100, Martin Jambor wrote:
> > @@ -143,6 +240,12 @@ init_enviroment_variables (void)
> >      suppress_host_fallback = true;
> >    else
> >      suppress_host_fallback = false;
> > +
> > +  hsa_runtime_lib = getenv ("HSA_RUNTIME_LIB");
> > +  if (hsa_runtime_lib == NULL)
> > +    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
> 
> libgomp is very much env var driven, but the above one is IMHO just
> too dangerous in suid/sgid apps, allowing one to select a library
> of their own choice to dlopen is an instant exploit possibility,
> so such env var should be only considered in non-priviledged processes.
> It is possible to try dlopen (hsa_runtime_lib) and if that fails, try
> dlopen ("libhsa-runtime64.so"), where it would search the library only
> in the system paths (note, the dynamic linker handles LD_LIBRARY_PATH,
> LD_PRELOAD etc. safely in priviledges processes).
> 
> So I'd recommend to use secure_getenv instead.  E.g. see how libgfortran
> checks for it in configure and even provides a fallback version for it.
> In the HSA plugin case, I think the fallback should be static function
> in the plugin.
> Otherwise it looks reasonable, thanks for working on that.
> 

I have basically copied what libgfortran did, with additional checking
for HAVE_UNISTD_H when attempting to implement secure_getenv in its
absence (which is maybe unnecessary but should not do any harm) and I
also needed to add -D_GNU_SOURCE to plugin compilation flags.
Finally, I have changed all getenv users in the plugin to use
secure_getenv.

So far I have only bootstrapped (and lto-bootstrapped) and tested this
on x86_64-linux without any issues.  I'm about to play with it a bit
on gcc111, i.e. ppc64le-aix, but the machine is very slow and I mainly
want to make sure I do not break it for people not interested in hsa.

So, is this version OK for trunk?

Thanks a lot,

Martin


2016-11-21  Martin Liska  <mliska@suse.cz>
            Martin Jambor  <mjambor@suse.cz>

gcc/
	* doc/install.texi: Remove entry about --with-hsa-kmt-lib.

libgomp/
	* plugin/hsa.h: New file.
	* plugin/hsa_ext_finalize.h: New file.
	* plugin/configfrag.ac: Remove hsa-kmt-lib test.  Added checks for
	header file unistd.h, and functions secure_getenv, __secure_getenv,
	getuid, geteuid, getgid and getegid.
	* plugin/Makefrag.am (libgomp_plugin_hsa_la_CPPFLAGS): Added
	-D_GNU_SOURCE.
	* plugin/plugin-hsa.c: Include config.h, inttypes.h and stdbool.h.
	Handle various cases of secure_getenv presence, add an implementation
	when we can test effective UID and GID.
	(struct hsa_runtime_fn_info): New structure.
	(hsa_runtime_fn_info hsa_fns): New variable.
	(hsa_runtime_lib): Likewise.
	(support_cpu_devices): Likewise.
	(init_enviroment_variables): Load newly introduced ENV
	variables.
	(hsa_warn): Call hsa run-time functions via hsa_fns structure.
	(hsa_fatal): Likewise.
	(DLSYM_FN): New macro.
	(init_hsa_runtime_functions): New function.
	(suitable_hsa_agent_p): Call hsa run-time functions via hsa_fns
	structure.  Depending on environment, also allow CPU devices.
	(init_hsa_context): Call hsa run-time functions via hsa_fns structure.
	(get_kernarg_memory_region): Likewise.
	(GOMP_OFFLOAD_init_device): Likewise.
	(destroy_hsa_program): Likewise.
	(init_basic_kernel_info): New function.
	(GOMP_OFFLOAD_load_image): Use it.
	(create_and_finalize_hsa_program): Call hsa run-time functions via
	hsa_fns structure.
	(create_single_kernel_dispatch): Likewise.
	(release_kernel_dispatch): Likewise.
	(init_single_kernel): Likewise.
	(parse_target_attributes): Allow up multiple HSA grid dimensions.
	(get_group_size): New function.
	(run_kernel): Likewise.
	(GOMP_OFFLOAD_run): Outline most functionality to run_kernel.
	(GOMP_OFFLOAD_fini_device): Call hsa run-time functions via hsa_fns
	structure.
	* testsuite/lib/libgomp.exp: Remove hsa_kmt_lib support.
	* testsuite/libgomp-test-support.exp.in: Likewise.
	* Makefile.in: Regenerated.
	* aclocal.m4: Likewise.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
---
 gcc/doc/install.texi                          |   6 -
 libgomp/Makefile.in                           | 138 ++----
 libgomp/aclocal.m4                            |  74 ++-
 libgomp/config.h.in                           |  21 +
 libgomp/configure                             | 129 ++++--
 libgomp/plugin/Makefrag.am                    |   3 +-
 libgomp/plugin/configfrag.ac                  |  35 +-
 libgomp/plugin/hsa.h                          | 630 ++++++++++++++++++++++++++
 libgomp/plugin/hsa_ext_finalize.h             | 265 +++++++++++
 libgomp/plugin/plugin-hsa.c                   | 505 ++++++++++++++++-----
 libgomp/testsuite/Makefile.in                 |  61 +--
 libgomp/testsuite/lib/libgomp.exp             |   4 -
 libgomp/testsuite/libgomp-test-support.exp.in |   1 -
 13 files changed, 1484 insertions(+), 388 deletions(-)
 create mode 100644 libgomp/plugin/hsa.h
 create mode 100644 libgomp/plugin/hsa_ext_finalize.h

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 78e385e..a520045 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1995,12 +1995,6 @@ explicitly specify the directory where they are installed.  The
 shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
-
-@item --with-hsa-kmt-lib=@var{pathname}
-
-If you configure GCC with HSA offloading but do not have the HSA
-KMT library installed in a standard location then you can
-explicitly specify the directory where it resides.
 @end table
 
 @subheading Cross-Compiler-Specific Options
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 035a663..39d1de1 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -44,7 +44,8 @@ if PLUGIN_HSA
 libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
 toolexeclib_LTLIBRARIES += libgomp-plugin-hsa.la
 libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
-libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS)
+libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS) \
+	-D_GNU_SOURCE
 libgomp_plugin_hsa_la_LDFLAGS = $(libgomp_plugin_hsa_version_info) \
 	$(lt_host_flags)
 libgomp_plugin_hsa_la_LDFLAGS += $(PLUGIN_HSA_LDFLAGS)
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 88b4156..29416d5 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -36,6 +36,9 @@ if test x"$plugin_support" = xyes; then
 elif test "x${enable_offload_targets-no}" != xno; then
   AC_MSG_ERROR([Can't support offloading without support for plugins])
 fi
+AC_CHECK_HEADERS_ONCE(unistd.h)
+AC_CHECK_FUNCS_ONCE(secure_getenv __secure_getenv getuid geteuid getgid getegid)
+
 
 # Look for the CUDA driver package.
 CUDA_DRIVER_INCLUDE=
@@ -118,19 +121,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
-HSA_KMT_LIB=
-AC_SUBST(HSA_KMT_LIB)
-HSA_KMT_LDFLAGS=
-AC_ARG_WITH(hsa-kmt-lib,
-	[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
-		[specify directory for installed HSA KMT library.])])
-if test "x$with_hsa_kmt_lib" != x; then
-  HSA_KMT_LIB=$with_hsa_kmt_lib
-fi
-if test "x$HSA_KMT_LIB" != x; then
-  HSA_KMT_LDFLAGS=-L$HSA_KMT_LIB
-fi
-
 PLUGIN_HSA=0
 PLUGIN_HSA_CPPFLAGS=
 PLUGIN_HSA_LDFLAGS=
@@ -140,8 +130,6 @@ AC_SUBST(PLUGIN_HSA_CPPFLAGS)
 AC_SUBST(PLUGIN_HSA_LDFLAGS)
 AC_SUBST(PLUGIN_HSA_LIBS)
 
-
-
 # Get offload targets and path to install tree of offloading compiler.
 offload_additional_options=
 offload_additional_lib_paths=
@@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
 	        tgt_name=hsa
 	        PLUGIN_HSA=$tgt
 	        PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
-	        PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
+	        PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
+	        PLUGIN_HSA_LIBS="-ldl"
 
 	        PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
 	        CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
@@ -205,11 +193,7 @@ if test x"$enable_offload_targets" != x; then
 	        PLUGIN_HSA_save_LIBS=$LIBS
 	        LIBS="$PLUGIN_HSA_LIBS $LIBS"
 
-	        AC_LINK_IFELSE(
-	          [AC_LANG_PROGRAM(
-	            [#include "hsa.h"],
-	              [hsa_status_t status = hsa_init ()])],
-	          [PLUGIN_HSA=1])
+	        PLUGIN_HSA=1
 	        CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
 	        LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
 	        LIBS=$PLUGIN_HSA_save_LIBS
@@ -260,3 +244,10 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
+
+if test "$HSA_RUNTIME_LIB" != ""; then
+  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
+fi
+
+AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
+  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/hsa.h b/libgomp/plugin/hsa.h
new file mode 100644
index 0000000..6765751
--- /dev/null
+++ b/libgomp/plugin/hsa.h
@@ -0,0 +1,630 @@
+/* HSA runtime API 1.0.1 representation description.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+The contents of the file was created by extracting data structures, enum,
+typedef and other definitions from HSA Runtime Programmer’s Reference Manual
+Version 1.0 (http://www.hsafoundation.com/standards/).
+
+HTML version is provided on the following link:
+http://www.hsafoundation.com/html/Content/Runtime/Topics/Runtime_title_page.htm
+*/
+
+#ifndef _HSA_H
+#define _HSA_H 1
+
+#define HSA_LARGE_MODEL 1
+
+typedef struct hsa_signal_s { uint64_t handle; } hsa_signal_t;
+typedef enum {
+  HSA_QUEUE_TYPE_MULTI = 0,
+  HSA_QUEUE_TYPE_SINGLE = 1
+} hsa_queue_type_t;
+
+typedef enum { HSA_PROFILE_BASE = 0, HSA_PROFILE_FULL = 1 } hsa_profile_t;
+typedef struct hsa_region_s { uint64_t handle; } hsa_region_t;
+typedef enum {
+  HSA_EXECUTABLE_SYMBOL_INFO_TYPE = 0,
+  HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH = 1,
+  HSA_EXECUTABLE_SYMBOL_INFO_NAME = 2,
+  HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3,
+  HSA_EXECUTABLE_SYMBOL_INFO_MODULE_NAME = 4,
+  HSA_EXECUTABLE_SYMBOL_INFO_AGENT = 20,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS = 21,
+  HSA_EXECUTABLE_SYMBOL_INFO_LINKAGE = 5,
+  HSA_EXECUTABLE_SYMBOL_INFO_IS_DEFINITION = 17,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SEGMENT = 7,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE = 9,
+  HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_IS_CONST = 10,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT = 22,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14,
+  HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15,
+  HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_OBJECT = 23,
+  HSA_EXECUTABLE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16
+} hsa_executable_symbol_info_t;
+typedef enum {
+  HSA_REGION_GLOBAL_FLAG_KERNARG = 1,
+  HSA_REGION_GLOBAL_FLAG_FINE_GRAINED = 2,
+  HSA_REGION_GLOBAL_FLAG_COARSE_GRAINED = 4
+} hsa_region_global_flag_t;
+typedef struct hsa_code_object_s { uint64_t handle; } hsa_code_object_t;
+typedef enum {
+  HSA_KERNEL_DISPATCH_PACKET_SETUP_WIDTH_DIMENSIONS = 2
+} hsa_kernel_dispatch_packet_setup_width_t;
+typedef enum {
+  HSA_DEVICE_TYPE_CPU = 0,
+  HSA_DEVICE_TYPE_GPU = 1,
+  HSA_DEVICE_TYPE_DSP = 2
+} hsa_device_type_t;
+typedef enum {
+  HSA_STATUS_SUCCESS = 0x0,
+  HSA_STATUS_INFO_BREAK = 0x1,
+  HSA_STATUS_ERROR = 0x1000,
+  HSA_STATUS_ERROR_INVALID_ARGUMENT = 0x1001,
+  HSA_STATUS_ERROR_INVALID_QUEUE_CREATION = 0x1002,
+  HSA_STATUS_ERROR_INVALID_ALLOCATION = 0x1003,
+  HSA_STATUS_ERROR_INVALID_AGENT = 0x1004,
+  HSA_STATUS_ERROR_INVALID_REGION = 0x1005,
+  HSA_STATUS_ERROR_INVALID_SIGNAL = 0x1006,
+  HSA_STATUS_ERROR_INVALID_QUEUE = 0x1007,
+  HSA_STATUS_ERROR_OUT_OF_RESOURCES = 0x1008,
+  HSA_STATUS_ERROR_INVALID_PACKET_FORMAT = 0x1009,
+  HSA_STATUS_ERROR_RESOURCE_FREE = 0x100A,
+  HSA_STATUS_ERROR_NOT_INITIALIZED = 0x100B,
+  HSA_STATUS_ERROR_REFCOUNT_OVERFLOW = 0x100C,
+  HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS = 0x100D,
+  HSA_STATUS_ERROR_INVALID_INDEX = 0x100E,
+  HSA_STATUS_ERROR_INVALID_ISA = 0x100F,
+  HSA_STATUS_ERROR_INVALID_ISA_NAME = 0x1017,
+  HSA_STATUS_ERROR_INVALID_CODE_OBJECT = 0x1010,
+  HSA_STATUS_ERROR_INVALID_EXECUTABLE = 0x1011,
+  HSA_STATUS_ERROR_FROZEN_EXECUTABLE = 0x1012,
+  HSA_STATUS_ERROR_INVALID_SYMBOL_NAME = 0x1013,
+  HSA_STATUS_ERROR_VARIABLE_ALREADY_DEFINED = 0x1014,
+  HSA_STATUS_ERROR_VARIABLE_UNDEFINED = 0x1015,
+  HSA_STATUS_ERROR_EXCEPTION = 0x1016
+} hsa_status_t;
+typedef enum {
+  HSA_EXTENSION_FINALIZER = 0,
+  HSA_EXTENSION_IMAGES = 1
+} hsa_extension_t;
+typedef struct hsa_queue_s {
+  hsa_queue_type_t type;
+  uint32_t features;
+
+#ifdef HSA_LARGE_MODEL
+  void *base_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *base_address;
+  uint32_t reserved0;
+#else
+  uint32_t reserved0;
+  void *base_address;
+#endif
+
+  hsa_signal_t doorbell_signal;
+  uint32_t size;
+  uint32_t reserved1;
+  uint64_t id;
+} hsa_queue_t;
+typedef struct hsa_agent_dispatch_packet_s {
+  uint16_t header;
+  uint16_t type;
+  uint32_t reserved0;
+
+#ifdef HSA_LARGE_MODEL
+  void *return_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *return_address;
+  uint32_t reserved1;
+#else
+  uint32_t reserved1;
+  void *return_address;
+#endif
+  uint64_t arg[4];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_agent_dispatch_packet_t;
+typedef enum {
+  HSA_CODE_SYMBOL_INFO_TYPE = 0,
+  HSA_CODE_SYMBOL_INFO_NAME_LENGTH = 1,
+  HSA_CODE_SYMBOL_INFO_NAME = 2,
+  HSA_CODE_SYMBOL_INFO_MODULE_NAME_LENGTH = 3,
+  HSA_CODE_SYMBOL_INFO_MODULE_NAME = 4,
+  HSA_CODE_SYMBOL_INFO_LINKAGE = 5,
+  HSA_CODE_SYMBOL_INFO_IS_DEFINITION = 17,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_ALLOCATION = 6,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_SEGMENT = 7,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_ALIGNMENT = 8,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_SIZE = 9,
+  HSA_CODE_SYMBOL_INFO_VARIABLE_IS_CONST = 10,
+  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE = 11,
+  HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT = 12,
+  HSA_CODE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE = 13,
+  HSA_CODE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE = 14,
+  HSA_CODE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK = 15,
+  HSA_CODE_SYMBOL_INFO_INDIRECT_FUNCTION_CALL_CONVENTION = 16
+} hsa_code_symbol_info_t;
+typedef enum {
+  HSA_QUEUE_FEATURE_KERNEL_DISPATCH = 1,
+  HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2
+} hsa_queue_feature_t;
+typedef enum {
+  HSA_VARIABLE_ALLOCATION_AGENT = 0,
+  HSA_VARIABLE_ALLOCATION_PROGRAM = 1
+} hsa_variable_allocation_t;
+typedef enum {
+  HSA_FENCE_SCOPE_NONE = 0,
+  HSA_FENCE_SCOPE_AGENT = 1,
+  HSA_FENCE_SCOPE_SYSTEM = 2
+} hsa_fence_scope_t;
+typedef struct hsa_agent_s { uint64_t handle; } hsa_agent_t;
+typedef enum { HSA_CODE_OBJECT_TYPE_PROGRAM = 0 } hsa_code_object_type_t;
+typedef enum {
+  HSA_SIGNAL_CONDITION_EQ = 0,
+  HSA_SIGNAL_CONDITION_NE = 1,
+  HSA_SIGNAL_CONDITION_LT = 2,
+  HSA_SIGNAL_CONDITION_GTE = 3
+} hsa_signal_condition_t;
+typedef enum {
+  HSA_EXECUTABLE_STATE_UNFROZEN = 0,
+  HSA_EXECUTABLE_STATE_FROZEN = 1
+} hsa_executable_state_t;
+typedef enum {
+  HSA_ENDIANNESS_LITTLE = 0,
+  HSA_ENDIANNESS_BIG = 1
+} hsa_endianness_t;
+typedef enum {
+  HSA_MACHINE_MODEL_SMALL = 0,
+  HSA_MACHINE_MODEL_LARGE = 1
+} hsa_machine_model_t;
+typedef enum {
+  HSA_AGENT_INFO_NAME = 0,
+  HSA_AGENT_INFO_VENDOR_NAME = 1,
+  HSA_AGENT_INFO_FEATURE = 2,
+  HSA_AGENT_INFO_MACHINE_MODEL = 3,
+  HSA_AGENT_INFO_PROFILE = 4,
+  HSA_AGENT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5,
+  HSA_AGENT_INFO_BASE_PROFILE_DEFAULT_FLOAT_ROUNDING_MODES = 23,
+  HSA_AGENT_INFO_FAST_F16_OPERATION = 24,
+  HSA_AGENT_INFO_WAVEFRONT_SIZE = 6,
+  HSA_AGENT_INFO_WORKGROUP_MAX_DIM = 7,
+  HSA_AGENT_INFO_WORKGROUP_MAX_SIZE = 8,
+  HSA_AGENT_INFO_GRID_MAX_DIM = 9,
+  HSA_AGENT_INFO_GRID_MAX_SIZE = 10,
+  HSA_AGENT_INFO_FBARRIER_MAX_SIZE = 11,
+  HSA_AGENT_INFO_QUEUES_MAX = 12,
+  HSA_AGENT_INFO_QUEUE_MIN_SIZE = 13,
+  HSA_AGENT_INFO_QUEUE_MAX_SIZE = 14,
+  HSA_AGENT_INFO_QUEUE_TYPE = 15,
+  HSA_AGENT_INFO_NODE = 16,
+  HSA_AGENT_INFO_DEVICE = 17,
+  HSA_AGENT_INFO_CACHE_SIZE = 18,
+  HSA_AGENT_INFO_ISA = 19,
+  HSA_AGENT_INFO_EXTENSIONS = 20,
+  HSA_AGENT_INFO_VERSION_MAJOR = 21,
+  HSA_AGENT_INFO_VERSION_MINOR = 22
+} hsa_agent_info_t;
+typedef struct hsa_barrier_and_packet_s {
+  uint16_t header;
+  uint16_t reserved0;
+  uint32_t reserved1;
+  hsa_signal_t dep_signal[5];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_barrier_and_packet_t;
+typedef struct hsa_dim3_s {
+  uint32_t x;
+  uint32_t y;
+  uint32_t z;
+} hsa_dim3_t;
+typedef enum {
+  HSA_ACCESS_PERMISSION_RO = 1,
+  HSA_ACCESS_PERMISSION_WO = 2,
+  HSA_ACCESS_PERMISSION_RW = 3
+} hsa_access_permission_t;
+typedef enum {
+  HSA_AGENT_FEATURE_KERNEL_DISPATCH = 1,
+  HSA_AGENT_FEATURE_AGENT_DISPATCH = 2
+} hsa_agent_feature_t;
+typedef enum {
+  HSA_WAIT_STATE_BLOCKED = 0,
+  HSA_WAIT_STATE_ACTIVE = 1
+} hsa_wait_state_t;
+typedef struct hsa_executable_s { uint64_t handle; } hsa_executable_t;
+typedef enum {
+  HSA_REGION_SEGMENT_GLOBAL = 0,
+  HSA_REGION_SEGMENT_READONLY = 1,
+  HSA_REGION_SEGMENT_PRIVATE = 2,
+  HSA_REGION_SEGMENT_GROUP = 3
+} hsa_region_segment_t;
+typedef enum {
+  HSA_REGION_INFO_SEGMENT = 0,
+  HSA_REGION_INFO_GLOBAL_FLAGS = 1,
+  HSA_REGION_INFO_SIZE = 2,
+  HSA_REGION_INFO_ALLOC_MAX_SIZE = 4,
+  HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED = 5,
+  HSA_REGION_INFO_RUNTIME_ALLOC_GRANULE = 6,
+  HSA_REGION_INFO_RUNTIME_ALLOC_ALIGNMENT = 7
+} hsa_region_info_t;
+typedef enum {
+  HSA_ISA_INFO_NAME_LENGTH = 0,
+  HSA_ISA_INFO_NAME = 1,
+  HSA_ISA_INFO_CALL_CONVENTION_COUNT = 2,
+  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONT_SIZE = 3,
+  HSA_ISA_INFO_CALL_CONVENTION_INFO_WAVEFRONTS_PER_COMPUTE_UNIT = 4
+} hsa_isa_info_t;
+typedef enum {
+  HSA_VARIABLE_SEGMENT_GLOBAL = 0,
+  HSA_VARIABLE_SEGMENT_READONLY = 1
+} hsa_variable_segment_t;
+typedef struct hsa_callback_data_s { uint64_t handle; } hsa_callback_data_t;
+typedef enum {
+  HSA_SYMBOL_KIND_VARIABLE = 0,
+  HSA_SYMBOL_KIND_KERNEL = 1,
+  HSA_SYMBOL_KIND_INDIRECT_FUNCTION = 2
+} hsa_symbol_kind_t;
+typedef struct hsa_kernel_dispatch_packet_s {
+  uint16_t header;
+  uint16_t setup;
+  uint16_t workgroup_size_x;
+  uint16_t workgroup_size_y;
+  uint16_t workgroup_size_z;
+  uint16_t reserved0;
+  uint32_t grid_size_x;
+  uint32_t grid_size_y;
+  uint32_t grid_size_z;
+  uint32_t private_segment_size;
+  uint32_t group_segment_size;
+  uint64_t kernel_object;
+
+#ifdef HSA_LARGE_MODEL
+  void *kernarg_address;
+#elif defined HSA_LITTLE_ENDIAN
+  void *kernarg_address;
+  uint32_t reserved1;
+#else
+  uint32_t reserved1;
+  void *kernarg_address;
+#endif
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_kernel_dispatch_packet_t;
+typedef enum {
+  HSA_PACKET_TYPE_VENDOR_SPECIFIC = 0,
+  HSA_PACKET_TYPE_INVALID = 1,
+  HSA_PACKET_TYPE_KERNEL_DISPATCH = 2,
+  HSA_PACKET_TYPE_BARRIER_AND = 3,
+  HSA_PACKET_TYPE_AGENT_DISPATCH = 4,
+  HSA_PACKET_TYPE_BARRIER_OR = 5
+} hsa_packet_type_t;
+typedef enum {
+  HSA_PACKET_HEADER_TYPE = 0,
+  HSA_PACKET_HEADER_BARRIER = 8,
+  HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE = 9,
+  HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE = 11
+} hsa_packet_header_t;
+typedef struct hsa_isa_s { uint64_t handle; } hsa_isa_t;
+typedef enum {
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT = 0,
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_ZERO = 1,
+  HSA_DEFAULT_FLOAT_ROUNDING_MODE_NEAR = 2
+} hsa_default_float_rounding_mode_t;
+typedef struct hsa_code_symbol_s { uint64_t handle; } hsa_code_symbol_t;
+typedef struct hsa_executable_symbol_s {
+  uint64_t handle;
+} hsa_executable_symbol_t;
+#ifdef HSA_LARGE_MODEL
+typedef int64_t hsa_signal_value_t;
+#else
+typedef int32_t hsa_signal_value_t;
+#endif
+typedef enum {
+  HSA_EXCEPTION_POLICY_BREAK = 1,
+  HSA_EXCEPTION_POLICY_DETECT = 2
+} hsa_exception_policy_t;
+typedef enum {
+  HSA_SYSTEM_INFO_VERSION_MAJOR = 0,
+  HSA_SYSTEM_INFO_VERSION_MINOR = 1,
+  HSA_SYSTEM_INFO_TIMESTAMP = 2,
+  HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY = 3,
+  HSA_SYSTEM_INFO_SIGNAL_MAX_WAIT = 4,
+  HSA_SYSTEM_INFO_ENDIANNESS = 5,
+  HSA_SYSTEM_INFO_MACHINE_MODEL = 6,
+  HSA_SYSTEM_INFO_EXTENSIONS = 7
+} hsa_system_info_t;
+typedef enum {
+  HSA_EXECUTABLE_INFO_PROFILE = 1,
+  HSA_EXECUTABLE_INFO_STATE = 2
+} hsa_executable_info_t;
+typedef enum {
+  HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS = 0
+} hsa_kernel_dispatch_packet_setup_t;
+typedef enum {
+  HSA_PACKET_HEADER_WIDTH_TYPE = 8,
+  HSA_PACKET_HEADER_WIDTH_BARRIER = 1,
+  HSA_PACKET_HEADER_WIDTH_ACQUIRE_FENCE_SCOPE = 2,
+  HSA_PACKET_HEADER_WIDTH_RELEASE_FENCE_SCOPE = 2
+} hsa_packet_header_width_t;
+typedef enum {
+  HSA_CODE_OBJECT_INFO_VERSION = 0,
+  HSA_CODE_OBJECT_INFO_TYPE = 1,
+  HSA_CODE_OBJECT_INFO_ISA = 2,
+  HSA_CODE_OBJECT_INFO_MACHINE_MODEL = 3,
+  HSA_CODE_OBJECT_INFO_PROFILE = 4,
+  HSA_CODE_OBJECT_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 5
+} hsa_code_object_info_t;
+typedef struct hsa_barrier_or_packet_s {
+  uint16_t header;
+  uint16_t reserved0;
+  uint32_t reserved1;
+  hsa_signal_t dep_signal[5];
+  uint64_t reserved2;
+  hsa_signal_t completion_signal;
+} hsa_barrier_or_packet_t;
+typedef enum {
+  HSA_SYMBOL_KIND_LINKAGE_MODULE = 0,
+  HSA_SYMBOL_KIND_LINKAGE_PROGRAM = 1,
+} hsa_symbol_kind_linkage_t;
+hsa_status_t hsa_executable_validate(hsa_executable_t executable,
+                                     uint32_t *result);
+uint64_t hsa_queue_add_write_index_acq_rel(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_acquire(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_relaxed(const hsa_queue_t *queue,
+                                           uint64_t value);
+
+uint64_t hsa_queue_add_write_index_release(const hsa_queue_t *queue,
+                                           uint64_t value);
+hsa_status_t hsa_shut_down();
+void hsa_signal_add_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_add_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_executable_readonly_variable_define(
+    hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
+    void *address);
+hsa_status_t hsa_agent_extension_supported(uint16_t extension,
+                                           hsa_agent_t agent,
+                                           uint16_t version_major,
+                                           uint16_t version_minor,
+                                           bool *result);
+hsa_signal_value_t hsa_signal_load_acquire(hsa_signal_t signal);
+
+hsa_signal_value_t hsa_signal_load_relaxed(hsa_signal_t signal);
+hsa_status_t hsa_executable_get_info(hsa_executable_t executable,
+                                     hsa_executable_info_t attribute,
+                                     void *value);
+hsa_status_t hsa_iterate_agents(hsa_status_t (*callback)(hsa_agent_t agent,
+                                                         void *data),
+                                void *data);
+void hsa_signal_subtract_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_subtract_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t
+hsa_executable_symbol_get_info(hsa_executable_symbol_t executable_symbol,
+                               hsa_executable_symbol_info_t attribute,
+                               void *value);
+void hsa_signal_xor_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_xor_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_code_object_get_info(hsa_code_object_t code_object,
+                                      hsa_code_object_info_t attribute,
+                                      void *value);
+hsa_status_t hsa_code_object_deserialize(void *serialized_code_object,
+                                         size_t serialized_code_object_size,
+                                         const char *options,
+                                         hsa_code_object_t *code_object);
+hsa_status_t hsa_status_string(hsa_status_t status, const char **status_string);
+hsa_status_t hsa_code_object_get_symbol(hsa_code_object_t code_object,
+                                        const char *symbol_name,
+                                        hsa_code_symbol_t *symbol);
+void hsa_signal_store_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_store_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_signal_destroy(hsa_signal_t signal);
+hsa_status_t hsa_system_get_extension_table(uint16_t extension,
+                                            uint16_t version_major,
+                                            uint16_t version_minor,
+                                            void *table);
+hsa_status_t hsa_agent_iterate_regions(
+    hsa_agent_t agent,
+    hsa_status_t (*callback)(hsa_region_t region, void *data), void *data);
+hsa_status_t hsa_executable_agent_global_variable_define(
+    hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
+    void *address);
+hsa_status_t hsa_queue_create(hsa_agent_t agent, uint32_t size,
+                              hsa_queue_type_t type,
+                              void (*callback)(hsa_status_t status,
+                                               hsa_queue_t *source, void *data),
+                              void *data, uint32_t private_segment_size,
+                              uint32_t group_segment_size, hsa_queue_t **queue);
+hsa_status_t hsa_isa_compatible(hsa_isa_t code_object_isa, hsa_isa_t agent_isa,
+                                bool *result);
+hsa_status_t hsa_code_object_serialize(
+    hsa_code_object_t code_object,
+    hsa_status_t (*alloc_callback)(size_t size, hsa_callback_data_t data,
+                                   void **address),
+    hsa_callback_data_t callback_data, const char *options,
+    void **serialized_code_object, size_t *serialized_code_object_size);
+hsa_status_t hsa_region_get_info(hsa_region_t region,
+                                 hsa_region_info_t attribute, void *value);
+hsa_status_t hsa_executable_freeze(hsa_extension_t executable,
+                                   const char *options);
+hsa_status_t hsa_system_extension_supported(uint16_t extension,
+                                            uint16_t version_major,
+                                            uint16_t version_minor,
+                                            bool *result);
+hsa_signal_value_t hsa_signal_wait_acquire(hsa_signal_t signal,
+                                           hsa_signal_condition_t condition,
+                                           hsa_signal_value_t compare_value,
+                                           uint64_t timeout_hint,
+                                           hsa_wait_state_t wait_state_hint);
+
+hsa_signal_value_t hsa_signal_wait_relaxed(hsa_signal_t signal,
+                                           hsa_signal_condition_t condition,
+                                           hsa_signal_value_t compare_value,
+                                           uint64_t timeout_hint,
+                                           hsa_wait_state_t wait_state_hint);
+hsa_status_t hsa_memory_copy(void *dst, const void *src, size_t size);
+hsa_status_t hsa_memory_free(void *ptr);
+hsa_status_t hsa_queue_destroy(hsa_queue_t *queue);
+hsa_status_t hsa_isa_from_name(const char *name, hsa_isa_t *isa);
+hsa_status_t hsa_isa_get_info(hsa_isa_t isa, hsa_isa_info_t attribute,
+                              uint32_t index, void *value);
+hsa_status_t hsa_signal_create(hsa_signal_value_t initial_value,
+                               uint32_t num_consumers,
+                               const hsa_agent_t *consumers,
+                               hsa_signal_t *signal);
+hsa_status_t hsa_code_symbol_get_info(hsa_code_symbol_t code_symbol,
+                                      hsa_code_symbol_info_t attribute,
+                                      void *value);
+hsa_signal_value_t hsa_signal_cas_acq_rel(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_acquire(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_relaxed(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_cas_release(hsa_signal_t signal,
+                                          hsa_signal_value_t expected,
+                                          hsa_signal_value_t value);
+hsa_status_t hsa_code_object_iterate_symbols(
+    hsa_code_object_t code_object,
+    hsa_status_t (*callback)(hsa_code_object_t code_object,
+                             hsa_code_symbol_t symbol, void *data),
+    void *data);
+void hsa_queue_store_read_index_relaxed(const hsa_queue_t *queue,
+                                        uint64_t value);
+
+void hsa_queue_store_read_index_release(const hsa_queue_t *queue,
+                                        uint64_t value);
+hsa_status_t hsa_memory_assign_agent(void *ptr, hsa_agent_t agent,
+                                     hsa_access_permission_t access);
+hsa_status_t hsa_queue_inactivate(hsa_queue_t *queue);
+hsa_status_t hsa_executable_get_symbol(hsa_executable_t executable,
+                                       const char *module_name,
+                                       const char *symbol_name,
+                                       hsa_agent_t agent,
+                                       int32_t call_convention,
+                                       hsa_executable_symbol_t *symbol);
+uint64_t hsa_queue_cas_write_index_acq_rel(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_acquire(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_relaxed(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+
+uint64_t hsa_queue_cas_write_index_release(const hsa_queue_t *queue,
+                                           uint64_t expected, uint64_t value);
+void hsa_signal_and_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_and_release(hsa_signal_t signal, hsa_signal_value_t value);
+uint64_t hsa_queue_load_read_index_acquire(const hsa_queue_t *queue);
+
+uint64_t hsa_queue_load_read_index_relaxed(const hsa_queue_t *queue);
+hsa_status_t hsa_executable_load_code_object(hsa_executable_t executable,
+                                             hsa_agent_t agent,
+                                             hsa_code_object_t code_object,
+                                             const char *options);
+uint64_t hsa_queue_load_write_index_acquire(const hsa_queue_t *queue);
+
+uint64_t hsa_queue_load_write_index_relaxed(const hsa_queue_t *queue);
+hsa_status_t hsa_agent_get_exception_policies(hsa_agent_t agent,
+                                              hsa_profile_t profile,
+                                              uint16_t *mask);
+hsa_status_t hsa_memory_deregister(void *ptr, size_t size);
+void hsa_signal_or_acq_rel(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_acquire(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_relaxed(hsa_signal_t signal, hsa_signal_value_t value);
+
+void hsa_signal_or_release(hsa_signal_t signal, hsa_signal_value_t value);
+hsa_status_t hsa_soft_queue_create(hsa_region_t region, uint32_t size,
+                                   hsa_queue_type_t type, uint32_t features,
+                                   hsa_signal_t doorbell_signal,
+                                   hsa_queue_t **queue);
+hsa_status_t hsa_executable_iterate_symbols(
+    hsa_executable_t executable,
+    hsa_status_t (*callback)(hsa_executable_t executable,
+                             hsa_executable_symbol_t symbol, void *data),
+    void *data);
+hsa_status_t hsa_memory_register(void *ptr, size_t size);
+void hsa_queue_store_write_index_relaxed(const hsa_queue_t *queue,
+                                         uint64_t value);
+
+void hsa_queue_store_write_index_release(const hsa_queue_t *queue,
+                                         uint64_t value);
+hsa_status_t hsa_executable_global_variable_define(hsa_executable_t executable,
+                                                   const char *variable_name,
+                                                   void *address);
+hsa_status_t hsa_executable_destroy(hsa_executable_t executable);
+hsa_status_t hsa_code_object_destroy(hsa_code_object_t code_object);
+hsa_status_t hsa_memory_allocate(hsa_region_t region, size_t size, void **ptr);
+hsa_signal_value_t hsa_signal_exchange_acq_rel(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_acquire(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_relaxed(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+
+hsa_signal_value_t hsa_signal_exchange_release(hsa_signal_t signal,
+                                               hsa_signal_value_t value);
+hsa_status_t hsa_agent_get_info(hsa_agent_t agent, hsa_agent_info_t attribute,
+                                void *value);
+hsa_status_t hsa_init();
+hsa_status_t hsa_system_get_info(hsa_system_info_t attribute, void *value);
+hsa_status_t hsa_executable_create(hsa_profile_t profile,
+                                   hsa_executable_state_t executable_state,
+                                   const char *options,
+                                   hsa_executable_t *executable);
+
+#endif /* _HSA_H */
diff --git a/libgomp/plugin/hsa_ext_finalize.h b/libgomp/plugin/hsa_ext_finalize.h
new file mode 100644
index 0000000..f159add
--- /dev/null
+++ b/libgomp/plugin/hsa_ext_finalize.h
@@ -0,0 +1,265 @@
+/* HSA Extensions API 1.0.1 representation description.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+The contents of the file was created by extracting data structures, enum,
+typedef and other definitions from HSA Runtime Programmer’s Reference Manual
+Version 1.0 (http://www.hsafoundation.com/standards/).
+
+HTML version is provided on the following link:
+http://www.hsafoundation.com/html/Content/Runtime/Topics/Runtime_title_page.htm
+*/
+
+
+#ifndef _HSA_EXT_FINALIZE_H
+#define _HSA_EXT_FINALIZE_H 1
+
+struct BrigModuleHeader;
+typedef struct BrigModuleHeader *BrigModule_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_GEOMETRY_1D = 0,
+  HSA_EXT_IMAGE_GEOMETRY_2D = 1,
+  HSA_EXT_IMAGE_GEOMETRY_3D = 2,
+  HSA_EXT_IMAGE_GEOMETRY_1DA = 3,
+  HSA_EXT_IMAGE_GEOMETRY_2DA = 4,
+  HSA_EXT_IMAGE_GEOMETRY_1DB = 5,
+  HSA_EXT_IMAGE_GEOMETRY_2DDEPTH = 6,
+  HSA_EXT_IMAGE_GEOMETRY_2DADEPTH = 7
+} hsa_ext_image_geometry_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 = 7,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14,
+  HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT = 15
+} hsa_ext_image_channel_type_t;
+
+typedef enum {
+  HSA_EXT_IMAGE_CHANNEL_ORDER_A = 0,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_R = 1,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RX = 2,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RG = 3,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGX = 4,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RA = 5,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGB = 6,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX = 7,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA = 8,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA = 9,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB = 10,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR = 11,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB = 12,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX = 13,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA = 14,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA = 15,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY = 16,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE = 17,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH = 18,
+  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19
+} hsa_ext_image_channel_order_t;
+
+typedef struct hsa_ext_image_format_s
+{
+  hsa_ext_image_channel_type_t channel_type;
+  hsa_ext_image_channel_order_t channel_order;
+} hsa_ext_image_format_t;
+
+typedef struct hsa_ext_sampler_s
+{
+  uint64_t handle;
+} hsa_ext_sampler_t;
+typedef struct hsa_ext_image_data_info_s
+{
+  size_t size;
+  size_t alignment;
+} hsa_ext_image_data_info_t;
+typedef enum {
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED = 0,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE = 1,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER = 2,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT = 3,
+  HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT = 4
+} hsa_ext_sampler_addressing_mode_t;
+typedef struct hsa_ext_image_s
+{
+  uint64_t handle;
+} hsa_ext_image_t;
+typedef enum {
+  HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED = 0x0,
+  HSA_EXT_IMAGE_CAPABILITY_READ_ONLY = 0x1,
+  HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY = 0x2,
+  HSA_EXT_IMAGE_CAPABILITY_READ_WRITE = 0x4,
+  HSA_EXT_IMAGE_CAPABILITY_READ_MODIFY_WRITE = 0x8,
+  HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT = 0x10
+} hsa_ext_image_capability_t;
+typedef struct hsa_ext_control_directives_s
+{
+  uint64_t control_directives_mask;
+  uint16_t break_exceptions_mask;
+  uint16_t detect_exceptions_mask;
+  uint32_t max_dynamic_group_size;
+  uint64_t max_flat_grid_size;
+  uint32_t max_flat_workgroup_size;
+  uint32_t reserved1;
+  uint64_t required_grid_size[3];
+  hsa_dim3_t required_workgroup_size;
+  uint8_t required_dim;
+  uint8_t reserved2[75];
+} hsa_ext_control_directives_t;
+typedef enum {
+  HSA_EXT_SAMPLER_FILTER_MODE_NEAREST = 0,
+  HSA_EXT_SAMPLER_FILTER_MODE_LINEAR = 1
+} hsa_ext_sampler_filter_mode_t;
+
+typedef enum {
+  HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED = 0,
+  HSA_EXT_SAMPLER_COORDINATE_MODE_NORMALIZED = 1
+} hsa_ext_sampler_coordinate_mode_t;
+typedef enum {
+  HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO = -1
+} hsa_ext_finalizer_call_convention_t;
+typedef struct hsa_ext_program_s
+{
+  uint64_t handle;
+} hsa_ext_program_t;
+typedef struct hsa_ext_image_descriptor_s
+{
+  hsa_ext_image_geometry_t geometry;
+  size_t width;
+  size_t height;
+  size_t depth;
+  size_t array_size;
+  hsa_ext_image_format_t format;
+} hsa_ext_image_descriptor_t;
+typedef enum {
+  HSA_EXT_PROGRAM_INFO_MACHINE_MODEL = 0,
+  HSA_EXT_PROGRAM_INFO_PROFILE = 1,
+  HSA_EXT_PROGRAM_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 2
+} hsa_ext_program_info_t;
+typedef BrigModule_t hsa_ext_module_t;
+typedef struct hsa_ext_sampler_descriptor_s
+{
+  hsa_ext_sampler_coordinate_mode_t coordinate_mode;
+  hsa_ext_sampler_filter_mode_t filter_mode;
+  hsa_ext_sampler_addressing_mode_t address_mode;
+} hsa_ext_sampler_descriptor_t;
+
+typedef struct hsa_ext_image_region_s
+{
+  hsa_dim3_t offset;
+  hsa_dim3_t range;
+} hsa_ext_image_region_t;
+hsa_status_t hsa_ext_image_export (hsa_agent_t agent, hsa_ext_image_t src_image,
+				   void *dst_memory, size_t dst_row_pitch,
+				   size_t dst_slice_pitch,
+				   const hsa_ext_image_region_t *image_region);
+hsa_status_t hsa_ext_program_add_module (hsa_ext_program_t program,
+					 hsa_ext_module_t module);
+hsa_status_t hsa_ext_program_iterate_modules (
+  hsa_ext_program_t program,
+  hsa_status_t (*callback) (hsa_ext_program_t program, hsa_ext_module_t module,
+			    void *data),
+  void *data);
+hsa_status_t hsa_ext_program_create (
+  hsa_machine_model_t machine_model, hsa_profile_t profile,
+  hsa_default_float_rounding_mode_t default_float_rounding_mode,
+  const char *options, hsa_ext_program_t *program);
+hsa_status_t
+hsa_ext_image_data_get_info (hsa_agent_t agent,
+			     const hsa_ext_image_descriptor_t *image_descriptor,
+			     hsa_access_permission_t access_permission,
+			     hsa_ext_image_data_info_t *image_data_info);
+
+hsa_status_t hsa_ext_image_import (hsa_agent_t agent, const void *src_memory,
+				   size_t src_row_pitch, size_t src_slice_pitch,
+				   hsa_ext_image_t dst_image,
+				   const hsa_ext_image_region_t *image_region);
+hsa_status_t hsa_ext_program_get_info (hsa_ext_program_t program,
+				       hsa_ext_program_info_t attribute,
+				       void *value);
+enum
+{
+  HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED = 0x3000,
+  HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED = 0x3001
+};
+hsa_status_t hsa_ext_image_destroy (hsa_agent_t agent, hsa_ext_image_t image);
+hsa_status_t hsa_ext_image_get_capability (
+  hsa_agent_t agent, hsa_ext_image_geometry_t geometry,
+  const hsa_ext_image_format_t *image_format, uint32_t *capability_mask);
+enum
+{
+  HSA_EXT_STATUS_ERROR_INVALID_PROGRAM = 0x2000,
+  HSA_EXT_STATUS_ERROR_INVALID_MODULE = 0x2001,
+  HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE = 0x2002,
+  HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED = 0x2003,
+  HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH = 0x2004,
+  HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED = 0x2005,
+  HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH = 0x2006
+};
+hsa_status_t hsa_ext_sampler_destroy (hsa_agent_t agent,
+				      hsa_ext_sampler_t sampler);
+hsa_status_t hsa_ext_program_finalize (
+  hsa_ext_program_t program, hsa_isa_t isa, int32_t call_convention,
+  hsa_ext_control_directives_t control_directives, const char *options,
+  hsa_code_object_type_t code_object_type, hsa_code_object_t *code_object);
+hsa_status_t hsa_ext_image_create (
+  hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor,
+  const void *image_data, hsa_access_permission_t access_permission,
+  hsa_ext_image_t *image);
+hsa_status_t hsa_ext_program_destroy (hsa_ext_program_t program);
+hsa_status_t hsa_ext_image_copy (hsa_agent_t agent, hsa_ext_image_t src_image,
+				 const hsa_dim3_t *src_offset,
+				 hsa_ext_image_t dst_image,
+				 const hsa_dim3_t *dst_offset,
+				 const hsa_dim3_t *range);
+hsa_status_t hsa_ext_image_clear (hsa_agent_t agent, hsa_ext_image_t image,
+				  const void *data,
+				  const hsa_ext_image_region_t *image_region);
+enum
+{
+  HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS = 0x3000,
+  HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS = 0x3001,
+  HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS = 0x3002,
+  HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS = 0x3003,
+  HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS = 0x3004,
+  HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS = 0x3005,
+  HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS = 0x3006,
+  HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS = 0x3007,
+  HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS = 0x3008,
+  HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES = 0x3009,
+  HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES = 0x300A,
+  HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS = 0x300B
+};
+hsa_status_t
+hsa_ext_sampler_create (hsa_agent_t agent,
+			const hsa_ext_sampler_descriptor_t *sampler_descriptor,
+			hsa_ext_sampler_t *sampler);
+
+#endif /* _HSA_EXT_FINALIZE_H */
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index bed8555..b829c8c 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -27,16 +27,129 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "config.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <pthread.h>
-#include <hsa.h>
-#include <hsa_ext_finalize.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <plugin/hsa.h>
+#include <plugin/hsa_ext_finalize.h>
 #include <dlfcn.h>
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
 
+/* Secure getenv() which returns NULL if running as SUID/SGID.  */
+#ifndef HAVE_SECURE_GETENV
+#ifdef HAVE___SECURE_GETENV
+#define secure_getenv __secure_getenv
+#elif defined (HAVE_UNISTD_H) && defined(HAVE_GETUID) && defined(HAVE_GETEUID) \
+  && defined(HAVE_GETGID) && defined(HAVE_GETEGID)
+
+#include <unistd.h>
+
+/* Implementation of secure_getenv() for targets where it is not provided but
+   we have at least means to test real and effective IDs. */
+
+static char *
+secure_getenv (const char *name)
+{
+  if ((getuid () == geteuid ()) && (getgid () == getegid ()))
+    return getenv (name);
+  else
+    return NULL;
+}
+
+#else
+#define secure_getenv getenv
+#endif
+#endif
+
+/* As an HSA runtime is dlopened, following structure defines function
+   pointers utilized by the HSA plug-in.  */
+
+struct hsa_runtime_fn_info
+{
+  /* HSA runtime.  */
+  hsa_status_t (*hsa_status_string_fn) (hsa_status_t status,
+					const char **status_string);
+  hsa_status_t (*hsa_agent_get_info_fn) (hsa_agent_t agent,
+					 hsa_agent_info_t attribute,
+					 void *value);
+  hsa_status_t (*hsa_init_fn) (void);
+  hsa_status_t (*hsa_iterate_agents_fn)
+    (hsa_status_t (*callback)(hsa_agent_t agent, void *data), void *data);
+  hsa_status_t (*hsa_region_get_info_fn) (hsa_region_t region,
+					  hsa_region_info_t attribute,
+					  void *value);
+  hsa_status_t (*hsa_queue_create_fn)
+    (hsa_agent_t agent, uint32_t size, hsa_queue_type_t type,
+     void (*callback)(hsa_status_t status, hsa_queue_t *source, void *data),
+     void *data, uint32_t private_segment_size,
+     uint32_t group_segment_size, hsa_queue_t **queue);
+  hsa_status_t (*hsa_agent_iterate_regions_fn)
+    (hsa_agent_t agent,
+     hsa_status_t (*callback)(hsa_region_t region, void *data), void *data);
+  hsa_status_t (*hsa_executable_destroy_fn) (hsa_executable_t executable);
+  hsa_status_t (*hsa_executable_create_fn)
+    (hsa_profile_t profile, hsa_executable_state_t executable_state,
+     const char *options, hsa_executable_t *executable);
+  hsa_status_t (*hsa_executable_global_variable_define_fn)
+    (hsa_executable_t executable, const char *variable_name, void *address);
+  hsa_status_t (*hsa_executable_load_code_object_fn)
+    (hsa_executable_t executable, hsa_agent_t agent,
+     hsa_code_object_t code_object, const char *options);
+  hsa_status_t (*hsa_executable_freeze_fn)(hsa_executable_t executable,
+					   const char *options);
+  hsa_status_t (*hsa_signal_create_fn) (hsa_signal_value_t initial_value,
+					uint32_t num_consumers,
+					const hsa_agent_t *consumers,
+					hsa_signal_t *signal);
+  hsa_status_t (*hsa_memory_allocate_fn) (hsa_region_t region, size_t size,
+					  void **ptr);
+  hsa_status_t (*hsa_memory_free_fn) (void *ptr);
+  hsa_status_t (*hsa_signal_destroy_fn) (hsa_signal_t signal);
+  hsa_status_t (*hsa_executable_get_symbol_fn)
+    (hsa_executable_t executable, const char *module_name,
+     const char *symbol_name, hsa_agent_t agent, int32_t call_convention,
+     hsa_executable_symbol_t *symbol);
+  hsa_status_t (*hsa_executable_symbol_get_info_fn)
+    (hsa_executable_symbol_t executable_symbol,
+     hsa_executable_symbol_info_t attribute, void *value);
+  uint64_t (*hsa_queue_add_write_index_release_fn) (const hsa_queue_t *queue,
+						    uint64_t value);
+  uint64_t (*hsa_queue_load_read_index_acquire_fn) (const hsa_queue_t *queue);
+  void (*hsa_signal_store_relaxed_fn) (hsa_signal_t signal,
+				       hsa_signal_value_t value);
+  void (*hsa_signal_store_release_fn) (hsa_signal_t signal,
+				       hsa_signal_value_t value);
+  hsa_signal_value_t (*hsa_signal_wait_acquire_fn)
+    (hsa_signal_t signal, hsa_signal_condition_t condition,
+     hsa_signal_value_t compare_value, uint64_t timeout_hint,
+     hsa_wait_state_t wait_state_hint);
+  hsa_signal_value_t (*hsa_signal_load_acquire_fn) (hsa_signal_t signal);
+  hsa_status_t (*hsa_queue_destroy_fn) (hsa_queue_t *queue);
+
+  /* HSA finalizer.  */
+  hsa_status_t (*hsa_ext_program_add_module_fn) (hsa_ext_program_t program,
+						 hsa_ext_module_t module);
+  hsa_status_t (*hsa_ext_program_create_fn)
+    (hsa_machine_model_t machine_model, hsa_profile_t profile,
+     hsa_default_float_rounding_mode_t default_float_rounding_mode,
+     const char *options, hsa_ext_program_t *program);
+  hsa_status_t (*hsa_ext_program_destroy_fn) (hsa_ext_program_t program);
+  hsa_status_t (*hsa_ext_program_finalize_fn)
+    (hsa_ext_program_t program,hsa_isa_t isa,
+     int32_t call_convention, hsa_ext_control_directives_t control_directives,
+     const char *options, hsa_code_object_type_t code_object_type,
+     hsa_code_object_t *code_object);
+};
+
+/* HSA runtime functions that are initialized in init_hsa_context.  */
+
+static struct hsa_runtime_fn_info hsa_fns;
+
 /* Keep the following GOMP prefixed structures in sync with respective parts of
    the compiler.  */
 
@@ -129,20 +242,36 @@ static bool debug;
 
 static bool suppress_host_fallback;
 
+/* Flag to locate HSA runtime shared library that is dlopened
+   by this plug-in.  */
+
+static const char *hsa_runtime_lib;
+
+/* Flag to decide if the runtime should support also CPU devices (can be
+   a simulator).  */
+
+static bool support_cpu_devices;
+
 /* Initialize debug and suppress_host_fallback according to the environment.  */
 
 static void
 init_enviroment_variables (void)
 {
-  if (getenv ("HSA_DEBUG"))
+  if (secure_getenv ("HSA_DEBUG"))
     debug = true;
   else
     debug = false;
 
-  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+  if (secure_getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
     suppress_host_fallback = true;
   else
     suppress_host_fallback = false;
+
+  hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
+  if (hsa_runtime_lib == NULL)
+    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+
+  support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
 
 /* Print a logging message with PREFIX to stderr if HSA_DEBUG value
@@ -176,7 +305,7 @@ hsa_warn (const char *str, hsa_status_t status)
     return;
 
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
 
   fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error_msg);
 }
@@ -188,7 +317,7 @@ static void
 hsa_fatal (const char *str, hsa_status_t status)
 {
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
   GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
 		     hsa_error_msg);
 }
@@ -200,7 +329,7 @@ static bool
 hsa_error (const char *str, hsa_status_t status)
 {
   const char *hsa_error_msg;
-  hsa_status_string (status, &hsa_error_msg);
+  hsa_fns.hsa_status_string_fn (status, &hsa_error_msg);
   GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str,
 		     hsa_error_msg);
   return false;
@@ -359,6 +488,50 @@ struct hsa_context_info
 
 static struct hsa_context_info hsa_context;
 
+#define DLSYM_FN(function) \
+  hsa_fns.function##_fn = dlsym (handle, #function); \
+  if (hsa_fns.function##_fn == NULL) \
+    return false;
+
+static bool
+init_hsa_runtime_functions (void)
+{
+  void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
+  if (handle == NULL)
+    return false;
+
+  DLSYM_FN (hsa_status_string)
+  DLSYM_FN (hsa_agent_get_info)
+  DLSYM_FN (hsa_init)
+  DLSYM_FN (hsa_iterate_agents)
+  DLSYM_FN (hsa_region_get_info)
+  DLSYM_FN (hsa_queue_create)
+  DLSYM_FN (hsa_agent_iterate_regions)
+  DLSYM_FN (hsa_executable_destroy)
+  DLSYM_FN (hsa_executable_create)
+  DLSYM_FN (hsa_executable_global_variable_define)
+  DLSYM_FN (hsa_executable_load_code_object)
+  DLSYM_FN (hsa_executable_freeze)
+  DLSYM_FN (hsa_signal_create)
+  DLSYM_FN (hsa_memory_allocate)
+  DLSYM_FN (hsa_memory_free)
+  DLSYM_FN (hsa_signal_destroy)
+  DLSYM_FN (hsa_executable_get_symbol)
+  DLSYM_FN (hsa_executable_symbol_get_info)
+  DLSYM_FN (hsa_queue_add_write_index_release)
+  DLSYM_FN (hsa_queue_load_read_index_acquire)
+  DLSYM_FN (hsa_signal_wait_acquire)
+  DLSYM_FN (hsa_signal_store_relaxed)
+  DLSYM_FN (hsa_signal_store_release)
+  DLSYM_FN (hsa_signal_load_acquire)
+  DLSYM_FN (hsa_queue_destroy)
+  DLSYM_FN (hsa_ext_program_add_module)
+  DLSYM_FN (hsa_ext_program_create)
+  DLSYM_FN (hsa_ext_program_destroy)
+  DLSYM_FN (hsa_ext_program_finalize)
+  return true;
+}
+
 /* Find kernel for an AGENT by name provided in KERNEL_NAME.  */
 
 static struct kernel_info *
@@ -386,17 +559,32 @@ suitable_hsa_agent_p (hsa_agent_t agent)
 {
   hsa_device_type_t device_type;
   hsa_status_t status
-    = hsa_agent_get_info (agent, HSA_AGENT_INFO_DEVICE, &device_type);
-  if (status != HSA_STATUS_SUCCESS || device_type != HSA_DEVICE_TYPE_GPU)
+    = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_DEVICE,
+				     &device_type);
+  if (status != HSA_STATUS_SUCCESS)
     return false;
 
+  switch (device_type)
+    {
+    case HSA_DEVICE_TYPE_GPU:
+      break;
+    case HSA_DEVICE_TYPE_CPU:
+      if (!support_cpu_devices)
+	return false;
+      break;
+    default:
+      return false;
+    }
+
   uint32_t features = 0;
-  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_FEATURE, &features);
+  status = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_FEATURE,
+					  &features);
   if (status != HSA_STATUS_SUCCESS
       || !(features & HSA_AGENT_FEATURE_KERNEL_DISPATCH))
     return false;
   hsa_queue_type_t queue_type;
-  status = hsa_agent_get_info (agent, HSA_AGENT_INFO_QUEUE_TYPE, &queue_type);
+  status = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_QUEUE_TYPE,
+					  &queue_type);
   if (status != HSA_STATUS_SUCCESS
       || (queue_type != HSA_QUEUE_TYPE_MULTI))
     return false;
@@ -443,11 +631,16 @@ init_hsa_context (void)
   if (hsa_context.initialized)
     return true;
   init_enviroment_variables ();
-  status = hsa_init ();
+  if (!init_hsa_runtime_functions ())
+    {
+      HSA_DEBUG ("Run-time could not be dynamically opened\n");
+      return false;
+    }
+  status = hsa_fns.hsa_init_fn ();
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Run-time could not be initialized", status);
   HSA_DEBUG ("HSA run-time initialized\n");
-  status = hsa_iterate_agents (count_gpu_agents, NULL);
+  status = hsa_fns.hsa_iterate_agents_fn (count_gpu_agents, NULL);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("HSA GPU devices could not be enumerated", status);
   HSA_DEBUG ("There are %i HSA GPU devices.\n", hsa_context.agent_count);
@@ -455,7 +648,7 @@ init_hsa_context (void)
   hsa_context.agents
     = GOMP_PLUGIN_malloc_cleared (hsa_context.agent_count
 				  * sizeof (struct agent_info));
-  status = hsa_iterate_agents (assign_agent_ids, &agent_index);
+  status = hsa_fns.hsa_iterate_agents_fn (assign_agent_ids, &agent_index);
   if (agent_index != hsa_context.agent_count)
     {
       GOMP_PLUGIN_error ("Failed to assign IDs to all HSA agents");
@@ -485,14 +678,16 @@ get_kernarg_memory_region (hsa_region_t region, void *data)
   hsa_status_t status;
   hsa_region_segment_t segment;
 
-  status = hsa_region_get_info (region, HSA_REGION_INFO_SEGMENT, &segment);
+  status = hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SEGMENT,
+					   &segment);
   if (status != HSA_STATUS_SUCCESS)
     return status;
   if (segment != HSA_REGION_SEGMENT_GLOBAL)
     return HSA_STATUS_SUCCESS;
 
   uint32_t flags;
-  status = hsa_region_get_info (region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags);
+  status = hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_GLOBAL_FLAGS,
+					   &flags);
   if (status != HSA_STATUS_SUCCESS)
     return status;
   if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG)
@@ -546,29 +741,36 @@ GOMP_OFFLOAD_init_device (int n)
 
   uint32_t queue_size;
   hsa_status_t status;
-  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_QUEUE_MAX_SIZE,
-			       &queue_size);
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id,
+					  HSA_AGENT_INFO_QUEUE_MAX_SIZE,
+					  &queue_size);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error requesting maximum queue size of the HSA agent",
-		      status);
-  status = hsa_agent_get_info (agent->id, HSA_AGENT_INFO_ISA, &agent->isa);
+    	   	      status);
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_ISA,
+					  &agent->isa);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error querying the ISA of the agent", status);
-  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
-			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
-			     &agent->command_q);
+  status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
+					HSA_QUEUE_TYPE_MULTI,
+					queue_callback, NULL, UINT32_MAX,
+					UINT32_MAX,
+					&agent->command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error creating command queue", status);
 
-  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
-			     queue_callback, NULL, UINT32_MAX, UINT32_MAX,
-			     &agent->kernel_dispatch_command_q);
+  status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
+					HSA_QUEUE_TYPE_MULTI,
+					queue_callback, NULL, UINT32_MAX,
+					UINT32_MAX,
+					&agent->kernel_dispatch_command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error creating kernel dispatch command queue", status);
 
   agent->kernarg_region.handle = (uint64_t) -1;
-  status = hsa_agent_iterate_regions (agent->id, get_kernarg_memory_region,
-				      &agent->kernarg_region);
+  status = hsa_fns.hsa_agent_iterate_regions_fn (agent->id,
+						 get_kernarg_memory_region,
+						 &agent->kernarg_region);
   if (agent->kernarg_region.handle == (uint64_t) -1)
     {
       GOMP_PLUGIN_error ("Could not find suitable memory region for kernel "
@@ -646,7 +848,7 @@ destroy_hsa_program (struct agent_info *agent)
 
   HSA_DEBUG ("Destroying the current HSA program.\n");
 
-  status = hsa_executable_destroy (agent->executable);
+  status = hsa_fns.hsa_executable_destroy_fn (agent->executable);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Could not destroy HSA executable", status);
 
@@ -661,6 +863,29 @@ destroy_hsa_program (struct agent_info *agent)
   return true;
 }
 
+/* Initialize KERNEL from D and other parameters.  Return true on success. */
+
+static bool
+init_basic_kernel_info (struct kernel_info *kernel,
+			struct hsa_kernel_description *d,
+			struct agent_info *agent,
+			struct module_info *module)
+{
+  kernel->agent = agent;
+  kernel->module = module;
+  kernel->name = d->name;
+  kernel->omp_data_size = d->omp_data_size;
+  kernel->gridified_kernel_p = d->gridified_kernel_p;
+  kernel->dependencies_count = d->kernel_dependencies_count;
+  kernel->dependencies = d->kernel_dependencies;
+  if (pthread_mutex_init (&kernel->init_mutex, NULL))
+    {
+      GOMP_PLUGIN_error ("Failed to initialize an HSA kernel mutex");
+      return false;
+    }
+  return true;
+}
+
 /* Part of the libgomp plugin interface.  Load BRIG module described by struct
    brig_image_desc in TARGET_DATA and return references to kernel descriptors
    in TARGET_TABLE.  */
@@ -715,19 +940,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, void *target_data,
       pair->end = (uintptr_t) (kernel + 1);
 
       struct hsa_kernel_description *d = &image_desc->kernel_infos[i];
-      kernel->agent = agent;
-      kernel->module = module;
-      kernel->name = d->name;
-      kernel->omp_data_size = d->omp_data_size;
-      kernel->gridified_kernel_p = d->gridified_kernel_p;
-      kernel->dependencies_count = d->kernel_dependencies_count;
-      kernel->dependencies = d->kernel_dependencies;
-      if (pthread_mutex_init (&kernel->init_mutex, NULL))
-	{
-	  GOMP_PLUGIN_error ("Failed to initialize an HSA kernel mutex");
-	  return -1;
-	}
-
+      if (!init_basic_kernel_info (kernel, d, agent, module))
+	return -1;
       kernel++;
       pair++;
     }
@@ -799,9 +1013,10 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   if (agent->prog_finalized)
     goto final;
 
-  status = hsa_ext_program_create (HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL,
-				   HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT,
-				   NULL, &prog_handle);
+  status = hsa_fns.hsa_ext_program_create_fn
+    (HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL,
+     HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT,
+     NULL, &prog_handle);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not create an HSA program", status);
 
@@ -810,8 +1025,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   struct module_info *module = agent->first_module;
   while (module)
     {
-      status = hsa_ext_program_add_module (prog_handle,
-					   module->image_desc->brig_module);
+      status = hsa_fns.hsa_ext_program_add_module_fn
+	(prog_handle, module->image_desc->brig_module);
       if (status != HSA_STATUS_SUCCESS)
 	hsa_fatal ("Could not add a module to the HSA program", status);
       module = module->next;
@@ -837,7 +1052,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 	  continue;
 	}
 
-      status = hsa_ext_program_add_module (prog_handle, library->image);
+      status = hsa_fns.hsa_ext_program_add_module_fn (prog_handle,
+						      library->image);
       if (status != HSA_STATUS_SUCCESS)
 	hsa_warn ("Could not add a shared BRIG library the HSA program",
 		  status);
@@ -849,11 +1065,9 @@ create_and_finalize_hsa_program (struct agent_info *agent)
   hsa_ext_control_directives_t control_directives;
   memset (&control_directives, 0, sizeof (control_directives));
   hsa_code_object_t code_object;
-  status = hsa_ext_program_finalize (prog_handle, agent->isa,
-				     HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO,
-				     control_directives, "",
-				     HSA_CODE_OBJECT_TYPE_PROGRAM,
-				     &code_object);
+  status = hsa_fns.hsa_ext_program_finalize_fn
+    (prog_handle, agent->isa,HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO,
+     control_directives, "", HSA_CODE_OBJECT_TYPE_PROGRAM, &code_object);
   if (status != HSA_STATUS_SUCCESS)
     {
       hsa_warn ("Finalization of the HSA program failed", status);
@@ -861,11 +1075,12 @@ create_and_finalize_hsa_program (struct agent_info *agent)
     }
 
   HSA_DEBUG ("Finalization done\n");
-  hsa_ext_program_destroy (prog_handle);
+  hsa_fns.hsa_ext_program_destroy_fn (prog_handle);
 
   status
-    = hsa_executable_create (HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN,
-			     "", &agent->executable);
+    = hsa_fns.hsa_executable_create_fn (HSA_PROFILE_FULL,
+					HSA_EXECUTABLE_STATE_UNFROZEN,
+					"", &agent->executable);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not create HSA executable", status);
 
@@ -877,9 +1092,8 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 	{
 	  struct global_var_info *var;
 	  var = &module->image_desc->global_variables[i];
-	  status
-	    = hsa_executable_global_variable_define (agent->executable,
-						     var->name, var->address);
+	  status = hsa_fns.hsa_executable_global_variable_define_fn
+	    (agent->executable, var->name, var->address);
 
 	  HSA_DEBUG ("Defining global variable: %s, address: %p\n", var->name,
 		     var->address);
@@ -892,11 +1106,12 @@ create_and_finalize_hsa_program (struct agent_info *agent)
       module = module->next;
     }
 
-  status = hsa_executable_load_code_object (agent->executable, agent->id,
-					    code_object, "");
+  status = hsa_fns.hsa_executable_load_code_object_fn (agent->executable,
+						       agent->id,
+						       code_object, "");
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not add a code object to the HSA executable", status);
-  status = hsa_executable_freeze (agent->executable, "");
+  status = hsa_fns.hsa_executable_freeze_fn (agent->executable, "");
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not freeze the HSA executable", status);
 
@@ -937,7 +1152,7 @@ create_single_kernel_dispatch (struct kernel_info *kernel,
   shadow->object = kernel->object;
 
   hsa_signal_t sync_signal;
-  hsa_status_t status = hsa_signal_create (1, 0, NULL, &sync_signal);
+  hsa_status_t status = hsa_fns.hsa_signal_create_fn (1, 0, NULL, &sync_signal);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Error creating the HSA sync signal", status);
 
@@ -946,8 +1161,9 @@ create_single_kernel_dispatch (struct kernel_info *kernel,
   shadow->group_segment_size = kernel->group_segment_size;
 
   status
-    = hsa_memory_allocate (agent->kernarg_region, kernel->kernarg_segment_size,
-			   &shadow->kernarg_address);
+    = hsa_fns.hsa_memory_allocate_fn (agent->kernarg_region,
+				      kernel->kernarg_segment_size,
+				      &shadow->kernarg_address);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not allocate memory for HSA kernel arguments", status);
 
@@ -962,11 +1178,11 @@ release_kernel_dispatch (struct GOMP_hsa_kernel_dispatch *shadow)
   HSA_DEBUG ("Released kernel dispatch: %p has value: %lu (%p)\n", shadow,
 	     shadow->debug, (void *) shadow->debug);
 
-  hsa_memory_free (shadow->kernarg_address);
+  hsa_fns.hsa_memory_free_fn (shadow->kernarg_address);
 
   hsa_signal_t s;
   s.handle = shadow->signal;
-  hsa_signal_destroy (s);
+  hsa_fns.hsa_signal_destroy_fn (s);
 
   free (shadow->omp_data_memory);
 
@@ -986,31 +1202,30 @@ init_single_kernel (struct kernel_info *kernel, unsigned *max_omp_data_size)
   hsa_status_t status;
   struct agent_info *agent = kernel->agent;
   hsa_executable_symbol_t kernel_symbol;
-  status = hsa_executable_get_symbol (agent->executable, NULL, kernel->name,
-				      agent->id, 0, &kernel_symbol);
+  status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
+						 kernel->name, agent->id,
+						 0, &kernel_symbol);
   if (status != HSA_STATUS_SUCCESS)
     {
       hsa_warn ("Could not find symbol for kernel in the code object", status);
       goto failure;
     }
   HSA_DEBUG ("Located kernel %s\n", kernel->name);
-  status
-    = hsa_executable_symbol_get_info (kernel_symbol,
-				      HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
-				      &kernel->object);
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
+    (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel->object);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not extract a kernel object from its symbol", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE,
      &kernel->kernarg_segment_size);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not get info about kernel argument size", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE,
      &kernel->group_segment_size);
   if (status != HSA_STATUS_SUCCESS)
     hsa_fatal ("Could not get info about kernel group segment size", status);
-  status = hsa_executable_symbol_get_info
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
     (kernel_symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE,
      &kernel->private_segment_size);
   if (status != HSA_STATUS_SUCCESS)
@@ -1209,18 +1424,43 @@ parse_target_attributes (void **input,
   struct GOMP_kernel_launch_attributes *kla;
   kla = (struct GOMP_kernel_launch_attributes *) *input;
   *result = kla;
-  if (kla->ndim != 1)
-    GOMP_PLUGIN_fatal ("HSA does not yet support number of dimensions "
-		       "different from one.");
-  if (kla->gdims[0] == 0)
-    return false;
-
-  HSA_DEBUG ("GOMP_OFFLOAD_run called with grid size %u and group size %u\n",
-	     kla->gdims[0], kla->wdims[0]);
+  if (kla->ndim == 0 || kla->ndim > 3)
+    GOMP_PLUGIN_fatal ("Invalid number of dimensions (%u)", kla->ndim);
 
+  HSA_DEBUG ("GOMP_OFFLOAD_run called with %u dimensions:\n", kla->ndim);
+  unsigned i;
+  for (i = 0; i < kla->ndim; i++)
+    {
+      HSA_DEBUG ("  Dimension %u: grid size %u and group size %u\n", i,
+		 kla->gdims[i], kla->wdims[i]);
+      if (kla->gdims[i] == 0)
+	return false;
+    }
   return true;
 }
 
+/* Return the group size given the requested GROUP size, GRID size and number
+   of grid dimensions NDIM.  */
+
+static uint32_t
+get_group_size (uint32_t ndim, uint32_t grid, uint32_t group)
+{
+  if (group == 0)
+    {
+      /* TODO: Provide a default via environment or device characteristics.  */
+      if (ndim == 1)
+	group = 64;
+      else if (ndim == 2)
+	group = 8;
+      else
+	group = 4;
+    }
+
+  if (group > grid)
+    group = grid;
+  return group;
+}
+
 /* Return true if the HSA runtime can run function FN_PTR.  */
 
 bool
@@ -1254,22 +1494,14 @@ packet_store_release (uint32_t* packet, uint16_t header, uint16_t rest)
   __atomic_store_n (packet, header | (rest << 16), __ATOMIC_RELEASE);
 }
 
-/* Part of the libgomp plugin interface.  Run a kernel on device N and pass it
-   an array of pointers in VARS as a parameter.  The kernel is identified by
-   FN_PTR which must point to a kernel_info structure.  */
+/* Run KERNEL on its agent, pass VARS to it as arguments and take
+   launchattributes from KLA.  */
 
 void
-GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
+run_kernel (struct kernel_info *kernel, void *vars,
+	    struct GOMP_kernel_launch_attributes *kla)
 {
-  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
   struct agent_info *agent = kernel->agent;
-  struct GOMP_kernel_launch_attributes def;
-  struct GOMP_kernel_launch_attributes *kla;
-  if (!parse_target_attributes (args, &def, &kla))
-    {
-      HSA_DEBUG ("Will not run HSA kernel because the grid size is zero\n");
-      return;
-    }
   if (pthread_rwlock_rdlock (&agent->modules_rwlock))
     GOMP_PLUGIN_fatal ("Unable to read-lock an HSA agent rwlock");
 
@@ -1288,11 +1520,12 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
       print_kernel_dispatch (shadow, 2);
     }
 
-  uint64_t index = hsa_queue_add_write_index_release (agent->command_q, 1);
+  uint64_t index
+    = hsa_fns.hsa_queue_add_write_index_release_fn (agent->command_q, 1);
   HSA_DEBUG ("Got AQL index %llu\n", (long long int) index);
 
   /* Wait until the queue is not full before writing the packet.   */
-  while (index - hsa_queue_load_read_index_acquire (agent->command_q)
+  while (index - hsa_fns.hsa_queue_load_read_index_acquire_fn (agent->command_q)
 	 >= agent->command_q->size)
     ;
 
@@ -1302,17 +1535,33 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
 
   memset (((uint8_t *) packet) + 4, 0, sizeof (*packet) - 4);
   packet->grid_size_x = kla->gdims[0];
-  uint32_t wgs = kla->wdims[0];
-  if (wgs == 0)
-    /* TODO: Provide a default via environment.  */
-    wgs = 64;
-  else if (wgs > kla->gdims[0])
-    wgs = kla->gdims[0];
-  packet->workgroup_size_x = wgs;
-  packet->grid_size_y = 1;
-  packet->workgroup_size_y = 1;
-  packet->grid_size_z = 1;
-  packet->workgroup_size_z = 1;
+  packet->workgroup_size_x = get_group_size (kla->ndim, kla->gdims[0],
+					     kla->wdims[0]);
+
+  if (kla->ndim >= 2)
+    {
+      packet->grid_size_y = kla->gdims[1];
+      packet->workgroup_size_y = get_group_size (kla->ndim, kla->gdims[1],
+						 kla->wdims[1]);
+    }
+  else
+    {
+      packet->grid_size_y = 1;
+      packet->workgroup_size_y = 1;
+    }
+
+  if (kla->ndim == 3)
+    {
+      packet->grid_size_z = kla->gdims[2];
+      packet->workgroup_size_z = get_group_size (kla->ndim, kla->gdims[2],
+					     kla->wdims[2]);
+    }
+  else
+    {
+      packet->grid_size_z = 1;
+      packet->workgroup_size_z = 1;
+    }
+
   packet->private_segment_size = kernel->private_segment_size;
   packet->group_segment_size = kernel->group_segment_size;
   packet->kernel_object = kernel->object;
@@ -1320,7 +1569,7 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   hsa_signal_t s;
   s.handle = shadow->signal;
   packet->completion_signal = s;
-  hsa_signal_store_relaxed (s, 1);
+  hsa_fns.hsa_signal_store_relaxed_fn (s, 1);
   memcpy (shadow->kernarg_address, &vars, sizeof (vars));
 
   /* PR hsa/70337.  */
@@ -1344,9 +1593,10 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   HSA_DEBUG ("Going to dispatch kernel %s\n", kernel->name);
 
   packet_store_release ((uint32_t *) packet, header,
-			1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
+			(uint16_t) kla->ndim << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
 
-  hsa_signal_store_release (agent->command_q->doorbell_signal, index);
+  hsa_fns.hsa_signal_store_release_fn (agent->command_q->doorbell_signal,
+				       index);
 
   /* TODO: GPU agents in Carrizo APUs cannot properly update L2 cache for
      signal wait and signal load operations on their own and we need to
@@ -1357,8 +1607,9 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   HSA_DEBUG ("Kernel dispatched, waiting for completion\n");
 
   /* Root signal waits with 1ms timeout.  */
-  while (hsa_signal_wait_acquire (s, HSA_SIGNAL_CONDITION_LT, 1, 1000 * 1000,
-				  HSA_WAIT_STATE_BLOCKED) != 0)
+  while (hsa_fns.hsa_signal_wait_acquire_fn (s, HSA_SIGNAL_CONDITION_LT, 1,
+					     1000 * 1000,
+					     HSA_WAIT_STATE_BLOCKED) != 0)
     for (unsigned i = 0; i < shadow->kernel_dispatch_count; i++)
       {
 	hsa_signal_t child_s;
@@ -1366,7 +1617,7 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
 
 	HSA_DEBUG ("Waiting for children completion signal: %lu\n",
 		   shadow->children_dispatches[i]->signal);
-	hsa_signal_load_acquire (child_s);
+	hsa_fns.hsa_signal_load_acquire_fn (child_s);
       }
 
   release_kernel_dispatch (shadow);
@@ -1375,6 +1626,26 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
     GOMP_PLUGIN_fatal ("Unable to unlock an HSA agent rwlock");
 }
 
+/* Part of the libgomp plugin interface.  Run a kernel on device N (the number
+   is actually ignored, we assume the FN_PTR has been mapped using the correct
+   device) and pass it an array of pointers in VARS as a parameter.  The kernel
+   is identified by FN_PTR which must point to a kernel_info structure.  */
+
+void
+GOMP_OFFLOAD_run (int n __attribute__((unused)),
+		  void *fn_ptr, void *vars, void **args)
+{
+  struct kernel_info *kernel = (struct kernel_info *) fn_ptr;
+  struct GOMP_kernel_launch_attributes def;
+  struct GOMP_kernel_launch_attributes *kla;
+  if (!parse_target_attributes (args, &def, &kla))
+    {
+      HSA_DEBUG ("Will not run HSA kernel because the grid size is zero\n");
+      return;
+    }
+  run_kernel (kernel, vars, kla);
+}
+
 /* Information to be passed to a thread running a kernel asycnronously.  */
 
 struct async_run_info
@@ -1534,10 +1805,10 @@ GOMP_OFFLOAD_fini_device (int n)
 
   release_agent_shared_libraries (agent);
 
-  hsa_status_t status = hsa_queue_destroy (agent->command_q);
+  hsa_status_t status = hsa_fns.hsa_queue_destroy_fn (agent->command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error destroying command queue", status);
-  status = hsa_queue_destroy (agent->kernel_dispatch_command_q);
+  status = hsa_fns.hsa_queue_destroy_fn (agent->kernel_dispatch_command_q);
   if (status != HSA_STATUS_SUCCESS)
     return hsa_error ("Error destroying kernel dispatch command queue", status);
   if (pthread_mutex_destroy (&agent->prog_mutex))
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 1cb4991..50ec8a7 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -205,13 +205,9 @@ proc libgomp_init { args } {
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
 	global hsa_runtime_lib
-	global hsa_kmt_lib
 	if { $hsa_runtime_lib != "" } {
 	    append always_ld_library_path ":$hsa_runtime_lib"
 	}
-	if { $hsa_kmt_lib != "" } {
-	    append always_ld_library_path ":$hsa_kmt_lib"
-	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 5a724fb..a5250a8 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
 set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
-set hsa_kmt_lib "@HSA_KMT_LIB@"
 
 set offload_targets "@offload_targets@"
-- 
2.10.2

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/4] HSA specific built-ins
  2016-11-18 10:27   ` Jakub Jelinek
@ 2016-11-22 13:30     ` Martin Jambor
  0 siblings, 0 replies; 36+ messages in thread
From: Martin Jambor @ 2016-11-22 13:30 UTC (permalink / raw)
  To: GCC Patches

On Fri, Nov 18, 2016 at 11:27:24AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 08:39:35PM +0100, Martin Jambor wrote:
> > Hello,
> > 
> > this patch adds a small file hsa-builtins.def which defines a few
> > builtins that I then use in OpenMP lowering and expansion.
> > 
> > After we split gridification stuff in omp-low.c to a separate file, we
> > should be able to only conditionally include the file and remove the
> > weird conditional ifdef.
> > 
> > OK for trunk?
> 
> Does this work well even with lto and jit FEs?  Ok for trunk if it does.

I have enabled jit, ran its testsuite and compared the results to ones
from unpatched trunk and found no new failures.  I have also
lto-bootstrapped the patch with both hsa enabled and disabled so that
should be fine too.  Thus, I consider the patch approved.

Thank you very much,

Martin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/4] OpenMP lowering changes from the hsa branch
  2016-11-18 10:39   ` Jakub Jelinek
  2016-03-16 14:13     ` [omp] Create openmp -fopt-info optimization group Martin Jambor
@ 2016-11-22 13:43     ` Martin Jambor
  2017-02-22  7:58       ` Rename the "openmp" group of optimizations to "omp" (was: [PATCH 3/4] OpenMP lowering changes from the hsa branch) Thomas Schwinge
  1 sibling, 1 reply; 36+ messages in thread
From: Martin Jambor @ 2016-11-22 13:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Hi,

On Fri, Nov 18, 2016 at 11:38:56AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> > +  size_t collapse = gimple_omp_for_collapse (for_stmt);
> > +  struct omp_for_data_loop *loops
> > +    = (struct omp_for_data_loop *)
> > +    alloca (gimple_omp_for_collapse (for_stmt)
> > +	    * sizeof (struct omp_for_data_loop));
> 
> Use
>   struct omp_for_data_loop *loops
>     = XALLOCAVEC (struct omp_for_data_loop,
> 		  gimple_omp_for_collapse (for_stmt));
> instead?

I have changed it as you suggested.

> 
> > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
> >  {
> >    GIMPLE_PASS, /* type */
> >    "ompexp", /* name */
> > -  OPTGROUP_NONE, /* optinfo_flags */
> > +  OPTGROUP_OPENMP, /* optinfo_flags */
> >    TV_NONE, /* tv_id */
> >    PROP_gimple_any, /* properties_required */
> >    PROP_gimple_eomp, /* properties_provided */
> 
> What about the simdclone, omptargetlink, diagnose_omp_blocks passes?  What about
> openacc specific passes (oaccdevlow)?  And Alex is hopefully going to add
> ompdevlow pass soon.

I was not sure about those at first, but I suppose all of them should
also be in the same group (though I hope the name is still fine), so I
added them.  I will make sure that ompdevlow pass will be in it as
well, whether it gets in before or after this.

> 
> Otherwise LGTM.

Thanks,  the updated patch is below.  I have tested the whole patch
set by by bootstrapping, lto-bootstrapping and testing on x86_64-linux
and bootstrapping and testing on aarch64-linux.  I will commit it when
the first patch is approved.

Thank you very much for the review,

Martin


2016-11-21  Martin Jambor  <mjambor@suse.cz>

gcc/
	* dumpfile.h (OPTGROUP_OPENMP): Define.
	* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
	* gimple.h (gf_mask): Added elements GF_OMP_FOR_GRID_INTRA_GROUP and
	GF_OMP_FOR_GRID_GROUP_ITER.
	(gimple_omp_for_grid_phony): Added checking assert.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_for_grid_intra_group): New function.
	(gimple_omp_for_set_grid_intra_group): Likewise.
	(gimple_omp_for_grid_group_iter): Likewise.
	(gimple_omp_for_set_grid_group_iter): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Allow GRID loop where
	previosuly only distribute loop was permitted.
	(lower_lastprivate_clauses): Allow non tcc_comparison predicates.
	(grid_get_kernel_launch_attributes): Support multiple HSA grid
	dimensions.
	(grid_expand_omp_for_loop): Likewise and also support standalone
	distribute constructs.  New parameter INTRA_GROUP, updated both users.
	(grid_expand_target_grid_body): Support standalone distribute
	constructs.
	(pass_data_expand_omp): Changed optinfo_flags to OPTGROUP_OPENMP.
	(pass_data_expand_omp_ssa): Likewise.
	(pass_data_lower_omp): Likewise.
	(pass_data_diagnose_omp_blocks): Likewise.
	(pass_data_oacc_device_lower): Likewise.
	(pass_data_omp_target_link): Likewise.
	(grid_lastprivate_predicate): New function.
	(lower_omp_for_lastprivate): Call grid_lastprivate_predicate for
	gridified loops.
	(lower_omp_for): Support standalone distribute constructs.
	(grid_prop): New type.
	(grid_safe_assignment_p): Check for assignments to group_sizes, new
	parameter GRID.
	(grid_seq_only_contains_local_assignments): New parameter GRID, pass
	it to callee.
	(grid_find_single_omp_among_assignments_1): Likewise, improve missed
	optimization info messages.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Do not bail out for SIMDs.
	(grid_parallel_clauses_gridifiable): New function.
	(grid_inner_loop_gridifiable_p): Likewise.
	(grid_dist_follows_simple_pattern): Likewise.
	(grid_gfor_follows_tiling_pattern): Likewise.
	(grid_call_permissible_in_distribute_p): Likewise.
	(grid_handle_call_in_distribute): Likewise.
	(grid_dist_follows_tiling_pattern): Likewise.
	(grid_target_follows_gridifiable_pattern): Support standalone distribute
	constructs.
	(grid_var_segment): New enum.
	(grid_mark_variable_segment): New function.
	(grid_copy_leading_local_assignments): Call grid_mark_variable_segment
	if a new argument says so.
	(grid_process_grid_body): New function.
	(grid_eliminate_combined_simd_part): Likewise.
	(grid_mark_tiling_loops): Likewise.
	(grid_mark_tiling_parallels_and_loops): Likewise.
	(grid_process_kernel_body_copy): Support standalone distribute
	constructs.
	(grid_attempt_target_gridification): New grid variable holding overall
	gridification state.  Support standalone distribute constructs and
	collapse clauses.
	* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.

gcc/testsuite/
	* c-c++-common/gomp/gridify-1.c: Update scan string.
	* gfortran.dg/gomp/gridify-1.f90: Likewise.
	* c-c++-common/gomp/gridify-2.c: New test.
	* c-c++-common/gomp/gridify-3.c: Likewise.

libgomp/
	* testsuite/libgomp.hsa.c/tiling-1.c: New test.
	* testsuite/libgomp.hsa.c/tiling-2.c: Likewise.
---
 gcc/doc/optinfo.texi                         |    3 +
 gcc/dumpfile.c                               |    1 +
 gcc/dumpfile.h                               |    3 +-
 gcc/gimple.h                                 |   57 +
 gcc/omp-low.c                                | 1555 +++++++++++++++++++-------
 gcc/testsuite/c-c++-common/gomp/gridify-1.c  |    2 +-
 gcc/testsuite/c-c++-common/gomp/gridify-2.c  |   66 ++
 gcc/testsuite/c-c++-common/gomp/gridify-3.c  |   68 ++
 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 |    2 +-
 libgomp/testsuite/libgomp.hsa.c/tiling-1.c   |  212 ++++
 libgomp/testsuite/libgomp.hsa.c/tiling-2.c   |  258 +++++
 11 files changed, 1812 insertions(+), 415 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/tiling-2.c

diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 3c8fdba..20ca560 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
+@item OPTGROUP_OPENMP
+OpenMP passes. Enabled by @option{-openmp}.
+
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index e9483bc..5b23c3f 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -138,6 +138,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
+  {"openmp", OPTGROUP_OPENMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index b7d70f2..f366228 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -98,7 +98,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
+#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0eafada..0d0296e 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -163,7 +163,13 @@ enum gf_mask {
     GF_OMP_FOR_KIND_CILKSIMD	= GF_OMP_FOR_SIMD | 1,
     GF_OMP_FOR_COMBINED		= 1 << 4,
     GF_OMP_FOR_COMBINED_INTO	= 1 << 5,
+    /* The following flag must not be used on GF_OMP_FOR_KIND_GRID_LOOP loop
+       statements.  */
     GF_OMP_FOR_GRID_PHONY	= 1 << 6,
+    /* The following two flags should only be set on GF_OMP_FOR_KIND_GRID_LOOP
+       loop statements.  */
+    GF_OMP_FOR_GRID_INTRA_GROUP	= 1 << 6,
+    GF_OMP_FOR_GRID_GROUP_ITER  = 1 << 7,
     GF_OMP_TARGET_KIND_MASK	= (1 << 4) - 1,
     GF_OMP_TARGET_KIND_REGION	= 0,
     GF_OMP_TARGET_KIND_DATA	= 1,
@@ -5143,6 +5149,8 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
 static inline bool
 gimple_omp_for_grid_phony (const gomp_for *omp_for)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       != GF_OMP_FOR_KIND_GRID_LOOP);
   return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
 }
 
@@ -5151,12 +5159,61 @@ gimple_omp_for_grid_phony (const gomp_for *omp_for)
 static inline void
 gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       != GF_OMP_FOR_KIND_GRID_LOOP);
   if (value)
     omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
   else
     omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
 }
 
+/* Return the kernel_intra_group of a GRID_LOOP OMP_FOR statement.  */
+
+static inline bool
+gimple_omp_for_grid_intra_group (const gomp_for *omp_for)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_INTRA_GROUP) != 0;
+}
+
+/* Set kernel_intra_group flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_intra_group (gomp_for *omp_for, bool value)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  if (value)
+    omp_for->subcode |= GF_OMP_FOR_GRID_INTRA_GROUP;
+  else
+    omp_for->subcode &= ~GF_OMP_FOR_GRID_INTRA_GROUP;
+}
+
+/* Return true if iterations of a grid OMP_FOR statement correspond to HSA
+   groups.  */
+
+static inline bool
+gimple_omp_for_grid_group_iter (const gomp_for *omp_for)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_GROUP_ITER) != 0;
+}
+
+/* Set group_iter flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_group_iter (gomp_for *omp_for, bool value)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  if (value)
+    omp_for->subcode |= GF_OMP_FOR_GRID_GROUP_ITER;
+  else
+    omp_for->subcode &= ~GF_OMP_FOR_GRID_GROUP_ITER;
+}
+
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
 static inline tree
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7c58c03..6b7093b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3294,8 +3294,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
       else if (gimple_code (ctx->stmt) == GIMPLE_OMP_TEAMS)
 	{
 	  if ((gimple_code (stmt) != GIMPLE_OMP_FOR
-	       || (gimple_omp_for_kind (stmt)
-		   != GF_OMP_FOR_KIND_DISTRIBUTE))
+	       || ((gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_DISTRIBUTE)
+		   && (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP)))
 	      && gimple_code (stmt) != GIMPLE_OMP_PARALLEL)
 	    {
 	      error_at (gimple_location (stmt),
@@ -5420,15 +5420,25 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
     {
       gcond *stmt;
       tree label_true, arm1, arm2;
+      enum tree_code pred_code = TREE_CODE (predicate);
 
       label = create_artificial_label (UNKNOWN_LOCATION);
       label_true = create_artificial_label (UNKNOWN_LOCATION);
-      arm1 = TREE_OPERAND (predicate, 0);
-      arm2 = TREE_OPERAND (predicate, 1);
-      gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
-      gimplify_expr (&arm2, stmt_list, NULL, is_gimple_val, fb_rvalue);
-      stmt = gimple_build_cond (TREE_CODE (predicate), arm1, arm2,
-				label_true, label);
+      if (TREE_CODE_CLASS (pred_code) == tcc_comparison)
+	{
+	  arm1 = TREE_OPERAND (predicate, 0);
+	  arm2 = TREE_OPERAND (predicate, 1);
+	  gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	  gimplify_expr (&arm2, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	}
+      else
+	{
+	  arm1 = predicate;
+	  gimplify_expr (&arm1, stmt_list, NULL, is_gimple_val, fb_rvalue);
+	  arm2 = boolean_false_node;
+	  pred_code = NE_EXPR;
+	}
+      stmt = gimple_build_cond (pred_code, arm1, arm2, label_true, label);
       gimple_seq_add_stmt (stmt_list, stmt);
       gimple_seq_add_stmt (stmt_list, gimple_build_label (label_true));
     }
@@ -12917,7 +12927,6 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
 				   gomp_target *tgt_stmt)
 {
   grid_create_kernel_launch_attr_types ();
-  tree u32_one = build_one_cst (uint32_type_node);
   tree lattrs = create_tmp_var (grid_attr_trees->kernel_launch_attributes_type,
 				"__kernel_launch_attrs");
 
@@ -12942,10 +12951,10 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
 
   tree dimref = build3 (COMPONENT_REF, uint32_type_node, lattrs,
 			grid_attr_trees->kernel_lattrs_dimnum_decl, NULL_TREE);
-  /* At this moment we cannot gridify a loop with a collapse clause.  */
-  /* TODO: Adjust when we support bigger collapse.  */
-  gcc_assert (max_dim == 0);
-  gsi_insert_before (gsi, gimple_build_assign (dimref, u32_one), GSI_SAME_STMT);
+  gcc_checking_assert (max_dim <= 2);
+  tree dimensions = build_int_cstu (uint32_type_node, max_dim + 1);
+  gsi_insert_before (gsi, gimple_build_assign (dimref, dimensions),
+		     GSI_SAME_STMT);
   TREE_ADDRESSABLE (lattrs) = 1;
   return build_fold_addr_expr (lattrs);
 }
@@ -13591,59 +13600,79 @@ expand_omp_target (struct omp_region *region)
     }
 }
 
-/* Expand KFOR loop as a GPGPU kernel, i.e. as a body only with iteration
-   variable derived from the thread number.  */
+/* Expand KFOR loop as a HSA grifidied kernel, i.e. as a body only with
+   iteration variable derived from the thread number.  INTRA_GROUP means this
+   is an expansion of a loop iterating over work-items within a separate
+   iteration over groups. */
 
 static void
-grid_expand_omp_for_loop (struct omp_region *kfor)
+grid_expand_omp_for_loop (struct omp_region *kfor, bool intra_group)
 {
-  tree t, threadid;
-  tree type, itype;
   gimple_stmt_iterator gsi;
-  tree n1, step;
-  struct omp_for_data fd;
-
   gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
   gcc_checking_assert (gimple_omp_for_kind (for_stmt)
 		       == GF_OMP_FOR_KIND_GRID_LOOP);
+  size_t collapse = gimple_omp_for_collapse (for_stmt);
+  struct omp_for_data_loop *loops
+    = XALLOCAVEC (struct omp_for_data_loop,
+                  gimple_omp_for_collapse (for_stmt));
+  struct omp_for_data fd;
+
+  remove_edge (BRANCH_EDGE (kfor->entry));
   basic_block body_bb = FALLTHRU_EDGE (kfor->entry)->dest;
 
-  gcc_assert (gimple_omp_for_collapse (for_stmt) == 1);
   gcc_assert (kfor->cont);
-  extract_omp_for_data (for_stmt, &fd, NULL);
-
-  itype = type = TREE_TYPE (fd.loop.v);
-  if (POINTER_TYPE_P (type))
-    itype = signed_type_for (type);
+  extract_omp_for_data (for_stmt, &fd, loops);
 
   gsi = gsi_start_bb (body_bb);
 
-  n1 = fd.loop.n1;
-  step = fd.loop.step;
-  n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
-				 true, NULL_TREE, true, GSI_SAME_STMT);
-  step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
-				   true, NULL_TREE, true, GSI_SAME_STMT);
-  threadid = build_call_expr (builtin_decl_explicit
-			      (BUILT_IN_OMP_GET_THREAD_NUM), 0);
-  threadid = fold_convert (itype, threadid);
-  threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
-				       true, GSI_SAME_STMT);
+  for (size_t dim = 0; dim < collapse; dim++)
+    {
+      tree type, itype;
+      itype = type = TREE_TYPE (fd.loops[dim].v);
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
 
-  tree startvar = fd.loop.v;
-  t = fold_build2 (MULT_EXPR, itype, threadid, step);
-  if (POINTER_TYPE_P (type))
-    t = fold_build_pointer_plus (n1, t);
-  else
-    t = fold_build2 (PLUS_EXPR, type, t, n1);
-  t = fold_convert (type, t);
-  t = force_gimple_operand_gsi (&gsi, t,
-				DECL_P (startvar)
-				&& TREE_ADDRESSABLE (startvar),
-				NULL_TREE, true, GSI_SAME_STMT);
-  gassign *assign_stmt = gimple_build_assign (startvar, t);
-  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+      tree n1 = fd.loops[dim].n1;
+      tree step = fd.loops[dim].step;
+      n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
+				     true, NULL_TREE, true, GSI_SAME_STMT);
+      step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
+				       true, NULL_TREE, true, GSI_SAME_STMT);
+      tree threadid;
+      if (gimple_omp_for_grid_group_iter (for_stmt))
+	{
+	  gcc_checking_assert (!intra_group);
+	  threadid = build_call_expr (builtin_decl_explicit
+				      (BUILT_IN_HSA_WORKGROUPID), 1,
+				      build_int_cstu (unsigned_type_node, dim));
+	}
+      else if (intra_group)
+	threadid = build_call_expr (builtin_decl_explicit
+				    (BUILT_IN_HSA_WORKITEMID), 1,
+				    build_int_cstu (unsigned_type_node, dim));
+      else
+	threadid = build_call_expr (builtin_decl_explicit
+				    (BUILT_IN_HSA_WORKITEMABSID), 1,
+				    build_int_cstu (unsigned_type_node, dim));
+      threadid = fold_convert (itype, threadid);
+      threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
+					   true, GSI_SAME_STMT);
 
+      tree startvar = fd.loops[dim].v;
+      tree t = fold_build2 (MULT_EXPR, itype, threadid, step);
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (n1, t);
+      else
+	t = fold_build2 (PLUS_EXPR, type, t, n1);
+      t = fold_convert (type, t);
+      t = force_gimple_operand_gsi (&gsi, t,
+				    DECL_P (startvar)
+				    && TREE_ADDRESSABLE (startvar),
+				    NULL_TREE, true, GSI_SAME_STMT);
+      gassign *assign_stmt = gimple_build_assign (startvar, t);
+      gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+    }
   /* Remove the omp for statement */
   gsi = gsi_last_bb (kfor->entry);
   gsi_remove (&gsi, true);
@@ -13654,10 +13683,12 @@ grid_expand_omp_for_loop (struct omp_region *kfor)
 	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_CONTINUE);
   gsi_remove (&gsi, true);
 
-  /* Replace the GIMPLE_OMP_RETURN with a real return.  */
+  /* Replace the GIMPLE_OMP_RETURN with a barrier, if necessary.  */
   gsi = gsi_last_bb (kfor->exit);
   gcc_assert (!gsi_end_p (gsi)
 	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  if (intra_group)
+    gsi_insert_before (&gsi, build_omp_barrier (NULL_TREE), GSI_SAME_STMT);
   gsi_remove (&gsi, true);
 
   /* Fixup the much simpler CFG.  */
@@ -13696,7 +13727,7 @@ grid_remap_kernel_arg_accesses (tree *tp, int *walk_subtrees, void *data)
 static void expand_omp (struct omp_region *region);
 
 /* If TARGET region contains a kernel body for loop, remove its region from the
-   TARGET and expand it in GPGPU kernel fashion. */
+   TARGET and expand it in HSA gridified kernel fashion. */
 
 static void
 grid_expand_target_grid_body (struct omp_region *target)
@@ -13738,11 +13769,29 @@ grid_expand_target_grid_body (struct omp_region *target)
 
   struct omp_region *kfor = *pp;
   gcc_assert (kfor);
-  gcc_assert (gimple_omp_for_kind (last_stmt ((kfor)->entry))
-	      == GF_OMP_FOR_KIND_GRID_LOOP);
+  gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
+  gcc_assert (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP);
   *pp = kfor->next;
   if (kfor->inner)
-    expand_omp (kfor->inner);
+    {
+      if (gimple_omp_for_grid_group_iter (for_stmt))
+	{
+	  struct omp_region **next_pp;
+	  for (pp = &kfor->inner; *pp; pp = next_pp)
+	    {
+	      next_pp = &(*pp)->next;
+	      if ((*pp)->type != GIMPLE_OMP_FOR)
+		continue;
+	      gomp_for *inner = as_a <gomp_for *> (last_stmt ((*pp)->entry));
+	      gcc_assert (gimple_omp_for_kind (inner)
+			  == GF_OMP_FOR_KIND_GRID_LOOP);
+	      grid_expand_omp_for_loop (*pp, true);
+	      *pp = (*pp)->next;
+	      next_pp = pp;
+	    }
+	}
+      expand_omp (kfor->inner);
+    }
   if (gpukernel->inner)
     expand_omp (gpukernel->inner);
 
@@ -13772,8 +13821,7 @@ grid_expand_target_grid_body (struct omp_region *target)
   struct function *kern_cfun = DECL_STRUCT_FUNCTION (kern_fndecl);
   kern_cfun->curr_properties = cfun->curr_properties;
 
-  remove_edge (BRANCH_EDGE (kfor->entry));
-  grid_expand_omp_for_loop (kfor);
+  grid_expand_omp_for_loop (kfor, false);
 
   /* Remove the omp for statement */
   gimple_stmt_iterator gsi = gsi_last_bb (gpukernel->entry);
@@ -14133,7 +14181,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -14180,7 +14228,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -15000,6 +15048,46 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   BLOCK_VARS (block) = gimple_bind_vars (bind);
 }
 
+/* Return the lastprivate predicate for a given gridified loop described by FD).
+   TODO: When grid stuff is moved to a separate file, move this too.  */
+
+static tree
+grid_lastprivate_predicate (struct omp_for_data *fd)
+{
+  /* When dealing with a gridified loop, we need to check up to three collapsed
+     iteration variables but they are not actually captured in this fd.
+     Fortunately, we can easily rely on HSA builtins to get this
+     information. */
+
+  tree id, size;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP
+      && gimple_omp_for_grid_intra_group (fd->for_stmt))
+    {
+      id = builtin_decl_explicit (BUILT_IN_HSA_WORKITEMID);
+      size = builtin_decl_explicit (BUILT_IN_HSA_CURRENTWORKGROUPSIZE);
+    }
+  else
+    {
+      id = builtin_decl_explicit (BUILT_IN_HSA_WORKITEMABSID);
+      size = builtin_decl_explicit (BUILT_IN_HSA_GRIDSIZE);
+    }
+  tree cond = NULL;
+  for (int dim = 0; dim < fd->collapse; dim++)
+    {
+      tree dim_tree = build_int_cstu (unsigned_type_node, dim);
+      tree u1 = build_int_cstu (unsigned_type_node, 1);
+      tree c2
+	= build2 (EQ_EXPR, boolean_type_node,
+		  build2 (PLUS_EXPR, unsigned_type_node,
+			  build_call_expr (id, 1, dim_tree), u1),
+		  build_call_expr (size, 1, dim_tree));
+      if (cond)
+	cond = build2 (TRUTH_AND_EXPR, boolean_type_node, cond, c2);
+      else
+	cond = c2;
+    }
+  return cond;
+}
 
 /* A subroutine of lower_omp_for.  Generate code to emit the predicate
    for a lastprivate clause.  Given a loop control predicate of (V
@@ -15027,58 +15115,65 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
 	cond_code = EQ_EXPR;
     }
 
-  tree n2 = fd->loop.n2;
-  if (fd->collapse > 1
-      && TREE_CODE (n2) != INTEGER_CST
-      && gimple_omp_for_combined_into_p (fd->for_stmt))
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP
+      || gimple_omp_for_grid_phony (fd->for_stmt))
+    cond = grid_lastprivate_predicate (fd);
+  else
     {
-      struct omp_context *taskreg_ctx = NULL;
-      if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
+      tree n2 = fd->loop.n2;
+      if (fd->collapse > 1
+	  && TREE_CODE (n2) != INTEGER_CST
+	  && gimple_omp_for_combined_into_p (fd->for_stmt))
 	{
-	  gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
-	  if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
-	      || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
+	  struct omp_context *taskreg_ctx = NULL;
+	  if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
 	    {
-	      if (gimple_omp_for_combined_into_p (gfor))
-		{
-		  gcc_assert (ctx->outer->outer
-			      && is_parallel_ctx (ctx->outer->outer));
-		  taskreg_ctx = ctx->outer->outer;
-		}
-	      else
+	      gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
+	      if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
+		  || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
 		{
-		  struct omp_for_data outer_fd;
-		  extract_omp_for_data (gfor, &outer_fd, NULL);
-		  n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
+		  if (gimple_omp_for_combined_into_p (gfor))
+		    {
+		      gcc_assert (ctx->outer->outer
+				  && is_parallel_ctx (ctx->outer->outer));
+		      taskreg_ctx = ctx->outer->outer;
+		    }
+		  else
+		    {
+		      struct omp_for_data outer_fd;
+		      extract_omp_for_data (gfor, &outer_fd, NULL);
+		      n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
+		    }
 		}
+	      else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
+		taskreg_ctx = ctx->outer->outer;
 	    }
-	  else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
-	    taskreg_ctx = ctx->outer->outer;
-	}
-      else if (is_taskreg_ctx (ctx->outer))
-	taskreg_ctx = ctx->outer;
-      if (taskreg_ctx)
-	{
-	  int i;
-	  tree innerc
-	    = find_omp_clause (gimple_omp_taskreg_clauses (taskreg_ctx->stmt),
-			       OMP_CLAUSE__LOOPTEMP_);
-	  gcc_assert (innerc);
-	  for (i = 0; i < fd->collapse; i++)
+	  else if (is_taskreg_ctx (ctx->outer))
+	    taskreg_ctx = ctx->outer;
+	  if (taskreg_ctx)
 	    {
+	      int i;
+	      tree taskreg_clauses
+		= gimple_omp_taskreg_clauses (taskreg_ctx->stmt);
+	      tree innerc = find_omp_clause (taskreg_clauses,
+					     OMP_CLAUSE__LOOPTEMP_);
+	      gcc_assert (innerc);
+	      for (i = 0; i < fd->collapse; i++)
+		{
+		  innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
+					    OMP_CLAUSE__LOOPTEMP_);
+		  gcc_assert (innerc);
+		}
 	      innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
 					OMP_CLAUSE__LOOPTEMP_);
-	      gcc_assert (innerc);
+	      if (innerc)
+		n2 = fold_convert (TREE_TYPE (n2),
+				   lookup_decl (OMP_CLAUSE_DECL (innerc),
+						taskreg_ctx));
 	    }
-	  innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
-				    OMP_CLAUSE__LOOPTEMP_);
-	  if (innerc)
-	    n2 = fold_convert (TREE_TYPE (n2),
-			       lookup_decl (OMP_CLAUSE_DECL (innerc),
-					    taskreg_ctx));
 	}
+      cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);
     }
-  cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);
 
   clauses = gimple_omp_for_clauses (fd->for_stmt);
   stmts = NULL;
@@ -15247,11 +15342,13 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						ctx);
 	}
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  bool phony_loop = (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP
+		     && gimple_omp_for_grid_phony (stmt));
+  if (!phony_loop)
     gimple_seq_add_stmt (&body, stmt);
   gimple_seq_add_seq (&body, gimple_omp_body (stmt));
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  if (!phony_loop)
     gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
 							   fd.loop.v));
 
@@ -15265,7 +15362,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   body = maybe_catch_exception (body);
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  if (!phony_loop)
     {
       /* Region exit marker goes at the end of the loop body.  */
       gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
@@ -17249,60 +17346,90 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
-/* Returen true if STMT is an assignment of a register-type into a local
-   VAR_DECL.  */
+/* Structure describing the basic properties of the loop we ara analyzing
+   whether it can be gridified and when it is gridified. */
+
+struct grid_prop
+{
+  /* True when we are doing tiling gridification, i.e. when there is a distinct
+     distribute loop over groups and a loop construct over work-items.  False
+     when distribute and parallel for loops form a combined construct.  */
+  bool tiling;
+  /* Location of the target construct for optimization information
+     messages.  */
+  location_t target_loc;
+  /* The collapse clause of the involved loops.  Collapse value of all of them
+     must be the same for gridification to take place.  */
+  size_t collapse;
+  /* Group sizes, if requested by the user or NULL if not requested.  */
+  tree group_sizes[3];
+};
+
+#define GRID_MISSED_MSG_PREFIX "Will not turn target construct into a " \
+  "gridified HSA kernel because "
+
+/* Return true if STMT is an assignment of a register-type into a local
+   VAR_DECL.  If GRID is non-NULL, the assignment additionally must not be to
+   any of the trees specifying group sizes there.  */
 
 static bool
-grid_reg_assignment_to_local_var_p (gimple *stmt)
+grid_safe_assignment_p (gimple *stmt, grid_prop *grid)
 {
   gassign *assign = dyn_cast <gassign *> (stmt);
   if (!assign)
     return false;
+  if (gimple_clobber_p (assign))
+    return true;
   tree lhs = gimple_assign_lhs (assign);
   if (!VAR_P (lhs)
       || !is_gimple_reg_type (TREE_TYPE (lhs))
       || is_global_var (lhs))
     return false;
+  if (grid)
+    for (unsigned i = 0; i < grid->collapse; i++)
+      if (lhs == grid->group_sizes[i])
+	return false;
   return true;
 }
 
 /* Return true if all statements in SEQ are assignments to local register-type
-   variables.  */
+   variables that do not hold group size information.  */
 
 static bool
-grid_seq_only_contains_local_assignments (gimple_seq seq)
+grid_seq_only_contains_local_assignments (gimple_seq seq, grid_prop *grid)
 {
   if (!seq)
     return true;
 
   gimple_stmt_iterator gsi;
   for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
-    if (!grid_reg_assignment_to_local_var_p (gsi_stmt (gsi)))
+    if (!grid_safe_assignment_p (gsi_stmt (gsi), grid))
       return false;
   return true;
 }
 
-/* Scan statements in SEQ and call itself recursively on any bind.  If during
-   whole search only assignments to register-type local variables and one
-   single OMP statement is encountered, return true, otherwise return false.
-   RET is where we store any OMP statement encountered.  TARGET_LOC and NAME
-   are used for dumping a note about a failure.  */
+/* Scan statements in SEQ and call itself recursively on any bind.  GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  If during whole search only assignments to
+   register-type local variables (that do not overwrite group size information)
+   and one single OMP statement is encountered, return true, otherwise return
+   false.  RET is where we store any OMP statement encountered.  */
 
 static bool
-grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
-				     const char *name, gimple **ret)
+grid_find_single_omp_among_assignments_1 (gimple_seq seq, grid_prop *grid,
+					  const char *name, gimple **ret)
 {
   gimple_stmt_iterator gsi;
   for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple *stmt = gsi_stmt (gsi);
 
-      if (grid_reg_assignment_to_local_var_p (stmt))
+      if (grid_safe_assignment_p (stmt, grid))
 	continue;
       if (gbind *bind = dyn_cast <gbind *> (stmt))
 	{
 	  if (!grid_find_single_omp_among_assignments_1 (gimple_bind_body (bind),
-							 target_loc, name, ret))
+							 grid, name, ret))
 	      return false;
 	}
       else if (is_gimple_omp (stmt))
@@ -17310,10 +17437,18 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
 	  if (*ret)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, target_loc,
-				 "Will not turn target construct into a simple "
-				 "GPGPU kernel because %s construct contains "
-				 "multiple OpenMP constructs\n", name);
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "%s construct "
+				   "contains multiple OpenMP constructs\n",
+				   name);
+		  dump_printf_loc (MSG_NOTE, gimple_location (*ret),
+				   "The first OpenMP construct within "
+				   "a parallel\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "The second OpenMP construct within "
+				   "a parallel\n");
+		}
 	      return false;
 	    }
 	  *ret = stmt;
@@ -17321,10 +17456,14 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
       else
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, target_loc,
-			     "Will not turn target construct into a simple "
-			     "GPGPU kernel because %s construct contains "
-			     "a complex statement\n", name);
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "%s construct contains "
+			       "a complex statement\n", name);
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "gridification\n");
+	    }
 	  return false;
 	}
     }
@@ -17332,33 +17471,32 @@ grid_find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
 }
 
 /* Scan statements in SEQ and make sure that it and any binds in it contain
-   only assignments to local register-type variables and one OMP construct.  If
-   so, return that construct, otherwise return NULL.  If dumping is enabled and
-   function fails, use TARGET_LOC and NAME to dump a note with the reason for
-   failure.  */
+   only assignments to local register-type variables (that do not overwrite
+   group size information) and one OMP construct.  If so, return that
+   construct, otherwise return NULL.  GRID describes hitherto discovered
+   properties of the loop that is evaluated for possible gridification.  If
+   dumping is enabled and function fails, use NAME to dump a note with the
+   reason for failure.  */
 
 static gimple *
-grid_find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
+grid_find_single_omp_among_assignments (gimple_seq seq, grid_prop *grid,
 					const char *name)
 {
   if (!seq)
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
-			 "Will not turn target construct into a simple "
-			 "GPGPU kernel because %s construct has empty "
-			 "body\n",
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			 GRID_MISSED_MSG_PREFIX "%s construct has empty body\n",
 			 name);
       return NULL;
     }
 
   gimple *ret = NULL;
-  if (grid_find_single_omp_among_assignments_1 (seq, target_loc, name, &ret))
+  if (grid_find_single_omp_among_assignments_1 (seq, grid, name, &ret))
     {
       if (!ret && dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, target_loc,
-			 "Will not turn target construct into a simple "
-			 "GPGPU kernel because %s construct does not contain"
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			 GRID_MISSED_MSG_PREFIX "%s construct does not contain"
 			 "any other OpenMP construct\n", name);
       return ret;
     }
@@ -17401,218 +17539,81 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi,
       *handled_ops_p = true;
       wi->info = stmt;
       return error_mark_node;
-
-    case GIMPLE_OMP_FOR:
-      if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD)
-	  && gimple_omp_for_combined_into_p (stmt))
-	{
-	  *handled_ops_p = true;
-	  wi->info = stmt;
-	  return error_mark_node;
-	}
-      break;
-
     default:
       break;
     }
   return NULL;
 }
 
-
-/* If TARGET follows a pattern that can be turned into a gridified GPGPU
-   kernel, return true, otherwise return false.  In the case of success, also
-   fill in GROUP_SIZE_P with the requested group size or NULL if there is
-   none.  */
+/* Examine clauses of omp parallel statement PAR and if any prevents
+   gridification, issue a missed-optimization diagnostics and return false,
+   otherwise return true.  GRID describes hitherto discovered properties of the
+   loop that is evaluated for possible gridification.  */
 
 static bool
-grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p)
+grid_parallel_clauses_gridifiable (gomp_parallel *par, location_t tloc)
 {
-  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
-    return false;
-
-  location_t tloc = gimple_location (target);
-  gimple *stmt
-    = grid_find_single_omp_among_assignments (gimple_omp_body (target),
-					      tloc, "target");
-  if (!stmt)
-    return false;
-  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
-  tree group_size = NULL;
-  if (!teams)
-    {
-      dump_printf_loc (MSG_NOTE, tloc,
-		       "Will not turn target construct into a simple "
-		       "GPGPU kernel because it does not have a sole teams "
-		       "construct in it.\n");
-      return false;
-    }
-
-  tree clauses = gimple_omp_teams_clauses (teams);
+  tree clauses = gimple_omp_parallel_clauses (par);
   while (clauses)
     {
       switch (OMP_CLAUSE_CODE (clauses))
 	{
-	case OMP_CLAUSE_NUM_TEAMS:
+	case OMP_CLAUSE_NUM_THREADS:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because we cannot "
-			     "handle num_teams clause of teams "
-			     "construct\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			       GRID_MISSED_MSG_PREFIX "because there is "
+			       "a num_threads clause of the parallel "
+			       "construct\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (par),
+			       "Parallel construct has a num_threads clause\n");
+	    }
 	  return false;
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			       GRID_MISSED_MSG_PREFIX "a reduction clause"
+			       "is present\n ");
+	      dump_printf_loc (MSG_NOTE, gimple_location (par),
+			       "Parallel construct has a reduction clause\n");
+	    }
 	  return false;
 
-	case OMP_CLAUSE_THREAD_LIMIT:
-	  group_size = OMP_CLAUSE_OPERAND (clauses, 0);
-	  break;
-
 	default:
 	  break;
 	}
       clauses = OMP_CLAUSE_CHAIN (clauses);
     }
+  return true;
+}
 
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (teams), tloc,
-						 "teams");
-  if (!stmt)
-    return false;
-  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
-  if (!dist)
-    {
-      dump_printf_loc (MSG_NOTE, tloc,
-		       "Will not turn target construct into a simple "
-		       "GPGPU kernel because the teams construct  does not have "
-		       "a sole distribute construct in it.\n");
-      return false;
-    }
+/* Examine clauses and the body of omp loop statement GFOR and if something
+   prevents gridification, issue a missed-optimization diagnostics and return
+   false, otherwise return true. GRID describes hitherto discovered properties
+   of the loop that is evaluated for possible gridification.  */
 
-  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
-  if (!gimple_omp_for_combined_p (dist))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because we cannot handle a standalone "
-			 "distribute construct\n ");
-      return false;
-    }
-  if (dist->collapse > 1)
+static bool
+grid_inner_loop_gridifiable_p (gomp_for *gfor, grid_prop *grid)
+{
+  if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor),
+						 grid))
     {
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the distribute construct contains "
-			 "collapse clause\n");
-      return false;
-    }
-  struct omp_for_data fd;
-  extract_omp_for_data (dist, &fd, NULL);
-  if (fd.chunk_size)
-    {
-      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
 	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because the teams "
-			     "thread limit is different from distribute "
-			     "schedule chunk\n");
-	  return false;
-	}
-      group_size = fd.chunk_size;
-    }
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (dist), tloc,
-						 "distribute");
-  gomp_parallel *par;
-  if (!stmt || !(par = dyn_cast <gomp_parallel *> (stmt)))
-    return false;
-
-  clauses = gimple_omp_parallel_clauses (par);
-  while (clauses)
-    {
-      switch (OMP_CLAUSE_CODE (clauses))
-	{
-	case OMP_CLAUSE_NUM_THREADS:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified"
-			     "GPGPU kernel because there is a num_threads "
-			     "clause of the parallel construct\n");
-	  return false;
-
-	case OMP_CLAUSE_REDUCTION:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
-	  return false;
-
-	default:
-	  break;
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "the inner loop "
+			   "loop bounds computation contains a complex "
+			   "statement\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "Loop construct cannot be analyzed for "
+			   "gridification\n");
 	}
-      clauses = OMP_CLAUSE_CHAIN (clauses);
-    }
-
-  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (par), tloc,
-						 "parallel");
-  gomp_for *gfor;
-  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
-    return false;
-
-  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop is not a simple for "
-			 "loop\n");
-      return false;
-    }
-  if (gfor->collapse > 1)
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop contains collapse "
-			 "clause\n");
-      return false;
-    }
-
-  if (!grid_seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor)))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, tloc,
-			 "Will not turn target construct into a gridified GPGPU "
-			 "kernel because the inner loop pre_body contains"
-			 "a complex instruction\n");
       return false;
     }
 
-  clauses = gimple_omp_for_clauses (gfor);
+  tree clauses = gimple_omp_for_clauses (gfor);
   while (clauses)
     {
       switch (OMP_CLAUSE_CODE (clauses))
@@ -17621,28 +17622,28 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	  if (OMP_CLAUSE_SCHEDULE_KIND (clauses) != OMP_CLAUSE_SCHEDULE_AUTO)
 	    {
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, tloc,
-				 "Will not turn target construct into a "
-				 "gridified GPGPU kernel because the inner "
-				 "loop has a non-automatic scheduling clause\n");
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "the inner loop "
+				   "has a non-automatic schedule clause\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+				   "Loop construct has a non automatic "
+				   "schedule clause\n");
+		}
 	      return false;
 	    }
 	  break;
 
 	case OMP_CLAUSE_REDUCTION:
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a reduction "
-			     "clause is present\n ");
-	  return false;
-
-	case OMP_CLAUSE_LASTPRIVATE:
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a "
-			     "gridified GPGPU kernel because a lastprivate "
-			     "clause is present\n ");
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "a reduction "
+			       "clause is present\n ");
+	      dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			       "Loop construct has a reduction schedule "
+			       "clause\n");
+	    }
 	  return false;
 
 	default:
@@ -17650,7 +17651,6 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
 	}
       clauses = OMP_CLAUSE_CHAIN (clauses);
     }
-
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
   if (walk_gimple_seq (gimple_omp_body (gfor),
@@ -17661,62 +17661,560 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p
       if (dump_enabled_p ())
 	{
 	  if (is_gimple_call (bad))
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     " GPGPU kernel because the inner loop contains "
-			     "call to a noreturn function\n");
-	  if (gimple_code (bad) == GIMPLE_OMP_FOR)
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     " GPGPU kernel because the inner loop contains "
-			     "a simd construct\n");
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the inner loop contains "
+			       "call to a noreturn function\n");
 	  else
-	    dump_printf_loc (MSG_NOTE, tloc,
-			     "Will not turn target construct into a gridified "
-			     "GPGPU kernel because the inner loop contains "
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			     GRID_MISSED_MSG_PREFIX "the inner loop contains "
 			     "statement %s which cannot be transformed\n",
 			     gimple_code_name[(int) gimple_code (bad)]);
+	  dump_printf_loc (MSG_NOTE, gimple_location (bad),
+			   "This statement cannot be analyzed for "
+			   "gridification\n");
 	}
       return false;
     }
-
-  *group_size_p = group_size;
   return true;
 }
 
-/* Operand walker, used to remap pre-body declarations according to a hash map
-   provided in DATA.  */
+/* Given distribute omp construct represented by DIST, which in the original
+   source forms a compound construct with a looping construct, return true if it
+   can be turned into a gridified HSA kernel.  Otherwise return false. GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  */
 
-static tree
-grid_remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+static bool
+grid_dist_follows_simple_pattern (gomp_for *dist, grid_prop *grid)
 {
-  tree t = *tp;
+  location_t tloc = grid->target_loc;
+  gimple *stmt = grid_find_single_omp_among_assignments (gimple_omp_body (dist),
+							 grid, "distribute");
+  gomp_parallel *par;
+  if (!stmt
+      || !(par = dyn_cast <gomp_parallel *> (stmt))
+      || !grid_parallel_clauses_gridifiable (par, tloc))
+    return false;
 
-  if (DECL_P (t) || TYPE_P (t))
-    *walk_subtrees = 0;
-  else
-    *walk_subtrees = 1;
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (par), grid,
+						 "parallel");
+  gomp_for *gfor;
+  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
+    return false;
 
-  if (VAR_P (t))
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
     {
-      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
-      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
-      tree *repl = declmap->get (t);
-      if (repl)
-	*tp = *repl;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "the inner loop is not "
+			 "a simple for loop\n");
+      return false;
     }
-  return NULL_TREE;
+  gcc_assert (gimple_omp_for_collapse (gfor) == grid->collapse);
+
+  if (!grid_inner_loop_gridifiable_p (gfor, grid))
+    return false;
+
+  return true;
 }
 
-/* Copy leading register-type assignments to local variables in SRC to just
-   before DST, Creating temporaries, adjusting mapping of operands in WI and
-   remapping operands as necessary.  Add any new temporaries to TGT_BIND.
-   Return the first statement that does not conform to
-   grid_reg_assignment_to_local_var_p or NULL.  */
+/* Given an omp loop statement GFOR, return true if it can participate in
+   tiling gridification, i.e. in one where the distribute and parallel for
+   loops do not form a compound statement.  GRID describes hitherto discovered
+   properties of the loop that is evaluated for possible gridification. */
 
-static gimple *
+static bool
+grid_gfor_follows_tiling_pattern (gomp_for *gfor, grid_prop *grid)
+{
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "an inner loop is not "
+			   "a simple for loop\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "This statement is not a simple for loop\n");
+	}
+      return false;
+    }
+
+  if (!grid_inner_loop_gridifiable_p (gfor, grid))
+    return false;
+
+  if (gimple_omp_for_collapse (gfor) != grid->collapse)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			   GRID_MISSED_MSG_PREFIX "an inner loop does not "
+			   "have use the same collapse clause\n");
+	  dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			   "Loop construct uses a different collapse clause\n");
+	}
+      return false;
+    }
+
+  struct omp_for_data fd;
+  struct omp_for_data_loop *loops
+    = (struct omp_for_data_loop *)alloca (grid->collapse
+					  * sizeof (struct omp_for_data_loop));
+  extract_omp_for_data (gfor, &fd, loops);
+  for (unsigned i = 0; i < grid->collapse; i++)
+    {
+      tree itype, type = TREE_TYPE (fd.loops[i].v);
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
+      else
+	itype = type;
+
+      tree n1 = fold_convert (itype, fd.loops[i].n1);
+      tree n2 = fold_convert (itype, fd.loops[i].n2);
+      tree t = build_int_cst (itype,
+			      (fd.loops[i].cond_code == LT_EXPR ? -1 : 1));
+      t = fold_build2 (PLUS_EXPR, itype, fd.loops[i].step, t);
+      t = fold_build2 (PLUS_EXPR, itype, t, n2);
+      t = fold_build2 (MINUS_EXPR, itype, t, n1);
+      if (TYPE_UNSIGNED (itype) && fd.loops[i].cond_code == GT_EXPR)
+	t = fold_build2 (TRUNC_DIV_EXPR, itype,
+			 fold_build1 (NEGATE_EXPR, itype, t),
+			 fold_build1 (NEGATE_EXPR, itype, fd.loops[i].step));
+      else
+	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, fd.loops[i].step);
+
+      if (!operand_equal_p (grid->group_sizes[i], t, 0))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute and "
+			       "an internal loop do not agree on tile size\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (gfor),
+			       "Loop construct does not seem to loop over "
+			       "a tile size\n");
+	    }
+	  return false;
+	}
+    }
+  return true;
+}
+
+/* Facing a call to FNDECL in the body of a distribute construct, return true
+   if we can handle it or false if it precludes gridification.  */
+
+static bool
+grid_call_permissible_in_distribute_p (tree fndecl)
+{
+  if (DECL_PURE_P (fndecl) || TREE_READONLY (fndecl))
+    return true;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if (strstr (name, "omp_") != name)
+    return false;
+
+  if ((strcmp (name, "omp_get_thread_num") == 0)
+      || (strcmp (name, "omp_get_num_threads") == 0)
+      || (strcmp (name, "omp_get_num_teams") == 0)
+      || (strcmp (name, "omp_get_team_num") == 0)
+      || (strcmp (name, "omp_get_level") == 0)
+      || (strcmp (name, "omp_get_active_level") == 0)
+      || (strcmp (name, "omp_in_parallel") == 0))
+    return true;
+
+  return false;
+}
+
+/* Facing a call satisfying grid_call_permissible_in_distribute_p in the body
+   of a distribute construct that is pointed at by GSI, modify it as necessary
+   for gridification.  If the statement itself got removed, return true.  */
+
+static bool
+grid_handle_call_in_distribute (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (stmt);
+  if (DECL_PURE_P (fndecl) || TREE_READONLY (fndecl))
+    return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  if ((strcmp (name, "omp_get_thread_num") == 0)
+      || (strcmp (name, "omp_get_level") == 0)
+      || (strcmp (name, "omp_get_active_level") == 0)
+      || (strcmp (name, "omp_in_parallel") == 0))
+    {
+      tree lhs = gimple_call_lhs (stmt);
+      if (lhs)
+	{
+	  gassign *assign
+	    = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
+	  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+	}
+      gsi_remove (gsi, true);
+      return true;
+    }
+
+  /* The rest of the omp functions can stay as they are, HSA back-end will
+     handle them correctly.  */
+  gcc_checking_assert ((strcmp (name, "omp_get_num_threads") == 0)
+		       || (strcmp (name, "omp_get_num_teams") == 0)
+		       || (strcmp (name, "omp_get_team_num") == 0));
+  return false;
+}
+
+/* Given a sequence of statements within a distribute omp construct or a
+   parallel construct, which in the original source does not form a compound
+   construct with a looping construct, return true if it does not prevent us
+   from turning it into a gridified HSA kernel.  Otherwise return false. GRID
+   describes hitherto discovered properties of the loop that is evaluated for
+   possible gridification.  IN_PARALLEL must be true if seq is within a
+   parallel construct and flase if it is only within a distribute
+   construct.  */
+
+static bool
+grid_dist_follows_tiling_pattern (gimple_seq seq, grid_prop *grid,
+				  bool in_parallel)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (grid_safe_assignment_p (stmt, grid)
+	  || gimple_code (stmt) == GIMPLE_GOTO
+	  || gimple_code (stmt) == GIMPLE_LABEL
+	  || gimple_code (stmt) == GIMPLE_COND)
+	continue;
+      else if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  if (!grid_dist_follows_tiling_pattern (gimple_bind_body (bind),
+						 grid, in_parallel))
+	    return false;
+	  continue;
+	}
+      else if (gtry *try_stmt = dyn_cast <gtry *> (stmt))
+	{
+	  if (gimple_try_kind (try_stmt) == GIMPLE_TRY_CATCH)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "the distribute "
+				   "construct contains a try..catch region\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (try_stmt),
+				   "This statement cannot be analyzed for "
+				   "tiled gridification\n");
+		}
+	      return false;
+	    }
+	  if (!grid_dist_follows_tiling_pattern (gimple_try_eval (try_stmt),
+						 grid, in_parallel))
+	    return false;
+	  if (!grid_dist_follows_tiling_pattern (gimple_try_cleanup (try_stmt),
+						 grid, in_parallel))
+	    return false;
+	  continue;
+	}
+      else if (is_gimple_call (stmt))
+	{
+	  tree fndecl = gimple_call_fndecl (stmt);
+	  if (fndecl && grid_call_permissible_in_distribute_p (fndecl))
+	    continue;
+
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute "
+			       "construct contains a call\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "tiled gridification\n");
+	    }
+	  return false;
+	}
+      else if (gomp_parallel *par = dyn_cast <gomp_parallel *> (stmt))
+	{
+	  if (in_parallel)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "a parallel "
+				   "construct contains another parallel "
+				   "construct\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "This parallel construct is nested in "
+				   "another one\n");
+		}
+	      return false;
+	    }
+	  if (!grid_parallel_clauses_gridifiable (par, grid->target_loc)
+	      || !grid_dist_follows_tiling_pattern (gimple_omp_body (par),
+						    grid, true))
+	    return false;
+	}
+      else if (gomp_for *gfor = dyn_cast <gomp_for *> (stmt))
+	{
+	  if (!in_parallel)
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+				   GRID_MISSED_MSG_PREFIX "a loop "
+				   "construct is not nested within a parallel "
+				   "construct\n");
+		  dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+				   "This loop construct is not nested in "
+				   "a parallel construct\n");
+		}
+	      return false;
+	    }
+	  if (!grid_gfor_follows_tiling_pattern (gfor, grid))
+	    return false;
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
+			       GRID_MISSED_MSG_PREFIX "the distribute "
+			       "construct contains a complex statement\n");
+	      dump_printf_loc (MSG_NOTE, gimple_location (stmt),
+			       "This statement cannot be analyzed for "
+			       "tiled gridification\n");
+	    }
+	  return false;
+	}
+    }
+    return true;
+}
+
+/* If TARGET follows a pattern that can be turned into a gridified HSA kernel,
+   return true, otherwise return false.  In the case of success, also fill in
+   GRID with information describing the kernel grid.  */
+
+static bool
+grid_target_follows_gridifiable_pattern (gomp_target *target, grid_prop *grid)
+{
+  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
+    return false;
+
+  location_t tloc = gimple_location (target);
+  grid->target_loc = tloc;
+  gimple *stmt
+    = grid_find_single_omp_among_assignments (gimple_omp_body (target),
+					      grid, "target");
+  if (!stmt)
+    return false;
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  tree group_size = NULL;
+  if (!teams)
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+		       GRID_MISSED_MSG_PREFIX "it does not have a sole teams "
+		       "construct in it.\n");
+      return false;
+    }
+
+  tree clauses = gimple_omp_teams_clauses (teams);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_TEAMS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "the teams construct "
+			     "contains a num_teams clause\n ");
+	  return false;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "a reduction "
+			     "clause is present\n ");
+	  return false;
+
+	case OMP_CLAUSE_THREAD_LIMIT:
+	  if (!integer_zerop (OMP_CLAUSE_OPERAND (clauses, 0)))
+	    group_size = OMP_CLAUSE_OPERAND (clauses, 0);
+	  break;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = grid_find_single_omp_among_assignments (gimple_omp_body (teams), grid,
+						 "teams");
+  if (!stmt)
+    return false;
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  if (!dist)
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+		       GRID_MISSED_MSG_PREFIX "the teams construct does not "
+		       "have a single distribute construct in it.\n");
+      return false;
+    }
+
+  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
+
+  grid->collapse = gimple_omp_for_collapse (dist);
+  if (grid->collapse > 3)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "the distribute construct "
+			 "contains collapse clause with parameter greater "
+			 "than 3\n");
+      return false;
+    }
+
+  struct omp_for_data fd;
+  struct omp_for_data_loop *dist_loops
+    = (struct omp_for_data_loop *)alloca (grid->collapse
+					  * sizeof (struct omp_for_data_loop));
+  extract_omp_for_data (dist, &fd, dist_loops);
+  if (fd.chunk_size)
+    {
+      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "the teams "
+			     "thread limit is different from distribute "
+			     "schedule chunk\n");
+	  return false;
+	}
+      group_size = fd.chunk_size;
+    }
+  if (group_size && grid->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			 GRID_MISSED_MSG_PREFIX "group size cannot be "
+			 "set using thread_limit or schedule clauses "
+			 "when also using a collapse clause greater than 1\n");
+      return false;
+    }
+
+  if (gimple_omp_for_combined_p (dist))
+    {
+      grid->tiling = false;
+      grid->group_sizes[0] = group_size;
+      for (unsigned i = 1; i < grid->collapse; i++)
+	grid->group_sizes[i] = NULL;
+      return grid_dist_follows_simple_pattern (dist, grid);
+    }
+  else
+    {
+      grid->tiling = true;
+      if (group_size)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc,
+			     GRID_MISSED_MSG_PREFIX "group size cannot be set "
+			     "using thread_limit or schedule clauses when "
+			     "distribute and loop constructs do not form "
+			     "one combined construct\n");
+	  return false;
+	}
+      for (unsigned i = 0; i < grid->collapse; i++)
+	{
+	  if (fd.loops[i].cond_code == GT_EXPR)
+	    grid->group_sizes[i] = fold_build1 (NEGATE_EXPR,
+						TREE_TYPE (fd.loops[i].step),
+						fd.loops[i].step);
+	  else
+	    grid->group_sizes[i] = fd.loops[i].step;
+	}
+      return grid_dist_follows_tiling_pattern (gimple_omp_body (dist), grid,
+					       false);
+    }
+}
+
+/* Operand walker, used to remap pre-body declarations according to a hash map
+   provided in DATA.  */
+
+static tree
+grid_remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (DECL_P (t) || TYPE_P (t))
+    *walk_subtrees = 0;
+  else
+    *walk_subtrees = 1;
+
+  if (VAR_P (t))
+    {
+      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+      tree *repl = declmap->get (t);
+      if (repl)
+	*tp = *repl;
+    }
+  return NULL_TREE;
+}
+
+/* Identifiers of segments into which a particular variable should be places
+   when gridifying.  */
+
+enum grid_var_segment {GRID_SEGMENT_PRIVATE, GRID_SEGMENT_GROUP,
+		       GRID_SEGMENT_GLOBAL};
+
+/* Mark VAR so that it is eventually placed into SEGMENT.  Place an artificial
+   builtin call into SEQ that will make sure the variable is always considered
+   address taken.  */
+
+static void
+grid_mark_variable_segment (tree var, enum grid_var_segment segment)
+{
+  /* Making a non-addressable variables would require that we re-gimplify all
+     their uses.  Fortunately, we do not have to do this because if they are
+     not addressable, it means they are not used in atomic or parallel
+     statements and so relaxed GPU consistency rules mean we can just keep them
+     private. */
+  if (!TREE_ADDRESSABLE (var))
+    return;
+
+  switch (segment)
+    {
+    case GRID_SEGMENT_GROUP:
+      DECL_ATTRIBUTES (var) = tree_cons (get_identifier ("hsa_group_segment"),
+					 NULL, DECL_ATTRIBUTES (var));
+      break;
+    case GRID_SEGMENT_GLOBAL:
+      DECL_ATTRIBUTES (var) = tree_cons (get_identifier ("hsa_global_segment"),
+					 NULL, DECL_ATTRIBUTES (var));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (!TREE_STATIC (var))
+    {
+      TREE_STATIC (var) = 1;
+      varpool_node::finalize_decl (var);
+    }
+
+}
+
+/* Copy leading register-type assignments to local variables in SRC to just
+   before DST, Creating temporaries, adjusting mapping of operands in WI and
+   remapping operands as necessary.  Add any new temporaries to TGT_BIND.
+   Return the first statement that does not conform to grid_safe_assignment_p
+   or NULL.  If VAR_SEGMENT is not GRID_SEGMENT_PRIVATE, also mark all
+   variables in traversed bind statements so that they are put into the
+   appropriate segment.  */
+
+static gimple *
 grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
-				gbind *tgt_bind, struct walk_stmt_info *wi)
+				     gbind *tgt_bind,
+				     enum grid_var_segment var_segment,
+				     struct walk_stmt_info *wi)
 {
   hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
   gimple_stmt_iterator gsi;
@@ -17726,13 +18224,17 @@ grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
       if (gbind *bind = dyn_cast <gbind *> (stmt))
 	{
 	  gimple *r = grid_copy_leading_local_assignments
-	    (gimple_bind_body (bind), dst, tgt_bind, wi);
+	    (gimple_bind_body (bind), dst, tgt_bind, var_segment, wi);
+
+	  if (var_segment != GRID_SEGMENT_PRIVATE)
+	    for (tree var = gimple_bind_vars (bind); var; var = DECL_CHAIN (var))
+	      grid_mark_variable_segment (var, var_segment);
 	  if (r)
 	    return r;
 	  else
 	    continue;
 	}
-      if (!grid_reg_assignment_to_local_var_p (stmt))
+      if (!grid_safe_assignment_p (stmt, NULL))
 	return stmt;
       tree lhs = gimple_assign_lhs (as_a <gassign *> (stmt));
       tree repl = copy_var_decl (lhs, create_tmp_var_name (NULL),
@@ -17748,43 +18250,262 @@ grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
   return NULL;
 }
 
+/* Statement walker function to make adjustments to statements within the
+   gridifed kernel copy.  */
+
+static tree
+grid_process_grid_body (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD))
+  {
+    gomp_for *loop = as_a <gomp_for *> (stmt);
+    tree clauses = gimple_omp_for_clauses (loop);
+    tree cl = find_omp_clause (clauses, OMP_CLAUSE_SAFELEN);
+    if (cl)
+      OMP_CLAUSE_SAFELEN_EXPR (cl) = integer_one_node;
+    else
+      {
+	tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
+	OMP_CLAUSE_SAFELEN_EXPR (c) = integer_one_node;
+	OMP_CLAUSE_CHAIN (c) = clauses;
+	gimple_omp_for_set_clauses (loop, c);
+      }
+  }
+  return NULL_TREE;
+}
+
+/* Given a PARLOOP that is a normal for looping construct but also a part of a
+   combined construct with a simd loop, eliminate the simd loop.  */
+
+static void
+grid_eliminate_combined_simd_part (gomp_for *parloop)
+{
+  struct walk_stmt_info wi;
+
+  memset (&wi, 0, sizeof (wi));
+  wi.val_only = true;
+  enum gf_mask msk = GF_OMP_FOR_SIMD;
+  wi.info = (void *) &msk;
+  walk_gimple_seq (gimple_omp_body (parloop), find_combined_for, NULL, &wi);
+  gimple *stmt = (gimple *) wi.info;
+  /* We expect that the SIMD id the only statement in the parallel loop.  */
+  gcc_assert (stmt
+	      && gimple_code (stmt) == GIMPLE_OMP_FOR
+	      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_SIMD)
+	      && gimple_omp_for_combined_into_p (stmt)
+	      && !gimple_omp_for_combined_p (stmt));
+  gomp_for *simd = as_a <gomp_for *> (stmt);
+
+  /* Copy over the iteration properties because the body refers to the index in
+     the bottmom-most loop.  */
+  unsigned i, collapse = gimple_omp_for_collapse (parloop);
+  gcc_checking_assert (collapse == gimple_omp_for_collapse (simd));
+  for (i = 0; i < collapse; i++)
+    {
+      gimple_omp_for_set_index (parloop, i, gimple_omp_for_index (simd, i));
+      gimple_omp_for_set_initial (parloop, i, gimple_omp_for_initial (simd, i));
+      gimple_omp_for_set_final (parloop, i, gimple_omp_for_final (simd, i));
+      gimple_omp_for_set_incr (parloop, i, gimple_omp_for_incr (simd, i));
+    }
+
+  tree *tgt= gimple_omp_for_clauses_ptr (parloop);
+  while (*tgt)
+    tgt = &OMP_CLAUSE_CHAIN (*tgt);
+
+  /* Copy over all clauses, except for linaer clauses, which are turned into
+     private clauses, and all other simd-specificl clauses, which are
+     ignored.  */
+  tree *pc = gimple_omp_for_clauses_ptr (simd);
+  while (*pc)
+    {
+      tree c = *pc;
+      switch (TREE_CODE (c))
+	{
+	case OMP_CLAUSE_LINEAR:
+	  {
+	    tree priv = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_PRIVATE);
+	    OMP_CLAUSE_DECL (priv) = OMP_CLAUSE_DECL (c);
+	    OMP_CLAUSE_CHAIN (priv) = NULL;
+	    *tgt = priv;
+	    tgt = &OMP_CLAUSE_CHAIN (priv);
+	    pc = &OMP_CLAUSE_CHAIN (c);
+	    break;
+	  }
+
+	case OMP_CLAUSE_SAFELEN:
+	case OMP_CLAUSE_SIMDLEN:
+	case OMP_CLAUSE_ALIGNED:
+	  pc = &OMP_CLAUSE_CHAIN (c);
+	  break;
+
+	default:
+	  *pc = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (c) = NULL;
+	  *tgt = c;
+	  tgt = &OMP_CLAUSE_CHAIN(c);
+	  break;
+	}
+    }
+
+  /* Finally, throw away the simd and mark the parallel loop as not
+     combined.  */
+  gimple_omp_set_body (parloop, gimple_omp_body (simd));
+  gimple_omp_for_set_combined_p (parloop, false);
+}
+
+/* Statement walker function marking all parallels as grid_phony and loops as
+   grid ones representing threads of a particular thread group.  */
+
+static tree
+grid_mark_tiling_loops (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			struct walk_stmt_info *wi_in)
+{
+  *handled_ops_p = false;
+  if (gomp_for *loop = dyn_cast <gomp_for *> (gsi_stmt (*gsi)))
+    {
+      *handled_ops_p = true;
+      gimple_omp_for_set_kind (loop, GF_OMP_FOR_KIND_GRID_LOOP);
+      gimple_omp_for_set_grid_intra_group (loop, true);
+      if (gimple_omp_for_combined_p (loop))
+	grid_eliminate_combined_simd_part (loop);
+
+      struct walk_stmt_info body_wi;
+      memset (&body_wi, 0, sizeof (body_wi));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (loop),
+			   grid_process_grid_body, NULL, &body_wi);
+
+      gbind *bind = (gbind *) wi_in->info;
+      tree c;
+      for (c = gimple_omp_for_clauses (loop); c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
+	  {
+	    push_gimplify_context ();
+	    tree ov = OMP_CLAUSE_DECL (c);
+	    tree gv = copy_var_decl (ov, create_tmp_var_name (NULL),
+				    TREE_TYPE (ov));
+
+	    grid_mark_variable_segment (gv, GRID_SEGMENT_GROUP);
+	    DECL_CONTEXT (gv) = current_function_decl;
+	    gimple_bind_append_vars (bind, gv);
+	    tree x = lang_hooks.decls.omp_clause_assign_op (c, gv, ov);
+	    gimplify_and_add (x, &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c));
+	    x = lang_hooks.decls.omp_clause_copy_ctor (c, ov, gv);
+	    gimple_seq l = NULL;
+	    gimplify_and_add (x, &l);
+	    gsi_insert_seq_after (gsi, l, GSI_SAME_STMT);
+	    pop_gimplify_context (bind);
+	  }
+    }
+  return NULL_TREE;
+}
+
+/* Statement walker function marking all parallels as grid_phony and loops as
+   grid ones representing threads of a particular thread group.  */
+
+static tree
+grid_mark_tiling_parallels_and_loops (gimple_stmt_iterator *gsi,
+				      bool *handled_ops_p,
+				      struct walk_stmt_info *wi_in)
+{
+  *handled_ops_p = false;
+  wi_in->removed_stmt = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  if (gbind *bind = dyn_cast <gbind *> (stmt))
+    {
+      for (tree var = gimple_bind_vars (bind); var; var = DECL_CHAIN (var))
+	grid_mark_variable_segment (var, GRID_SEGMENT_GROUP);
+    }
+  else if (gomp_parallel *parallel = dyn_cast <gomp_parallel *> (stmt))
+    {
+      *handled_ops_p = true;
+      gimple_omp_parallel_set_grid_phony (parallel, true);
+
+      gbind *new_bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
+      gimple_bind_set_body (new_bind, gimple_omp_body (parallel));
+      gimple_seq s = NULL;
+      gimple_seq_add_stmt (&s, new_bind);
+      gimple_omp_set_body (parallel, s);
+
+      struct walk_stmt_info wi_par;
+      memset (&wi_par, 0, sizeof (wi_par));
+      wi_par.info = new_bind;
+      walk_gimple_seq_mod (gimple_bind_body_ptr (new_bind),
+			   grid_mark_tiling_loops, NULL, &wi_par);
+    }
+  else if (is_a <gcall *> (stmt))
+    wi_in->removed_stmt = grid_handle_call_in_distribute (gsi);
+  return NULL_TREE;
+}
+
 /* Given freshly copied top level kernel SEQ, identify the individual OMP
-   components, mark them as part of kernel and return the inner loop, and copy
-   assignment leading to them just before DST, remapping them using WI and
-   adding new temporaries to TGT_BIND.  */
+   components, mark them as part of kernel, copy assignment leading to them
+   just before DST, remapping them using WI and adding new temporaries to
+   TGT_BIND, and and return the loop that will be used for kernel dispatch.  */
 
 static gomp_for *
-grid_process_kernel_body_copy (gimple_seq seq, gimple_stmt_iterator *dst,
+grid_process_kernel_body_copy (grid_prop *grid, gimple_seq seq,
+			       gimple_stmt_iterator *dst,
 			       gbind *tgt_bind, struct walk_stmt_info *wi)
 {
-  gimple *stmt = grid_copy_leading_local_assignments (seq, dst, tgt_bind, wi);
+  gimple *stmt = grid_copy_leading_local_assignments (seq, dst, tgt_bind,
+						      GRID_SEGMENT_GLOBAL, wi);
   gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
   gcc_assert (teams);
   gimple_omp_teams_set_grid_phony (teams, true);
   stmt = grid_copy_leading_local_assignments (gimple_omp_body (teams), dst,
-					 tgt_bind, wi);
+					      tgt_bind, GRID_SEGMENT_GLOBAL, wi);
   gcc_checking_assert (stmt);
   gomp_for *dist = dyn_cast <gomp_for *> (stmt);
   gcc_assert (dist);
   gimple_seq prebody = gimple_omp_for_pre_body (dist);
   if (prebody)
-    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
-  gimple_omp_for_set_grid_phony (dist, true);
-  stmt = grid_copy_leading_local_assignments (gimple_omp_body (dist), dst,
-					 tgt_bind, wi);
-  gcc_checking_assert (stmt);
+    grid_copy_leading_local_assignments (prebody, dst, tgt_bind,
+					 GRID_SEGMENT_GROUP, wi);
 
-  gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
-  gimple_omp_parallel_set_grid_phony (parallel, true);
-  stmt = grid_copy_leading_local_assignments (gimple_omp_body (parallel), dst,
-					 tgt_bind, wi);
-  gomp_for *inner_loop = as_a <gomp_for *> (stmt);
-  gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_GRID_LOOP);
-  prebody = gimple_omp_for_pre_body (inner_loop);
-  if (prebody)
-    grid_copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+  if (grid->tiling)
+    {
+      gimple_omp_for_set_kind (dist, GF_OMP_FOR_KIND_GRID_LOOP);
+      gimple_omp_for_set_grid_group_iter (dist, true);
 
-  return inner_loop;
+      struct walk_stmt_info wi_tiled;
+      memset (&wi_tiled, 0, sizeof (wi_tiled));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (dist),
+			   grid_mark_tiling_parallels_and_loops, NULL,
+			   &wi_tiled);
+      return dist;
+    }
+  else
+    {
+      gimple_omp_for_set_grid_phony (dist, true);
+      stmt = grid_copy_leading_local_assignments (gimple_omp_body (dist), dst,
+						  tgt_bind,
+						  GRID_SEGMENT_PRIVATE, wi);
+      gcc_checking_assert (stmt);
+      gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
+      gimple_omp_parallel_set_grid_phony (parallel, true);
+      stmt = grid_copy_leading_local_assignments (gimple_omp_body (parallel),
+						  dst, tgt_bind,
+						  GRID_SEGMENT_PRIVATE, wi);
+      gomp_for *inner_loop = as_a <gomp_for *> (stmt);
+      gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_GRID_LOOP);
+      prebody = gimple_omp_for_pre_body (inner_loop);
+      if (prebody)
+	grid_copy_leading_local_assignments (prebody, dst, tgt_bind,
+					     GRID_SEGMENT_PRIVATE, wi);
+
+      if (gimple_omp_for_combined_p (inner_loop))
+	grid_eliminate_combined_simd_part (inner_loop);
+      struct walk_stmt_info body_wi;;
+      memset (&body_wi, 0, sizeof (body_wi));
+      walk_gimple_seq_mod (gimple_omp_body_ptr (inner_loop),
+			   grid_process_grid_body, NULL, &body_wi);
+
+      return inner_loop;
+    }
 }
 
 /* If TARGET points to a GOMP_TARGET which follows a gridifiable pattern,
@@ -17797,14 +18518,16 @@ grid_attempt_target_gridification (gomp_target *target,
 				   gimple_stmt_iterator *gsi,
 				   gbind *tgt_bind)
 {
-  tree group_size;
-  if (!target || !grid_target_follows_gridifiable_pattern (target, &group_size))
+  /* removed group_size */
+  grid_prop grid;
+  memset (&grid, 0, sizeof (grid));
+  if (!target || !grid_target_follows_gridifiable_pattern (target, &grid))
     return;
 
   location_t loc = gimple_location (target);
   if (dump_enabled_p ())
     dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-		     "Target construct will be turned into a gridified GPGPU "
+		     "Target construct will be turned into a gridified HSA "
 		     "kernel\n");
 
   /* Copy target body to a GPUKERNEL construct:  */
@@ -17817,8 +18540,8 @@ grid_attempt_target_gridification (gomp_target *target,
   wi.info = declmap;
 
   /* Copy assignments in between OMP statements before target, mark OMP
-     statements within copy appropriatly.  */
-  gomp_for *inner_loop = grid_process_kernel_body_copy (kernel_seq, gsi,
+     statements within copy appropriately.  */
+  gomp_for *inner_loop = grid_process_kernel_body_copy (&grid, kernel_seq, gsi,
 							tgt_bind, &wi);
 
   gbind *old_bind = as_a <gbind *> (gimple_seq_first (gimple_omp_body (target)));
@@ -17833,10 +18556,10 @@ grid_attempt_target_gridification (gomp_target *target,
     (gimple_bind_body_ptr (as_a <gbind *> (gimple_omp_body (target))),
      gpukernel);
 
-  walk_tree (&group_size, grid_remap_prebody_decls, &wi, NULL);
+  for (size_t i = 0; i < grid.collapse; i++)
+    walk_tree (&grid.group_sizes[i], grid_remap_prebody_decls, &wi, NULL);
   push_gimplify_context ();
-  size_t collapse = gimple_omp_for_collapse (inner_loop);
-  for (size_t i = 0; i < collapse; i++)
+  for (size_t i = 0; i < grid.collapse; i++)
     {
       tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i));
       if (POINTER_TYPE_P (type))
@@ -17850,12 +18573,12 @@ grid_attempt_target_gridification (gomp_target *target,
       tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i));
       walk_tree (&n2, grid_remap_prebody_decls, &wi, NULL);
       adjust_for_condition (loc, &cond_code, &n2);
-      tree step;
-      step = get_omp_for_step_from_incr (loc,
-					 gimple_omp_for_incr (inner_loop, i));
-      gimple_seq tmpseq = NULL;
       n1 = fold_convert (itype, n1);
       n2 = fold_convert (itype, n2);
+
+      tree step
+	= get_omp_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i));
+
       tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
       t = fold_build2 (PLUS_EXPR, itype, step, t);
       t = fold_build2 (PLUS_EXPR, itype, t, n2);
@@ -17866,15 +18589,23 @@ grid_attempt_target_gridification (gomp_target *target,
 			 fold_build1 (NEGATE_EXPR, itype, step));
       else
 	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+      if (grid.tiling)
+        {
+          if (cond_code == GT_EXPR)
+            step = fold_build1 (NEGATE_EXPR, itype, step);
+          t = fold_build2 (MULT_EXPR, itype, t, step);
+        }
+
       tree gs = fold_convert (uint32_type_node, t);
+      gimple_seq tmpseq = NULL;
       gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue);
       if (!gimple_seq_empty_p (tmpseq))
 	gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
 
       tree ws;
-      if (i == 0 && group_size)
+      if (grid.group_sizes[i])
 	{
-	  ws = fold_convert (uint32_type_node, group_size);
+	  ws = fold_convert (uint32_type_node, grid.group_sizes[i]);
 	  tmpseq = NULL;
 	  gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue);
 	  if (!gimple_seq_empty_p (tmpseq))
@@ -17995,7 +18726,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp, /* properties_provided */
@@ -18466,7 +19197,7 @@ const pass_data pass_data_diagnose_omp_blocks =
 {
   GIMPLE_PASS, /* type */
   "*diagnose_omp_blocks", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   0, /* properties_provided */
@@ -19897,7 +20628,7 @@ const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
   "oaccdevlow", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
@@ -19939,7 +20670,7 @@ const pass_data pass_data_omp_target_link =
 {
   GIMPLE_PASS,			/* type */
   "omptargetlink",		/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OPENMP,		/* optinfo_flags */
   TV_NONE,			/* tv_id */
   PROP_ssa,			/* properties_required */
   0,				/* properties_provided */
diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
index ba7a866..f9b03eb 100644
--- a/gcc/testsuite/c-c++-common/gomp/gridify-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
@@ -51,4 +51,4 @@ foo4 (int j, int n, int *a)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified GPGPU kernel" 4 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified HSA kernel" 4 "omplower" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-2.c b/gcc/testsuite/c-c++-common/gomp/gridify-2.c
new file mode 100644
index 0000000..6b5cc9a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-2.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+#define BLOCK_SIZE 16
+
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+#pragma omp parallel
+	 {
+         int C_row, C_col;
+         float Cval = 0.0;
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+	       for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cval += As[row][e] * Bs[e][col];
+		 }
+	   }  /* End for kblock .. */
+
+
+#pragma omp for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
+
+	     }
+         } /* end parallel */
+      }	   /* end target teams distribute */
+}
+
+/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-3.c b/gcc/testsuite/c-c++-common/gomp/gridify-3.c
new file mode 100644
index 0000000..8dbeaef
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+#define BLOCK_SIZE 16
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC)
+{
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+	  float As[BLOCK_SIZE][BLOCK_SIZE];
+	  float Bs[BLOCK_SIZE][BLOCK_SIZE];
+	  float Cs[BLOCK_SIZE][BLOCK_SIZE];
+	  int C_row, C_col;
+
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               Cs[row][col] = 0.0;
+	     }
+
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp parallel for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp parallel for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cs[row][col] += As[row][e] * Bs[e][col];
+		 }
+         }  /* End for kblock .. */
+
+
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
+	     }
+      }	/* End distribute */
+}
+
+/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
index 00ff7f5..7def279 100644
--- a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
@@ -13,4 +13,4 @@ subroutine vector_square(n, a, b)
 !$omp end target teams
 end subroutine vector_square
 
-! { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } }
+! { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } }
diff --git a/libgomp/testsuite/libgomp.hsa.c/tiling-1.c b/libgomp/testsuite/libgomp.hsa.c/tiling-1.c
new file mode 100644
index 0000000..9149adc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/tiling-1.c
@@ -0,0 +1,212 @@
+/*
+
+   matmul.c : Matrix Multiplication with tiling for openmp4 example
+
+*/
+
+#include <stdlib.h>
+#include <math.h>
+
+#define BLOCK_SIZE 16
+/*
+  #define BLOCK_SIZE 32
+*/
+#define NSECPERSEC 1000000000L
+
+typedef struct {
+   int width;
+   int height;
+   int stride;
+   int hpad;
+   float* elements;
+} Matrix;
+
+/* Correctly extract the number of nanoseconds from the two time structures */
+long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
+   long int nanosecs;
+   if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
+      ((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
+      ( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
+   else nanosecs =
+      (((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
+      ( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
+   return nanosecs;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void  tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+
+int verify(float* v_res, float* v_ref, int len) {
+    int passed = 1;
+    int i;
+    for (i = 0; i < len; ++i) {
+        if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
+	  __builtin_abort ();
+        }
+    }
+    return passed;
+}
+
+
+int main(int argc, char* argv[]){
+
+   Matrix A,B,Bt,C,Cref;
+   int a1,a2,a3,i,j;
+   struct timespec start_time1, end_time1;
+   struct timespec start_time2, end_time2;
+   long int nanosecs,total_ops;
+   float gflopsTiled,gflopsCPU;
+
+   a1 = 35;
+   a2 = 28;
+   a3 = 47;
+
+   A.height = a1;
+   A.width = a2;
+   A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
+
+   B.height = a2;
+   B.width = a3;
+   B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
+
+   /* Bt is same as B but stored in column-major order */
+   Bt.height = B.height;
+   Bt.width = B.width;
+   Bt.stride = B.stride;
+   Bt.hpad = B.hpad;
+   Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
+
+   C.height = a1;
+   C.width = a3;
+   C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
+
+   Cref.height = a1;
+   Cref.width = a3;
+   Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
+
+   for(i = 0; i < A.hpad ; i++)
+      for(j = 0; j < A.stride; j++) {
+         if (( j<A.width ) && (i<A.height)) {
+            A.elements[i*A.stride + j] = (i % 3);
+         } else {
+            A.elements[i*A.stride + j] = 0.0;
+         }
+      }
+
+   /*  Initialize B and Bt */
+   for(i = 0; i < B.hpad ; i++)
+      for(j = 0; j < B.stride; j++) {
+         if (( j<B.width ) && (i<B.height)) {
+            B.elements[i*B.stride+j] = (j % 2);
+            Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
+         } else {
+            B.elements[i*B.stride+j] = 0.0;
+            Bt.elements[j*Bt.stride+i] = 0.0;
+         }
+      }
+
+   /* zero C, and Cref */
+   for(i = 0; i < C.hpad; i++)
+      for(j = 0; j < C.stride; j++) {
+         C.elements[i*C.stride+j] = 0.0;
+         Cref.elements[i*Cref.stride+j] = 0.0;
+      }
+
+   simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
+   tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
+
+   verify(C.elements, Cref.elements, C.height * C.stride);
+   return 0;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+const float* B,const int LDB, const float beta,float* C, const int LDC) {
+   /*  A,B, and C  are in row-major order */
+   int c_row,c_col,inner;
+   float sum;
+   for (c_col  = 0 ;  c_col<N; c_col++ ) {
+      for (c_row = 0 ; c_row<M; c_row++ ) {
+         sum = 0.0 ;
+         for (inner = 0 ; inner<K; inner++ ) {
+            sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
+         }
+         C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
+      }
+   }
+}
+
+/***************************
+
+   tiled_sgemm_tt:  Tiled matrix multiplication:
+
+***************************/
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
+	{
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+#pragma omp parallel
+	 {
+         int C_row, C_col;
+         float Cval = 0.0;
+
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE )
+	   {
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+               for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   C_row = C_row_start + row;
+		   C_col = C_col_start + col;
+		   if ((C_row < M) && (kblock + col < K))
+		     As[row][col] = A[(C_row*LDA)+ kblock + col];
+		   else
+		     As[row][col] = 0;
+		   if ((kblock + row < K) && C_col < N)
+		     Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		   else
+		     Bs[row][col] = 0;
+		 }
+
+#pragma omp for collapse(2)
+	     for (int row=0 ; row < BLOCK_SIZE ; row++)
+	       for (int col=0 ; col < BLOCK_SIZE ; col++)
+		 {
+		   for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cval += As[row][e] * Bs[e][col];
+		 }
+	   }  /* End for kblock .. */
+
+
+#pragma omp for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++)
+	   for (int col=0 ; col < BLOCK_SIZE ; col++)
+	     {
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N))
+		 C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
+
+	     }
+         } /* end parallel */
+      }	   /* end target teams distribute */
+}
diff --git a/libgomp/testsuite/libgomp.hsa.c/tiling-2.c b/libgomp/testsuite/libgomp.hsa.c/tiling-2.c
new file mode 100644
index 0000000..6e54304
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/tiling-2.c
@@ -0,0 +1,258 @@
+/*
+
+   matmul.c : Matrix Multiplication with tiling for openmp4 example
+
+*/
+
+#include <stdlib.h>
+#include <math.h>
+
+#define BLOCK_SIZE 16
+/*
+  #define BLOCK_SIZE 32
+*/
+#define NSECPERSEC 1000000000L
+
+typedef struct {
+   int width;
+   int height;
+   int stride;
+   int hpad;
+   float* elements;
+} Matrix;
+
+/* Correctly extract the number of nanoseconds from the two time structures */
+long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
+   long int nanosecs;
+   if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
+      ((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
+      ( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
+   else nanosecs =
+      (((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
+      ( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
+   return nanosecs;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+void  tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
+     const float* B,const int LDB, const float beta,float* C, const int LDC) ;
+
+int verify(float* v_res, float* v_ref, int len) {
+    int passed = 1;
+    int i;
+    for (i = 0; i < len; ++i) {
+        if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
+	  __builtin_abort ();
+        }
+    }
+    return passed;
+}
+
+
+int main(int argc, char* argv[]){
+
+   Matrix A,B,Bt,C,Cref;
+   int a1,a2,a3,i,j;
+   struct timespec start_time1, end_time1;
+   struct timespec start_time2, end_time2;
+   long int nanosecs,total_ops;
+   float gflopsTiled,gflopsCPU;
+
+   a1 = 35;
+   a2 = 28;
+   a3 = 47;
+
+   A.height = a1;
+   A.width = a2;
+   A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
+
+   B.height = a2;
+   B.width = a3;
+   B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
+
+   /* Bt is same as B but stored in column-major order */
+   Bt.height = B.height;
+   Bt.width = B.width;
+   Bt.stride = B.stride;
+   Bt.hpad = B.hpad;
+   Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
+
+   C.height = a1;
+   C.width = a3;
+   C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
+
+   Cref.height = a1;
+   Cref.width = a3;
+   Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
+   Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
+
+   for(i = 0; i < A.hpad ; i++)
+      for(j = 0; j < A.stride; j++) {
+         if (( j<A.width ) && (i<A.height)) {
+            A.elements[i*A.stride + j] = (i % 3);
+         } else {
+            A.elements[i*A.stride + j] = 0.0;
+         }
+      }
+
+   /*  Initialize B and Bt */
+   for(i = 0; i < B.hpad ; i++)
+      for(j = 0; j < B.stride; j++) {
+         if (( j<B.width ) && (i<B.height)) {
+            B.elements[i*B.stride+j] = (j % 2);
+            Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
+         } else {
+            B.elements[i*B.stride+j] = 0.0;
+            Bt.elements[j*Bt.stride+i] = 0.0;
+         }
+      }
+
+   /* zero C, and Cref */
+   for(i = 0; i < C.hpad; i++)
+      for(j = 0; j < C.stride; j++) {
+         C.elements[i*C.stride+j] = 0.0;
+         Cref.elements[i*Cref.stride+j] = 0.0;
+      }
+
+   simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
+   tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
+
+   verify(C.elements, Cref.elements, C.height * C.stride);
+   return 0;
+}
+
+void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
+const float* B,const int LDB, const float beta,float* C, const int LDC) {
+   /*  A,B, and C  are in row-major order */
+   int c_row,c_col,inner;
+   float sum;
+   for (c_col  = 0 ;  c_col<N; c_col++ ) {
+      for (c_row = 0 ; c_row<M; c_row++ ) {
+         sum = 0.0 ;
+         for (inner = 0 ; inner<K; inner++ ) {
+            sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
+         }
+         C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
+      }
+   }
+}
+
+/***************************
+
+   tiled_sgemm_tt:  Tiled matrix multiplication:
+
+***************************/
+
+void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
+   const float*B, const int LDB, const float beta, float*C, const int LDC){
+
+#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
+#pragma omp distribute collapse(2)
+   for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE) {
+      for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE) {
+
+// We now have M/BLOCK_SIZE * N/BLOCK_SIZE teams = (M*N)/(BLOCK_SIZE*BLOCK_SIZE)
+// The grid global dimensions are M,N,1
+// The grid local dimensions are BLOCK_SIZE,BLOCK_SIZE,1
+
+// -------------------------------------------------------------------
+//      The rest of this code forms the HSAIL kernel with the
+//      pairs of "paralell for collapse(2)" loops repalced with a barrier.
+//      The kernel initializes these values
+//      C_row_start = get_group_id(0) * BLOCK_SIZE
+//      C_col_start = get_group_id(1) * BLOCK_SIZE
+//      row=get_local_id(0)
+//      col=get_local_id(1)
+// -------------------------------------------------------------------
+
+//       Each team has a local copy of these mini matrices
+         float As[BLOCK_SIZE][BLOCK_SIZE];
+         float Bs[BLOCK_SIZE][BLOCK_SIZE];
+         float Cs[BLOCK_SIZE][BLOCK_SIZE];
+         int C_row, C_col;
+
+         /* Zero Cs for this BLOCK */
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++) {
+            for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+               Cs[row][col] = 0.0;
+            }
+         }
+
+         // This kblock loop is run on the master thread of each team
+         for (int kblock = 0; kblock  < K ; kblock += BLOCK_SIZE ) {
+
+            // Copy global memory values to local memory
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+            for (int row=0 ; row < BLOCK_SIZE ; row++) {
+               for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+                  C_row = C_row_start + row;
+                  C_col = C_col_start + col;
+		  if ((C_row < M) && (kblock + col < K))
+		    As[row][col] = A[(C_row*LDA)+ kblock + col];
+		  else
+		    As[row][col] = 0;
+		  if ((kblock + row < K) && C_col < N)
+		    Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
+		  else
+		    Bs[row][col] = 0;
+               }
+            }
+
+            // Calculate Cs <- Sum(As X Bs) across all kblocks
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+            for (int row=0 ; row < BLOCK_SIZE ; row++) {
+               for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+                  for (int e = 0; e < BLOCK_SIZE; ++e)
+                     Cs[row][col] += As[row][e] * Bs[e][col];
+                }
+            }
+
+         }  /* End for kblock .. */
+
+
+         // Scale Update actual C from Cs
+// - - - - - - - - - - - - - - - - - - - -
+// REPLACE NEXT THREE LINES WITH A BARRIER
+#pragma omp parallel for collapse(2)
+         for (int row=0 ; row < BLOCK_SIZE ; row++) {
+            for (int col=0 ; col < BLOCK_SIZE ; col++) {
+// END BARRIER
+// - - - - - - - - - - - - - - - - - - - -
+               C_row = C_row_start + row;
+               C_col = C_col_start + col;
+	       if ((C_row < M) && (C_col < N)) {
+		 C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
+	       }
+            }
+         }
+
+// -------------------------------------------------------------------
+// This is the end of the kernel
+
+      }
+   }
+
+}
-- 
2.10.2

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2016-11-22 13:27     ` Martin Jambor
@ 2016-11-22 14:13       ` Jakub Jelinek
  2021-01-14 14:50       ` Thomas Schwinge
  1 sibling, 0 replies; 36+ messages in thread
From: Jakub Jelinek @ 2016-11-22 14:13 UTC (permalink / raw)
  To: GCC Patches, Martin Liska

On Tue, Nov 22, 2016 at 02:27:44PM +0100, Martin Jambor wrote:
> I have basically copied what libgfortran did, with additional checking
> for HAVE_UNISTD_H when attempting to implement secure_getenv in its
> absence (which is maybe unnecessary but should not do any harm) and I
> also needed to add -D_GNU_SOURCE to plugin compilation flags.
> Finally, I have changed all getenv users in the plugin to use
> secure_getenv.

I'm not sure about the all getenv users to secure_getenv, for the
specification of the library to dlopen it is essential, for the rest it
is debatable; but it is your choice.

> +hsa_status_t hsa_executable_validate(hsa_executable_t executable,
> +                                     uint32_t *result);
> +uint64_t hsa_queue_add_write_index_acq_rel(const hsa_queue_t *queue,
> +                                           uint64_t value);
...
> +hsa_status_t hsa_executable_readonly_variable_define(
> +    hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
> +    void *address);

If hsa.h is our header rather than one imported from somewhere else,
can you tweak the formatting (space before (, in the last above case
wrap after type to allow more arguments on a line?
If it is just imported from somewhere else, please disregard.

Otherwise LGTM.

	Jakub

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [gomp4] add -finform-parallelism
@ 2017-02-21  8:09       ` Cesar Philippidis
  2017-02-22  8:28         ` Thomas Schwinge
  0 siblings, 1 reply; 36+ messages in thread
From: Cesar Philippidis @ 2017-02-21  8:09 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]

This patch introduces a new -finform-parallelism flag to report any
detected parallelism encountered by the compiler. Initially, it's being
used to report how oaccdevlow partitions OpenACC loops. Currently, if
you want to extract this information, you need to compile the program
with -fdump-tree-oaccdevlow, then scan the tree dump for lines marked
Loop and decode the decimal bitmask that represents the parallelism.
This patch makes this process more user friendly by utilizing inform
messages to highlight the directives inside the source code, and clearly
print out the associated parallelism. E.g. given

  !$acc parallel loop
  do i = ...
    !$acc parallel loop
     do j = ...

-finform-parallelism reports

  inform-parallelism.f90: In function ‘MAIN__._omp_fn.0’:
  inform-parallelism.f90:10:0: note: ACC LOOP GANG
     !$acc parallel loop

  inform-parallelism.f90:12:0: note: ACC LOOP WORKER VECTOR
        !$acc loop

Unfortunately, because this oaccdevlow runs so late, the offloaded
function name doesn't match the one specified by the user.

While working on this, I noticed that the fortran FE wasn't recording
the location of combined loop directives properly, so I fixed that bug.
I also removed an unused variable inside trans-openmp.c.

This patch still isn't complete because I found a similar bug in the c++
FE. Thomas, before I fix that bug, do you think this patch is worth
pursuing for gomp-4_0-branch or maybe even trunk in general? Ideally, we
can extend these diagnostics to report any detected loops inside kernels
regions.

Cesar

[-- Attachment #2: gomp4-inform-parallelism.diff --]
[-- Type: text/x-patch, Size: 8943 bytes --]

2017-02-20  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* common.opt (finform-parallelism): New option.
	* omp-low.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to report how ACC LOOPs are
	assigned parallelism.

	gcc/doc/
	* invoke.texi: Document -finform-parallelism.

	gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses_1): Delete unused orig_decl.
	(gfc_trans_oacc_combined_directive): Set the location of
	combined acc loops.

	gcc/testsuite/
	* c-c++-common/goacc/inform-parallelism.c: New test.
	* gfortran.dg/goacc/inform-parallelism.f90: New test.


diff --git a/gcc/common.opt b/gcc/common.opt
index 42c0b2f..a7e5494 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1451,6 +1451,10 @@ fif-conversion2
 Common Report Var(flag_if_conversion2) Optimization
 Perform conversion of conditional jumps to conditional execution.
 
+finform-parallelism
+Common Var(flag_inform_parallelism) Init(0)
+Report all paralllelism detected inside offloaded regions.
+
 fstack-reuse=
 Common Joined RejectNegative Enum(stack_reuse_level) Var(flag_stack_reuse) Init(SR_ALL) Optimization
 -fstack-reuse=[all|named_vars|none] Set stack reuse level for local variables.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fd8ba42..9cc8a8d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
 -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
 -fgcse-sm -fhoist-adjacent-loads -fif-conversion @gol
--fif-conversion2 -findirect-inlining @gol
+-fif-conversion2 -findirect-inlining -finform-parallelism @gol
 -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
 -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-cp-alignment @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference -fipa-icf @gol
@@ -6428,6 +6428,13 @@ or @option{-finline-small-functions} options.
 
 Enabled at level @option{-O2}.
 
+@item -finform-parallelism
+@opindex finform-parallelism
+Report any parallelism detected by the compiler.  Inside OpenACC
+offloaded regions, this includes the gang, worker and vector level
+parallelism associated with any @code{ACC LOOP}s.  This option is disabled
+by default.
+
 @item -finline-functions
 @opindex finline-functions
 Consider all functions for inlining, even if they are not declared inline.
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 295f172..8688425 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1947,7 +1947,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 			  && n->expr->ref->next->u.ar.type == AR_FULL)))
 		{
 		  gfc_ref *ref = n->expr->ref;
-		  tree orig_decl = decl;
 		  gfc_component *c = ref->u.c.component;
 		  tree field;
 		  tree context;
@@ -3819,6 +3818,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   enum tree_code construct_code;
   bool scan_nodesc_arrays = false;
   hash_set<gfc_symbol *> *array_set = NULL;
+  location_t loc = input_location;
 
   switch (code->op)
     {
@@ -3850,6 +3850,9 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
     pushlevel ();
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, loop_clauses, NULL);
 
+  if (CAN_HAVE_LOCATION_P (stmt))
+    SET_EXPR_LOCATION (stmt, loc);
+
   if (array_set && array_set->elements ())
     gfc_add_expr_to_block (&inner, stmt);
 
@@ -3865,8 +3868,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
       delete array_set;
     }
 
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-		     oacc_clauses);
+  stmt = build2_loc (loc, construct_code, void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
 
   gfc_free_omp_clauses (loop_clauses);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0f79533..6ea8738 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -20399,6 +20399,28 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " SEQ" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+    ? " GANG" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+    ? " WORKER" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+    ? " VECTOR" : "";
+
+  inform (loop->loc, "ACC LOOP%s%s%s%s", seq, gang, worker, vector);
+
+  if (loop->child)
+    inform_oacc_loop (loop->child);
+  if (loop->sibling)
+    inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
    structures as we go.  By construction these loops are properly
    nested.  */
@@ -21069,6 +21091,8 @@ execute_oacc_device_lower ()
       dump_oacc_loop (dump_file, loops, 0);
       fprintf (dump_file, "\n");
     }
+  if (flag_inform_parallelism && loops->child)
+    inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
      dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/inform-parallelism.c b/gcc/testsuite/c-c++-common/goacc/inform-parallelism.c
new file mode 100644
index 0000000..b892bf0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/inform-parallelism.c
@@ -0,0 +1,61 @@
+/* Test the output of -finform-parallelism.  */
+
+/* { dg-additional-options "-finform-parallelism" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq /* { dg-message "ACC LOOP SEQ" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang /* { dg-message "ACC LOOP GANG" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop worker /* { dg-message "ACC LOOP WORKER" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop vector /* { dg-message "ACC LOOP VECTOR" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang vector /* { dg-message "ACC LOOP GANG VECTOR" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang worker /* { dg-message "ACC LOOP GANG WORKER" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop worker vector /* { dg-message "ACC LOOP WORKER VECTOR" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang worker vector /* { dg-message "ACC LOOP GANG WORKER VECTOR" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop /* { dg-message "ACC LOOP GANG VECTOR" } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop /* { dg-message "ACC LOOP GANG WORKER" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop /* { dg-message "ACC LOOP VECTOR" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc parallel loop gang /* { dg-message "ACC LOOP GANG" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker /* { dg-message "ACC LOOP WORKER" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop vector /* { dg-message "ACC LOOP VECTOR" } */
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/inform-parallelism.f90 b/gcc/testsuite/gfortran.dg/goacc/inform-parallelism.f90
new file mode 100644
index 0000000..6e11331
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/inform-parallelism.f90
@@ -0,0 +1,62 @@
+! Test the output of -finform-parallellism.
+
+! { dg-additional-options "-finform-parallelism" }
+
+program test
+  implicit none
+
+  integer x, y, z
+
+  !$acc parallel loop seq ! { dg-message "ACC LOOP SEQ" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang ! { dg-message "ACC LOOP GANG" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop worker ! { dg-message "ACC LOOP WORKER" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop vector ! { dg-message "ACC LOOP VECTOR" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang vector ! { dg-message "ACC LOOP GANG VECTOR" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang worker ! { dg-message "ACC LOOP GANG WORKER" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop worker vector ! { dg-message "ACC LOOP WORKER VECTOR" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang worker vector ! { dg-message "ACC LOOP GANG WORKER VECTOR" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop ! { dg-message "ACC LOOP GANG VECTOR" }
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop ! { dg-message "ACC LOOP GANG WORKER" }
+  do x = 1, 10
+     !$acc loop ! { dg-message "ACC LOOP VECTOR" }
+     do y = 1, 10
+     end do
+  end do
+
+  !$acc parallel loop gang ! { dg-message "ACC LOOP GANG" }
+  do x = 1, 10
+     !$acc loop worker ! { dg-message "ACC LOOP WORKER" }
+     do y = 1, 10
+        !$acc loop vector ! { dg-message "ACC LOOP VECTOR" }
+        do z = 1, 10
+        end do
+     end do
+  end do
+end program test

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Rename the "openmp" group of optimizations to "omp" (was: [PATCH 3/4] OpenMP lowering changes from the hsa branch)
  2016-11-22 13:43     ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
@ 2017-02-22  7:58       ` Thomas Schwinge
  2017-02-22  8:17         ` Miscellaneous optimization group fixes (was: Rename the "openmp" group of optimizations to "omp") Thomas Schwinge
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-22  7:58 UTC (permalink / raw)
  To: Martin Jambor, Jakub Jelinek, GCC Patches; +Cc: Cesar Philippidis

Hi!

On Tue, 22 Nov 2016 14:43:02 +0100, Martin Jambor <mjambor@suse.cz> wrote:
> On Fri, Nov 18, 2016 at 11:38:56AM +0100, Jakub Jelinek wrote:
> > On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> > > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
> > >  {
> > >    GIMPLE_PASS, /* type */
> > >    "ompexp", /* name */
> > > -  OPTGROUP_NONE, /* optinfo_flags */
> > > +  OPTGROUP_OPENMP, /* optinfo_flags */
> > >    TV_NONE, /* tv_id */
> > >    PROP_gimple_any, /* properties_required */
> > >    PROP_gimple_eomp, /* properties_provided */

Thanks for that!  (I noticed there is no testsuite coverage, though.)

This will be useful for what Cesar is currently working on,
<http://mid.mail-archive.com/bbc0b25b-bf92-998b-99ee-cae20ace1fab@codesourcery.com>
"[gomp4] add -finform-parallelism".  That is, instead of adding a new
"-finform-parallelism" flag, this should just build on top of the
existing "-fopt-info".

> > What about the simdclone, omptargetlink, diagnose_omp_blocks passes?  What about
> > openacc specific passes (oaccdevlow)?  And Alex is hopefully going to add
> > ompdevlow pass soon.
> 
> I was not sure about those at first, but I suppose all of them should
> also be in the same group (though I hope the name is still fine)

According to similar "rebrandings" regarding "omp", OK to commit the
following (not yet tested)?

commit e878bc10881810adf64891f76c503fd1d83fb536
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Feb 22 08:25:32 2017 +0100

    Rename the "openmp" group of optimizations to "omp"
    
            gcc/
            * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
            all users.
            * dumpfile.c (optgroup_options): Instead of "openmp", associate
            OPTGROUP_OMP with "omp".
---
 gcc/doc/optinfo.texi | 5 +++--
 gcc/dumpfile.c       | 2 +-
 gcc/dumpfile.h       | 3 ++-
 gcc/omp-expand.c     | 4 ++--
 gcc/omp-low.c        | 4 ++--
 gcc/omp-offload.c    | 6 +++---
 6 files changed, 13 insertions(+), 11 deletions(-)

diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index 415e9a9..cf6ce00 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -59,8 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
-@item OPTGROUP_OPENMP
-OpenMP passes. Enabled by @option{-openmp}.
+@item OPTGROUP_OMP
+OMP (Offloading and Multi Processing) passes. Enabled by
+@option{-omp}.
 
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
diff --git gcc/dumpfile.c gcc/dumpfile.c
index 2c5dce2..6b9a47c 100644
--- gcc/dumpfile.c
+++ gcc/dumpfile.c
@@ -140,7 +140,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
-  {"openmp", OPTGROUP_OPENMP},
+  {"omp", OPTGROUP_OMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git gcc/dumpfile.h gcc/dumpfile.h
index 7c8f7a2..3886f98 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -99,7 +99,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+					   Processing) transformations */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
diff --git gcc/omp-expand.c gcc/omp-expand.c
index 55e54e4..ea951d6 100644
--- gcc/omp-expand.c
+++ gcc/omp-expand.c
@@ -8134,7 +8134,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -8181,7 +8181,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
diff --git gcc/omp-low.c gcc/omp-low.c
index 35df02c..c2c69cb 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -8920,7 +8920,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp | PROP_gimple_lomp_dev, /* properties_provided */
@@ -9232,7 +9232,7 @@ const pass_data pass_data_diagnose_omp_blocks =
 {
   GIMPLE_PASS, /* type */
   "*diagnose_omp_blocks", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   0, /* properties_provided */
diff --git gcc/omp-offload.c gcc/omp-offload.c
index aed9e14..fad038f 100644
--- gcc/omp-offload.c
+++ gcc/omp-offload.c
@@ -1625,7 +1625,7 @@ const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
   "oaccdevlow", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
@@ -1727,7 +1727,7 @@ const pass_data pass_data_omp_device_lower =
 {
   GIMPLE_PASS, /* type */
   "ompdevlow", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   PROP_gimple_lomp_dev, /* properties_provided */
@@ -1771,7 +1771,7 @@ const pass_data pass_data_omp_target_link =
 {
   GIMPLE_PASS,			/* type */
   "omptargetlink",		/* name */
-  OPTGROUP_OPENMP,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   PROP_ssa,			/* properties_required */
   0,				/* properties_provided */


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Miscellaneous optimization group fixes (was: Rename the "openmp" group of optimizations to "omp")
  2017-02-22  7:58       ` Rename the "openmp" group of optimizations to "omp" (was: [PATCH 3/4] OpenMP lowering changes from the hsa branch) Thomas Schwinge
@ 2017-02-22  8:17         ` Thomas Schwinge
  2017-02-22  9:53           ` Martin Jambor
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-22  8:17 UTC (permalink / raw)
  To: Martin Jambor, Jakub Jelinek, GCC Patches; +Cc: Cesar Philippidis

Hi!

On Wed, 22 Feb 2017 08:48:40 +0100, I wrote:
> On Tue, 22 Nov 2016 14:43:02 +0100, Martin Jambor <mjambor@suse.cz> wrote:
> > On Fri, Nov 18, 2016 at 11:38:56AM +0100, Jakub Jelinek wrote:
> > > On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> > > > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
> > > >  {
> > > >    GIMPLE_PASS, /* type */
> > > >    "ompexp", /* name */
> > > > -  OPTGROUP_NONE, /* optinfo_flags */
> > > > +  OPTGROUP_OPENMP, /* optinfo_flags */
> > > >    TV_NONE, /* tv_id */
> > > >    PROP_gimple_any, /* properties_required */
> > > >    PROP_gimple_eomp, /* properties_provided */
> 
> Thanks for that!  (I noticed there is no testsuite coverage, though.)

> > > What about the simdclone, omptargetlink, diagnose_omp_blocks passes?  What about
> > > openacc specific passes (oaccdevlow)?  And Alex is hopefully going to add
> > > ompdevlow pass soon.
> > 
> > I was not sure about those at first, but I suppose all of them should
> > also be in the same group (though I hope the name is still fine)
> 
> According to similar "rebrandings" regarding "omp", OK to commit the
> following (not yet tested)?
> 
> commit e878bc10881810adf64891f76c503fd1d83fb536
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Wed Feb 22 08:25:32 2017 +0100
> 
>     Rename the "openmp" group of optimizations to "omp"
>     
>             gcc/
>             * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
>             all users.
>             * dumpfile.c (optgroup_options): Instead of "openmp", associate
>             OPTGROUP_OMP with "omp".

On top of that, OK to commit the following (not yet tested) -- these all
look like oversights to me, but please verify?

commit 9865976a121c1bd0fc59ea75e819924733f7ea98
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Feb 22 08:32:54 2017 +0100

    Miscellaneous optimization group fixes
    
            gcc/doc/
            * invoke.texi (-fopt-info): Document "omp".
            * optinfo.texi (Optimization groups): Fix option used for
            OPTGROUP_ALL.
            gcc/
            * dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
            * hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
            * ipa-hsa.c (pass_data_ipa_hsa): Likewise.
            * omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.
---
 gcc/doc/invoke.texi  | 2 ++
 gcc/doc/optinfo.texi | 2 +-
 gcc/dumpfile.h       | 6 +++---
 gcc/hsa-gen.c        | 2 +-
 gcc/ipa-hsa.c        | 2 +-
 gcc/omp-simd-clone.c | 2 +-
 6 files changed, 9 insertions(+), 7 deletions(-)

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 26bc146..356727b 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -13165,6 +13165,8 @@ Enable dumps from all interprocedural optimizations.
 Enable dumps from all loop optimizations.
 @item inline
 Enable dumps from all inlining optimizations.
+@item omp
+Enable dumps from all OMP (Offloading and Multi Processing) optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
 @item optall
diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index cf6ce00..e17cb37 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -70,7 +70,7 @@ Vectorization passes. Enabled by @option{-vec}.
 All other optimization passes which do not fall into one of the above.
 
 @item OPTGROUP_ALL
-All optimization passes. Enabled by @option{-all}.
+All optimization passes. Enabled by @option{-optall}.
 
 @end ftable
 
diff --git gcc/dumpfile.h gcc/dumpfile.h
index 3886f98..fef58f5 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -98,12 +98,12 @@ enum tree_dump_index
 #define OPTGROUP_IPA         (1 << 1)   /* IPA optimization passes */
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
-#define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+#define OPTGROUP_OMP         (1 << 4)   /* OMP (Offloading and Multi
 					   Processing) transformations */
+#define OPTGROUP_VEC         (1 << 5)   /* Vectorization passes */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-                              | OPTGROUP_VEC | OPTGROUP_OTHER)
+                              | OPTGROUP_OMP | OPTGROUP_VEC | OPTGROUP_OTHER)
 
 /* Define a tree dump switch.  */
 struct dump_file_info
diff --git gcc/hsa-gen.c gcc/hsa-gen.c
index 7721fcc..7b69d64 100644
--- gcc/hsa-gen.c
+++ gcc/hsa-gen.c
@@ -6474,7 +6474,7 @@ const pass_data pass_data_gen_hsail =
 {
   GIMPLE_PASS,
   "hsagen",	 			/* name */
-  OPTGROUP_NONE,			/* optinfo_flags */
+  OPTGROUP_OMP,				/* optinfo_flags */
   TV_NONE,				/* tv_id */
   PROP_cfg | PROP_ssa,			/* properties_required */
   0,					/* properties_provided */
diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c
index af70b0a..c02dada 100644
--- gcc/ipa-hsa.c
+++ gcc/ipa-hsa.c
@@ -289,7 +289,7 @@ const pass_data pass_data_ipa_hsa =
 {
   IPA_PASS, /* type */
   "hsa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_IPA_HSA, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
diff --git gcc/omp-simd-clone.c gcc/omp-simd-clone.c
index 09ad40b..99589d4 100644
--- gcc/omp-simd-clone.c
+++ gcc/omp-simd-clone.c
@@ -1690,7 +1690,7 @@ const pass_data pass_data_omp_simd_clone =
 {
   SIMPLE_IPA_PASS,		/* type */
   "simdclone",			/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   ( PROP_ssa | PROP_cfg ),	/* properties_required */
   0,				/* properties_provided */


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [gomp4] add -finform-parallelism
  2017-02-21  8:09       ` [gomp4] add -finform-parallelism Cesar Philippidis
@ 2017-02-22  8:28         ` Thomas Schwinge
  2017-02-23 15:06           ` Cesar Philippidis
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-22  8:28 UTC (permalink / raw)
  To: Cesar Philippidis, gcc-patches; +Cc: Martin Jambor, Jakub Jelinek

Hi Cesar!

On Mon, 20 Feb 2017 20:42:59 -0800, Cesar Philippidis <cesar@codesourcery.com> wrote:
> This patch introduces a new -finform-parallelism flag to report any
> detected parallelism encountered by the compiler. Initially, it's being
> used to report how oaccdevlow partitions OpenACC loops. Currently, if
> you want to extract this information, you need to compile the program
> with -fdump-tree-oaccdevlow, then scan the tree dump for lines marked
> Loop and decode the decimal bitmask that represents the parallelism.
> This patch makes this process more user friendly by utilizing inform
> messages to highlight the directives inside the source code, and clearly
> print out the associated parallelism. E.g. given
> 
>   !$acc parallel loop
>   do i = ...
>     !$acc parallel loop
>      do j = ...
> 
> -finform-parallelism reports

(Actually that should report that an OpenACC parallel construct nested in
another OpenACC parallel construct is not yet supported?)

>   inform-parallelism.f90: In function ‘MAIN__._omp_fn.0’:
>   inform-parallelism.f90:10:0: note: ACC LOOP GANG
>      !$acc parallel loop
> 
>   inform-parallelism.f90:12:0: note: ACC LOOP WORKER VECTOR
>         !$acc loop

Thanks!


> Unfortunately, because this oaccdevlow runs so late, the offloaded
> function name doesn't match the one specified by the user.

It's still useful.  We can later look into how to preserve the "original
name".


> While working on this, I noticed that the fortran FE wasn't recording
> the location of combined loop directives properly, so I fixed that bug.

That's a separate bug fix, please.


> I also removed an unused variable inside trans-openmp.c.

Also a separate bug fix, please.  In
<http://mid.mail-archive.com/87sho0c12q.fsf@euler.schwinge.homeip.net>, I
asked you to fix that one.


> This patch still isn't complete because I found a similar bug in the c++
> FE. Thomas, before I fix that bug

Again, a separate bug fix, please.


> do you think this patch is worth
> pursuing for gomp-4_0-branch or maybe even trunk in general?

Definitely!

> Ideally, we
> can extend these diagnostics to report any detected loops inside kernels
> regions.

Right.

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

> +@item -finform-parallelism
> +@opindex finform-parallelism
> +Report any parallelism detected by the compiler.  Inside OpenACC
> +offloaded regions, this includes the gang, worker and vector level
> +parallelism associated with any @code{ACC LOOP}s.  This option is disabled
> +by default.

I know Fortran likes to use upper case, but the generic GCC code uses
lower-case names.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
> +   children.  */
> +
> +static void
> +inform_oacc_loop (oacc_loop *loop)
> +{
> +  const char *seq = loop->mask == 0 ? " SEQ" : "";
> +  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
> +    ? " GANG" : "";
> +  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
> +    ? " WORKER" : "";
> +  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
> +    ? " VECTOR" : "";
> +
> +  inform (loop->loc, "ACC LOOP%s%s%s%s", seq, gang, worker, vector);

Likewise.

Per
<http://mid.mail-archive.com/8737f68y3r.fsf@euler.schwinge.homeip.net>
I'm suggesting this to be done a bit differently: instead of "inform",
this would then use the appropriate "-fopt-info-note-omp" option group
output.


If that's not yet there, possibly there could be some new flag added for
"-fopt-info" to display the "rich" location, which will also print the
original source code?


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Miscellaneous optimization group fixes (was: Rename the "openmp" group of optimizations to "omp")
  2017-02-22  8:17         ` Miscellaneous optimization group fixes (was: Rename the "openmp" group of optimizations to "omp") Thomas Schwinge
@ 2017-02-22  9:53           ` Martin Jambor
  2017-02-28  8:52             ` Rename the "openmp" group of optimizations to "omp" (was: Miscellaneous optimization group fixes) Thomas Schwinge
  2017-02-28  9:04             ` Miscellaneous optimization group fixes Thomas Schwinge
  0 siblings, 2 replies; 36+ messages in thread
From: Martin Jambor @ 2017-02-22  9:53 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Jakub Jelinek, GCC Patches, Cesar Philippidis

Hi,

On Wed, Feb 22, 2017 at 08:58:06AM +0100, Thomas Schwinge wrote:
> > 
> >     Rename the "openmp" group of optimizations to "omp"
> >     
> >             gcc/
> >             * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
> >             all users.
> >             * dumpfile.c (optgroup_options): Instead of "openmp", associate
> >             OPTGROUP_OMP with "omp".


I am of course fine with OPTGROUP_OMP.

> 
> On top of that, OK to commit the following (not yet tested) -- these all
> look like oversights to me, but please verify?

The missing documentation is an oversight.  Thanks for spotting it.

Martin


> 
> commit 9865976a121c1bd0fc59ea75e819924733f7ea98
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Wed Feb 22 08:32:54 2017 +0100
> 
>     Miscellaneous optimization group fixes
>     
>             gcc/doc/
>             * invoke.texi (-fopt-info): Document "omp".
>             * optinfo.texi (Optimization groups): Fix option used for
>             OPTGROUP_ALL.
>             gcc/
>             * dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
>             * hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
>             * ipa-hsa.c (pass_data_ipa_hsa): Likewise.
>             * omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [gomp4] add -finform-parallelism
  2017-02-22  8:28         ` Thomas Schwinge
@ 2017-02-23 15:06           ` Cesar Philippidis
  0 siblings, 0 replies; 36+ messages in thread
From: Cesar Philippidis @ 2017-02-23 15:06 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches; +Cc: Martin Jambor, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1593 bytes --]

On 02/22/2017 12:17 AM, Thomas Schwinge wrote:
> On Mon, 20 Feb 2017 20:42:59 -0800, Cesar Philippidis <cesar@codesourcery.com> wrote:

>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
> 
>> +/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
>> +   children.  */
>> +
>> +static void
>> +inform_oacc_loop (oacc_loop *loop)
>> +{
>> +  const char *seq = loop->mask == 0 ? " SEQ" : "";
>> +  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
>> +    ? " GANG" : "";
>> +  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
>> +    ? " WORKER" : "";
>> +  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
>> +    ? " VECTOR" : "";
>> +
>> +  inform (loop->loc, "ACC LOOP%s%s%s%s", seq, gang, worker, vector);
> 
> Likewise.

This is now lower case.

> Per
> <http://mid.mail-archive.com/8737f68y3r.fsf@euler.schwinge.homeip.net>
> I'm suggesting this to be done a bit differently: instead of "inform",
> this would then use the appropriate "-fopt-info-note-omp" option group
> output.

Thank you for finding that. I see that you want to rename
OPTGROUP_OPENMP to OPTGROUP_OMP. In order to keep gomp-4_0-branch
somewhat consistent with trunk, I keep the original OPENMP name. We can
fix that later.

> If that's not yet there, possibly there could be some new flag added for
> "-fopt-info" to display the "rich" location, which will also print the
> original source code?

No it doesn't support rich locations yet. But that's a good idea to
support it. But we can add that later. For the time being, the line
number shall suffice.

Cesar


[-- Attachment #2: gomp4-optgroup-info.diff --]
[-- Type: text/x-patch, Size: 7609 bytes --]

2017-02-23  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-low.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to display loop parallelism.

	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism.c: New test.
	* gfortran.dg/goacc/note-parallelism.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b25fe27..40f2003 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -20399,6 +20399,30 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+    ? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+    ? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+    ? " vector" : "";
+
+  dump_printf_loc (MSG_NOTE, loop->loc,
+		   "Detected parallelism <acc loop%s%s%s%s>\n", seq, gang,
+		   worker, vector);
+
+  if (loop->child)
+    inform_oacc_loop (loop->child);
+  if (loop->sibling)
+    inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
    structures as we go.  By construction these loops are properly
    nested.  */
@@ -21069,6 +21093,8 @@ execute_oacc_device_lower ()
       dump_oacc_loop (dump_file, loops, 0);
       fprintf (dump_file, "\n");
     }
+  if (dump_enabled_p () && loops->child)
+    inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
      dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 0000000..ddbce99
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,76 @@
+/* Test the output of -fopt-info-not-openmp.  */
+
+/* { dg-additional-options "-fopt-info-note-openmp" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop worker
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop vector
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang vector
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang worker
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop worker vector
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop gang worker vector
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc parallel loop
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc parallel loop gang
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop vector
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
+
+/* { dg-message "note-parallelism.c:10:9: note: Detected parallelism <acc loop seq>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:14:9: note: Detected parallelism <acc loop gang>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:18:9: note: Detected parallelism <acc loop worker>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:22:9: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:26:9: note: Detected parallelism <acc loop gang vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:30:9: note: Detected parallelism <acc loop gang worker>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:34:9: note: Detected parallelism <acc loop worker vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:38:9: note: Detected parallelism <acc loop gang worker vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:42:9: note: Detected parallelism <acc loop gang vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:46:9: note: Detected parallelism <acc loop gang worker>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:48:9: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:52:9: note: Detected parallelism <acc loop gang>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:54:9: note: Detected parallelism <acc loop worker>" "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:56:9: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90 b/gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
new file mode 100644
index 0000000..ae6f341
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
@@ -0,0 +1,77 @@
+! Test the output of -fopt-info-note-openmp.
+
+! { dg-additional-options "-fopt-info-note-openmp" }
+
+program test
+  implicit none
+
+  integer x, y, z
+
+  !$acc parallel loop seq
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop worker
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop vector
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang vector
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang worker
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop worker vector
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop gang worker vector
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop
+  do x = 1, 10
+  end do
+
+  !$acc parallel loop
+  do x = 1, 10
+     !$acc loop
+     do y = 1, 10
+     end do
+  end do
+
+  !$acc parallel loop gang
+  do x = 1, 10
+     !$acc loop worker
+     do y = 1, 10
+        !$acc loop vector
+        do z = 1, 10
+        end do
+     end do
+  end do
+end program test
+
+! { dg-message "note-parallelism.f90:10:0: note: Detected parallelism <acc loop seq>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:14:0: note: Detected parallelism <acc loop gang>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:18:0: note: Detected parallelism <acc loop worker>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:22:0: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:26:0: note: Detected parallelism <acc loop gang vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:30:0: note: Detected parallelism <acc loop gang worker>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:34:0: note: Detected parallelism <acc loop worker vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:38:0: note: Detected parallelism <acc loop gang worker vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:42:0: note: Detected parallelism <acc loop gang vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:46:0: note: Detected parallelism <acc loop gang worker>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:48:0: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:53:0: note: Detected parallelism <acc loop gang>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:55:0: note: Detected parallelism <acc loop worker>" "" { target *-*-* } 0 }
+! { dg-message "note-parallelism.f90:57:0: note: Detected parallelism <acc loop vector>" "" { target *-*-* } 0 }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Rename the "openmp" group of optimizations to "omp" (was: Miscellaneous optimization group fixes)
  2017-02-22  9:53           ` Martin Jambor
@ 2017-02-28  8:52             ` Thomas Schwinge
  2017-02-28  9:04             ` Miscellaneous optimization group fixes Thomas Schwinge
  1 sibling, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-28  8:52 UTC (permalink / raw)
  To: Martin Jambor, GCC Patches; +Cc: Jakub Jelinek, Cesar Philippidis

Hi!

On Wed, 22 Feb 2017 10:38:02 +0100, Martin Jambor <mjambor@suse.cz> wrote:
> On Wed, Feb 22, 2017 at 08:58:06AM +0100, Thomas Schwinge wrote:
> > > 
> > >     Rename the "openmp" group of optimizations to "omp"
> > >     
> > >             gcc/
> > >             * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
> > >             all users.
> > >             * dumpfile.c (optgroup_options): Instead of "openmp", associate
> > >             OPTGROUP_OMP with "omp".
> 
> 
> I am of course fine with OPTGROUP_OMP.

Committed to trunk in r245768:

commit f57c8178d89b6b428853767657590c4fb907d1b8
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 28 08:41:55 2017 +0000

    Rename the "openmp" group of optimizations to "omp"
    
            gcc/
            * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
            all users.
            * dumpfile.c (optgroup_options): Instead of "openmp", associate
            OPTGROUP_OMP with "omp".
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245768 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog        | 7 +++++++
 gcc/doc/optinfo.texi | 5 +++--
 gcc/dumpfile.c       | 2 +-
 gcc/dumpfile.h       | 3 ++-
 gcc/omp-expand.c     | 4 ++--
 gcc/omp-low.c        | 4 ++--
 gcc/omp-offload.c    | 6 +++---
 7 files changed, 20 insertions(+), 11 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 9c1025d..b699944 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
+	all users.
+	* dumpfile.c (optgroup_options): Instead of "openmp", associate
+	OPTGROUP_OMP with "omp".
+
 2017-02-27  Pat Haugen  <pthaugen@us.ibm.com>
 
 	PR target/79544
diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index 415e9a9..cf6ce00 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -59,8 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
-@item OPTGROUP_OPENMP
-OpenMP passes. Enabled by @option{-openmp}.
+@item OPTGROUP_OMP
+OMP (Offloading and Multi Processing) passes. Enabled by
+@option{-omp}.
 
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
diff --git gcc/dumpfile.c gcc/dumpfile.c
index 2c5dce2..6b9a47c 100644
--- gcc/dumpfile.c
+++ gcc/dumpfile.c
@@ -140,7 +140,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
-  {"openmp", OPTGROUP_OPENMP},
+  {"omp", OPTGROUP_OMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git gcc/dumpfile.h gcc/dumpfile.h
index 7c8f7a2..3886f98 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -99,7 +99,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+					   Processing) transformations */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
diff --git gcc/omp-expand.c gcc/omp-expand.c
index 55e54e4..ea951d6 100644
--- gcc/omp-expand.c
+++ gcc/omp-expand.c
@@ -8134,7 +8134,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -8181,7 +8181,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
diff --git gcc/omp-low.c gcc/omp-low.c
index 35df02c..c2c69cb 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -8920,7 +8920,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp | PROP_gimple_lomp_dev, /* properties_provided */
@@ -9232,7 +9232,7 @@ const pass_data pass_data_diagnose_omp_blocks =
 {
   GIMPLE_PASS, /* type */
   "*diagnose_omp_blocks", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   0, /* properties_provided */
diff --git gcc/omp-offload.c gcc/omp-offload.c
index aed9e14..fad038f 100644
--- gcc/omp-offload.c
+++ gcc/omp-offload.c
@@ -1625,7 +1625,7 @@ const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
   "oaccdevlow", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
@@ -1727,7 +1727,7 @@ const pass_data pass_data_omp_device_lower =
 {
   GIMPLE_PASS, /* type */
   "ompdevlow", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   PROP_gimple_lomp_dev, /* properties_provided */
@@ -1771,7 +1771,7 @@ const pass_data pass_data_omp_target_link =
 {
   GIMPLE_PASS,			/* type */
   "omptargetlink",		/* name */
-  OPTGROUP_OPENMP,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   PROP_ssa,			/* properties_required */
   0,				/* properties_provided */


Backported to gomp-4_0-branch in r245770:

commit aa3c3ba6ca554614f69fe8e28b07165fc7ba31e6
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 28 08:45:53 2017 +0000

    Rename the "openmp" group of optimizations to "omp"
    
    Backport from trunk r245768:
    
            gcc/
            * dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
            all users.
            * dumpfile.c (optgroup_options): Instead of "openmp", associate
            OPTGROUP_OMP with "omp".
    
            gcc/testsuite/
            * c-c++-common/goacc/note-parallelism.c: Use "-fopt-info-note-omp"
            instead of "-fopt-info-note-openmp".
            * gfortran.dg/goacc/note-parallelism.f90: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245770 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                   | 10 ++++++++++
 gcc/doc/optinfo.texi                                 |  5 +++--
 gcc/dumpfile.c                                       |  2 +-
 gcc/dumpfile.h                                       |  3 ++-
 gcc/omp-low.c                                        | 12 ++++++------
 gcc/testsuite/ChangeLog.gomp                         |  6 ++++++
 gcc/testsuite/c-c++-common/goacc/note-parallelism.c  |  4 ++--
 gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90 |  4 ++--
 8 files changed, 32 insertions(+), 14 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 4492823..00a5b49 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,13 @@
+2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	Backport from trunk r245768:
+	2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
+	all users.
+	* dumpfile.c (optgroup_options): Instead of "openmp", associate
+	OPTGROUP_OMP with "omp".
+
 2017-02-27  Chung-Lin Tang  <cltang@codesourcery.com>
 	    Cesar Philippidis  <cesar@codesourcery.com>
 
diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index 20ca560..1cf3f41 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -59,8 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
-@item OPTGROUP_OPENMP
-OpenMP passes. Enabled by @option{-openmp}.
+@item OPTGROUP_OMP
+OMP (Offloading and Multi Processing) passes. Enabled by
+@option{-omp}.
 
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
diff --git gcc/dumpfile.c gcc/dumpfile.c
index f2430f3..007433d 100644
--- gcc/dumpfile.c
+++ gcc/dumpfile.c
@@ -136,7 +136,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
-  {"openmp", OPTGROUP_OPENMP},
+  {"omp", OPTGROUP_OMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git gcc/dumpfile.h gcc/dumpfile.h
index 72f696b..eb10db3 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -97,7 +97,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OPENMP      (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+					   Processing) transformations */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                               | OPTGROUP_VEC | OPTGROUP_OTHER)
diff --git gcc/omp-low.c gcc/omp-low.c
index 73666d4..a646272 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -14783,7 +14783,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -14830,7 +14830,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -19099,7 +19099,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp, /* properties_provided */
@@ -19581,7 +19581,7 @@ const pass_data pass_data_diagnose_omp_blocks =
 {
   GIMPLE_PASS, /* type */
   "*diagnose_omp_blocks", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   0, /* properties_provided */
@@ -21264,7 +21264,7 @@ const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
   "oaccdevlow", /* name */
-  OPTGROUP_OPENMP, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
@@ -21309,7 +21309,7 @@ const pass_data pass_data_omp_target_link =
 {
   GIMPLE_PASS,			/* type */
   "omptargetlink",		/* name */
-  OPTGROUP_OPENMP,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   PROP_ssa,			/* properties_required */
   0,				/* properties_provided */
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 4e02ca0..f930824 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/note-parallelism.c: Use "-fopt-info-note-omp"
+	instead of "-fopt-info-note-openmp".
+	* gfortran.dg/goacc/note-parallelism.f90: Likewise.
+
 2017-02-23  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* c-c++-common/goacc/note-parallelism.c: New test.
diff --git gcc/testsuite/c-c++-common/goacc/note-parallelism.c gcc/testsuite/c-c++-common/goacc/note-parallelism.c
index ddbce99..485990e 100644
--- gcc/testsuite/c-c++-common/goacc/note-parallelism.c
+++ gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -1,6 +1,6 @@
-/* Test the output of -fopt-info-not-openmp.  */
+/* Test the output of -fopt-info-note-omp.  */
 
-/* { dg-additional-options "-fopt-info-note-openmp" } */
+/* { dg-additional-options "-fopt-info-note-omp" } */
 
 int
 main ()
diff --git gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90 gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
index ae6f341..93fb6a0 100644
--- gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
+++ gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
@@ -1,6 +1,6 @@
-! Test the output of -fopt-info-note-openmp.
+! Test the output of -fopt-info-note-omp.
 
-! { dg-additional-options "-fopt-info-note-openmp" }
+! { dg-additional-options "-fopt-info-note-omp" }
 
 program test
   implicit none


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Miscellaneous optimization group fixes
  2017-02-22  9:53           ` Martin Jambor
  2017-02-28  8:52             ` Rename the "openmp" group of optimizations to "omp" (was: Miscellaneous optimization group fixes) Thomas Schwinge
@ 2017-02-28  9:04             ` Thomas Schwinge
  1 sibling, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2017-02-28  9:04 UTC (permalink / raw)
  To: Martin Jambor, GCC Patches; +Cc: Jakub Jelinek, Cesar Philippidis

Hi!

On Wed, 22 Feb 2017 10:38:02 +0100, Martin Jambor <mjambor@suse.cz> wrote:
> On Wed, Feb 22, 2017 at 08:58:06AM +0100, Thomas Schwinge wrote:
> > On top of [...], OK to commit the following (not yet tested) -- these all
> > look like oversights to me, but please verify?
> 
> The missing documentation is an oversight.  Thanks for spotting it.

Committed to trunk in r245769:

commit 7246f4462796dfc937c203ba651d5d7ec4c7c89e
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 28 08:42:06 2017 +0000

    Miscellaneous optimization group fixes
    
            gcc/
            * doc/optinfo.texi (Optimization groups): Fix option used for
            OPTGROUP_ALL.
            * doc/invoke.texi (-fopt-info): Document "omp".
            * dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
            (OPTGROUP_ALL): Add OPTGROUP_OMP.
            * hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
            * ipa-hsa.c (pass_data_ipa_hsa): Likewise.
            * omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245769 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog        | 9 +++++++++
 gcc/doc/invoke.texi  | 2 ++
 gcc/doc/optinfo.texi | 2 +-
 gcc/dumpfile.h       | 6 +++---
 gcc/hsa-gen.c        | 2 +-
 gcc/ipa-hsa.c        | 2 +-
 gcc/omp-simd-clone.c | 2 +-
 7 files changed, 18 insertions(+), 7 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index b699944..cd95521 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,14 @@
 2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* doc/optinfo.texi (Optimization groups): Fix option used for
+	OPTGROUP_ALL.
+	* doc/invoke.texi (-fopt-info): Document "omp".
+	* dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
+	(OPTGROUP_ALL): Add OPTGROUP_OMP.
+	* hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
+	* ipa-hsa.c (pass_data_ipa_hsa): Likewise.
+	* omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.
+
 	* dumpfile.h (OPTGROUP_OPENMP): Rename to OPTGROUP_OMP.  Adjust
 	all users.
 	* dumpfile.c (optgroup_options): Instead of "openmp", associate
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 1b9fdfe..d90c95c 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -13155,6 +13155,8 @@ Enable dumps from all interprocedural optimizations.
 Enable dumps from all loop optimizations.
 @item inline
 Enable dumps from all inlining optimizations.
+@item omp
+Enable dumps from all OMP (Offloading and Multi Processing) optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
 @item optall
diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index cf6ce00..e17cb37 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -70,7 +70,7 @@ Vectorization passes. Enabled by @option{-vec}.
 All other optimization passes which do not fall into one of the above.
 
 @item OPTGROUP_ALL
-All optimization passes. Enabled by @option{-all}.
+All optimization passes. Enabled by @option{-optall}.
 
 @end ftable
 
diff --git gcc/dumpfile.h gcc/dumpfile.h
index 3886f98..fef58f5 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -98,12 +98,12 @@ enum tree_dump_index
 #define OPTGROUP_IPA         (1 << 1)   /* IPA optimization passes */
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
-#define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+#define OPTGROUP_OMP         (1 << 4)   /* OMP (Offloading and Multi
 					   Processing) transformations */
+#define OPTGROUP_VEC         (1 << 5)   /* Vectorization passes */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-                              | OPTGROUP_VEC | OPTGROUP_OTHER)
+                              | OPTGROUP_OMP | OPTGROUP_VEC | OPTGROUP_OTHER)
 
 /* Define a tree dump switch.  */
 struct dump_file_info
diff --git gcc/hsa-gen.c gcc/hsa-gen.c
index 7721fcc..7b69d64 100644
--- gcc/hsa-gen.c
+++ gcc/hsa-gen.c
@@ -6474,7 +6474,7 @@ const pass_data pass_data_gen_hsail =
 {
   GIMPLE_PASS,
   "hsagen",	 			/* name */
-  OPTGROUP_NONE,			/* optinfo_flags */
+  OPTGROUP_OMP,				/* optinfo_flags */
   TV_NONE,				/* tv_id */
   PROP_cfg | PROP_ssa,			/* properties_required */
   0,					/* properties_provided */
diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c
index af70b0a..c02dada 100644
--- gcc/ipa-hsa.c
+++ gcc/ipa-hsa.c
@@ -289,7 +289,7 @@ const pass_data pass_data_ipa_hsa =
 {
   IPA_PASS, /* type */
   "hsa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_IPA_HSA, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
diff --git gcc/omp-simd-clone.c gcc/omp-simd-clone.c
index 09ad40b..99589d4 100644
--- gcc/omp-simd-clone.c
+++ gcc/omp-simd-clone.c
@@ -1690,7 +1690,7 @@ const pass_data pass_data_omp_simd_clone =
 {
   SIMPLE_IPA_PASS,		/* type */
   "simdclone",			/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   ( PROP_ssa | PROP_cfg ),	/* properties_required */
   0,				/* properties_provided */


Backported to gomp-4_0-branch in r245771:

commit c19f2d3949a534fab6e8e6385b56067dff48d5a9
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 28 08:46:04 2017 +0000

    Miscellaneous optimization group fixes
    
    Backport from trunk r245769:
    
            gcc/
            * doc/optinfo.texi (Optimization groups): Fix option used for
            OPTGROUP_ALL.
            * doc/invoke.texi (-fopt-info): Document "omp".
            * dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
            (OPTGROUP_ALL): Add OPTGROUP_OMP.
            * hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
            * ipa-hsa.c (pass_data_ipa_hsa): Likewise.
            * omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245771 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   | 12 ++++++++++++
 gcc/doc/invoke.texi  |  2 ++
 gcc/doc/optinfo.texi |  2 +-
 gcc/dumpfile.h       |  6 +++---
 gcc/hsa-gen.c        |  2 +-
 gcc/ipa-hsa.c        |  2 +-
 gcc/omp-simd-clone.c |  2 +-
 7 files changed, 21 insertions(+), 7 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 00a5b49..5b76668 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,17 @@
 2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
 
+	Backport from trunk r245769:
+	2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* doc/optinfo.texi (Optimization groups): Fix option used for
+	OPTGROUP_ALL.
+	* doc/invoke.texi (-fopt-info): Document "omp".
+	* dumpfile.h: Sort OPTGROUP_OMP before OPTGROUP_VEC.
+	(OPTGROUP_ALL): Add OPTGROUP_OMP.
+	* hsa-gen.c (pass_data_gen_hsail): Use OPTGROUP_OMP.
+	* ipa-hsa.c (pass_data_ipa_hsa): Likewise.
+	* omp-simd-clone.c (pass_data_omp_simd_clone): Likewise.
+
 	Backport from trunk r245768:
 	2017-02-28  Thomas Schwinge  <thomas@codesourcery.com>
 
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index fd8ba42..299dab1 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -12240,6 +12240,8 @@ Enable dumps from all interprocedural optimizations.
 Enable dumps from all loop optimizations.
 @item inline
 Enable dumps from all inlining optimizations.
+@item omp
+Enable dumps from all OMP (Offloading and Multi Processing) optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
 @item optall
diff --git gcc/doc/optinfo.texi gcc/doc/optinfo.texi
index 1cf3f41..ff4573d 100644
--- gcc/doc/optinfo.texi
+++ gcc/doc/optinfo.texi
@@ -70,7 +70,7 @@ Vectorization passes. Enabled by @option{-vec}.
 All other optimization passes which do not fall into one of the above.
 
 @item OPTGROUP_ALL
-All optimization passes. Enabled by @option{-all}.
+All optimization passes. Enabled by @option{-optall}.
 
 @end ftable
 
diff --git gcc/dumpfile.h gcc/dumpfile.h
index eb10db3..5d87239 100644
--- gcc/dumpfile.h
+++ gcc/dumpfile.h
@@ -96,12 +96,12 @@ enum tree_dump_index
 #define OPTGROUP_IPA         (1 << 1)   /* IPA optimization passes */
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
-#define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OMP         (1 << 5)   /* OMP (Offloading and Multi
+#define OPTGROUP_OMP         (1 << 4)   /* OMP (Offloading and Multi
 					   Processing) transformations */
+#define OPTGROUP_VEC         (1 << 5)   /* Vectorization passes */
 #define OPTGROUP_OTHER       (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-                              | OPTGROUP_VEC | OPTGROUP_OTHER)
+                              | OPTGROUP_OMP | OPTGROUP_VEC | OPTGROUP_OTHER)
 
 /* Define a tree dump switch.  */
 struct dump_file_info
diff --git gcc/hsa-gen.c gcc/hsa-gen.c
index cf7d434..0c54557 100644
--- gcc/hsa-gen.c
+++ gcc/hsa-gen.c
@@ -6373,7 +6373,7 @@ const pass_data pass_data_gen_hsail =
 {
   GIMPLE_PASS,
   "hsagen",	 			/* name */
-  OPTGROUP_NONE,			/* optinfo_flags */
+  OPTGROUP_OMP,				/* optinfo_flags */
   TV_NONE,				/* tv_id */
   PROP_cfg | PROP_ssa,			/* properties_required */
   0,					/* properties_provided */
diff --git gcc/ipa-hsa.c gcc/ipa-hsa.c
index 769657f..058ede8 100644
--- gcc/ipa-hsa.c
+++ gcc/ipa-hsa.c
@@ -284,7 +284,7 @@ const pass_data pass_data_ipa_hsa =
 {
   IPA_PASS, /* type */
   "hsa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OMP, /* optinfo_flags */
   TV_IPA_HSA, /* tv_id */
   0, /* properties_required */
   0, /* properties_provided */
diff --git gcc/omp-simd-clone.c gcc/omp-simd-clone.c
index 58ce6cb..3991ede 100644
--- gcc/omp-simd-clone.c
+++ gcc/omp-simd-clone.c
@@ -1659,7 +1659,7 @@ const pass_data pass_data_omp_simd_clone =
 {
   SIMPLE_IPA_PASS,		/* type */
   "simdclone",			/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OMP,			/* optinfo_flags */
   TV_NONE,			/* tv_id */
   ( PROP_ssa | PROP_cfg ),	/* properties_required */
   0,				/* properties_provided */


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [HSA] Avoid ICE when "HSA does not implement indirect calls" (was: [PATCH 4/4] Back-end and IPA bits of hsa branch merge)
       [not found]   ` <yxfpftb48jra.fsf@hertz.schwinge.homeip.net>
@ 2020-06-17 21:57     ` Thomas Schwinge
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2020-06-17 21:57 UTC (permalink / raw)
  To: Martin Jambor, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

Hi!

It's been a while...  ;-)

On 2016-11-13T23:22:26+0100, Martin Jambor <mjambor@suse.cz> wrote:
> --- a/gcc/hsa-gen.c
> +++ b/gcc/hsa-gen.c

> @@ -5068,6 +5102,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
>    if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>      {
>        tree function_decl = gimple_call_fndecl (stmt);
> +      /* Prefetch pass can create type-mismatching prefetch builtin calls which
> +      fail the gimple_call_builtin_p test above.  Handle them here.  */
> +      if (DECL_BUILT_IN_CLASS (function_decl)
> +       && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
> +     return;
> +
>        if (function_decl == NULL_TREE)
>       {
>         HSA_SORRY_AT (gimple_location (stmt),

So we're now looking into 'function_decl' before the 'NULL_TREE' check --
not good; ICE.  As obvious, I've pushed '[HSA] Avoid ICE when "HSA does
not implement indirect calls"' to master branch in commit
973bce0fb50bbfd91f47238b82b99935525716ad, releases/gcc-10 branch in
commit ad3f0ec1a80ba6930fd17ad45a9e3ecd793e3f67, releases/gcc-9 branch in
commit b7a185371cb9e5ca07bfe5af9c65fbff874c76f6, and releases/gcc-8
branch in commit e7fad65109690ae41be04a23093cd7504d022d4c, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-HSA-Avoid-ICE-when-HSA-does-not-implement-indirect-c.patch --]
[-- Type: text/x-diff, Size: 5749 bytes --]

From 973bce0fb50bbfd91f47238b82b99935525716ad Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sat, 6 Jun 2020 18:44:34 +0200
Subject: [PATCH] [HSA] Avoid ICE when "HSA does not implement indirect calls"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made apparent by recent commit dc703151d4f4560e647649506d5b4ceb0ee11e90
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    #2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    #3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    #4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    #5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    #6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60e412fcf1245b4780871553cbdebb956a3 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.
---
 gcc/hsa-gen.c                                 |  9 +++----
 .../c-c++-common/gomp/hsa-indirect-call-1.c   | 24 +++++++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 43baa2e82c8a..2af999048b22 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5299,10 +5299,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
-      /* Prefetch pass can create type-mismatching prefetch builtin calls which
-	 fail the gimple_call_builtin_p test above.  Handle them here.  */
-      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
-	return;
 
       if (function_decl == NULL_TREE)
 	{
@@ -5311,6 +5307,11 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  return;
 	}
 
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
+	return;
+
       if (hsa_callable_function_p (function_decl))
 	gen_hsa_insns_for_direct_call (stmt, hbb);
       else if (!gen_hsa_insns_for_known_library_call (stmt, hbb))
diff --git a/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
new file mode 100644
index 000000000000..67ee6af309a8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
@@ -0,0 +1,24 @@
+/* Instead of ICE, we'd like "HSA does not implement indirect calls".  */
+
+/* Reduced from 'libgomp.c/target-39.c'.  */
+
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-additional-options "-Whsa" } to override '{gcc,g++}.dg/gomp/gomp.exp'.  */
+
+typedef void (*fnp) (void);
+void f1 (void) { }
+fnp f2 (void) { return f1; }
+#pragma omp declare target to (f1, f2)
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    fnp fnp = f2 ();
+    fnp (); /* { dg-message "note: support for HSA does not implement indirect calls" } */
+  }
+  return 0;
+}
+
+/* { dg-warning "could not emit HSAIL for the function" "" { target *-*-* } 0 } */
-- 
2.27.0


[-- Attachment #3: 0001-HSA-Avoid-ICE-when-HSA-does-not-implement-indire.g10.patch --]
[-- Type: text/x-diff, Size: 5821 bytes --]

From ad3f0ec1a80ba6930fd17ad45a9e3ecd793e3f67 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sat, 6 Jun 2020 18:44:34 +0200
Subject: [PATCH] [HSA] Avoid ICE when "HSA does not implement indirect calls"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made apparent by recent commit dc703151d4f4560e647649506d5b4ceb0ee11e90
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    #2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    #3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    #4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    #5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    #6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60e412fcf1245b4780871553cbdebb956a3 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0fb50bbfd91f47238b82b99935525716ad)
---
 gcc/hsa-gen.c                                 |  9 +++----
 .../c-c++-common/gomp/hsa-indirect-call-1.c   | 24 +++++++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index d407bcf503ad..767badab6e89 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5299,10 +5299,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
-      /* Prefetch pass can create type-mismatching prefetch builtin calls which
-	 fail the gimple_call_builtin_p test above.  Handle them here.  */
-      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
-	return;
 
       if (function_decl == NULL_TREE)
 	{
@@ -5311,6 +5307,11 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  return;
 	}
 
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
+	return;
+
       if (hsa_callable_function_p (function_decl))
 	gen_hsa_insns_for_direct_call (stmt, hbb);
       else if (!gen_hsa_insns_for_known_library_call (stmt, hbb))
diff --git a/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
new file mode 100644
index 000000000000..67ee6af309a8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
@@ -0,0 +1,24 @@
+/* Instead of ICE, we'd like "HSA does not implement indirect calls".  */
+
+/* Reduced from 'libgomp.c/target-39.c'.  */
+
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-additional-options "-Whsa" } to override '{gcc,g++}.dg/gomp/gomp.exp'.  */
+
+typedef void (*fnp) (void);
+void f1 (void) { }
+fnp f2 (void) { return f1; }
+#pragma omp declare target to (f1, f2)
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    fnp fnp = f2 ();
+    fnp (); /* { dg-message "note: support for HSA does not implement indirect calls" } */
+  }
+  return 0;
+}
+
+/* { dg-warning "could not emit HSAIL for the function" "" { target *-*-* } 0 } */
-- 
2.27.0


[-- Attachment #4: 0001-HSA-Avoid-ICE-when-HSA-does-not-implement-indirec.g9.patch --]
[-- Type: text/x-diff, Size: 5821 bytes --]

From b7a185371cb9e5ca07bfe5af9c65fbff874c76f6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sat, 6 Jun 2020 18:44:34 +0200
Subject: [PATCH] [HSA] Avoid ICE when "HSA does not implement indirect calls"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made apparent by recent commit dc703151d4f4560e647649506d5b4ceb0ee11e90
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    #2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    #3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    #4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    #5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    #6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60e412fcf1245b4780871553cbdebb956a3 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0fb50bbfd91f47238b82b99935525716ad)
---
 gcc/hsa-gen.c                                 |  9 +++----
 .../c-c++-common/gomp/hsa-indirect-call-1.c   | 24 +++++++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index edcec10f49fd..5d4bc1979c75 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5298,10 +5298,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
-      /* Prefetch pass can create type-mismatching prefetch builtin calls which
-	 fail the gimple_call_builtin_p test above.  Handle them here.  */
-      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
-	return;
 
       if (function_decl == NULL_TREE)
 	{
@@ -5310,6 +5306,11 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  return;
 	}
 
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
+	return;
+
       if (hsa_callable_function_p (function_decl))
 	gen_hsa_insns_for_direct_call (stmt, hbb);
       else if (!gen_hsa_insns_for_known_library_call (stmt, hbb))
diff --git a/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
new file mode 100644
index 000000000000..67ee6af309a8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
@@ -0,0 +1,24 @@
+/* Instead of ICE, we'd like "HSA does not implement indirect calls".  */
+
+/* Reduced from 'libgomp.c/target-39.c'.  */
+
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-additional-options "-Whsa" } to override '{gcc,g++}.dg/gomp/gomp.exp'.  */
+
+typedef void (*fnp) (void);
+void f1 (void) { }
+fnp f2 (void) { return f1; }
+#pragma omp declare target to (f1, f2)
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    fnp fnp = f2 ();
+    fnp (); /* { dg-message "note: support for HSA does not implement indirect calls" } */
+  }
+  return 0;
+}
+
+/* { dg-warning "could not emit HSAIL for the function" "" { target *-*-* } 0 } */
-- 
2.27.0


[-- Attachment #5: 0001-HSA-Avoid-ICE-when-HSA-does-not-implement-indirec.g8.patch --]
[-- Type: text/x-diff, Size: 5917 bytes --]

From e7fad65109690ae41be04a23093cd7504d022d4c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Sat, 6 Jun 2020 18:44:34 +0200
Subject: [PATCH] [HSA] Avoid ICE when "HSA does not implement indirect calls"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made apparent by recent commit dc703151d4f4560e647649506d5b4ceb0ee11e90
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    #2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    #3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    #4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    #5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    #6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60e412fcf1245b4780871553cbdebb956a3 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0fb50bbfd91f47238b82b99935525716ad)
---
 gcc/hsa-gen.c                                 | 11 +++++----
 .../c-c++-common/gomp/hsa-indirect-call-1.c   | 24 +++++++++++++++++++
 2 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 7974fffe360f..5a4b38d717ba 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5251,11 +5251,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
-      /* Prefetch pass can create type-mismatching prefetch builtin calls which
-	 fail the gimple_call_builtin_p test above.  Handle them here.  */
-      if (DECL_BUILT_IN_CLASS (function_decl)
-	  && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
-	return;
 
       if (function_decl == NULL_TREE)
 	{
@@ -5264,6 +5259,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  return;
 	}
 
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (DECL_BUILT_IN_CLASS (function_decl)
+	  && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
+	return;
+
       if (hsa_callable_function_p (function_decl))
 	gen_hsa_insns_for_direct_call (stmt, hbb);
       else if (!gen_hsa_insns_for_known_library_call (stmt, hbb))
diff --git a/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
new file mode 100644
index 000000000000..67ee6af309a8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/hsa-indirect-call-1.c
@@ -0,0 +1,24 @@
+/* Instead of ICE, we'd like "HSA does not implement indirect calls".  */
+
+/* Reduced from 'libgomp.c/target-39.c'.  */
+
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-additional-options "-Whsa" } to override '{gcc,g++}.dg/gomp/gomp.exp'.  */
+
+typedef void (*fnp) (void);
+void f1 (void) { }
+fnp f2 (void) { return f1; }
+#pragma omp declare target to (f1, f2)
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    fnp fnp = f2 ();
+    fnp (); /* { dg-message "note: support for HSA does not implement indirect calls" } */
+  }
+  return 0;
+}
+
+/* { dg-warning "could not emit HSAIL for the function" "" { target *-*-* } 0 } */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2016-11-22 13:27     ` Martin Jambor
  2016-11-22 14:13       ` Jakub Jelinek
@ 2021-01-14 14:50       ` Thomas Schwinge
  2021-01-19 11:37         ` Martin Jambor
                           ` (2 more replies)
  1 sibling, 3 replies; 36+ messages in thread
From: Thomas Schwinge @ 2021-01-14 14:50 UTC (permalink / raw)
  To: gcc-patches, Martin Jambor, Andrew Stubbs, Julian Brown
  Cc: Martin Liska, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3982 bytes --]

Hi!

I'm raising here an issue with HSA libgomp plugin code changes from a
while ago.  While HSA is now no longer relevant for GCC master branch,
the same code has also been copied into the GCN libgomp plugin.

This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
build dependence on HSA run-time":

On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
> --- a/libgomp/plugin/configfrag.ac
> +++ b/libgomp/plugin/configfrag.ac

> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>               tgt_name=hsa
>               PLUGIN_HSA=$tgt
>               PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
> -             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
> -             PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
> +             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
> +             PLUGIN_HSA_LIBS="-ldl"

So this switched from directly linking against 'libhsa-runtime64.so' to a
'libdl'-based runtime linking variant.

Previously, 'libhsa-runtime64.so' would've been found at run time via the
standard search paths.

> +if test "$HSA_RUNTIME_LIB" != ""; then
> +  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
> +fi
> +
> +AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
> +  [Define path to HSA runtime.])

That's new, to propagate '--with-hsa-runtime'/'--with-hsa-runtime-lib'
into the HSA plugin source code.

> --- a/libgomp/plugin/plugin-hsa.c
> +++ b/libgomp/plugin/plugin-hsa.c

> +static const char *hsa_runtime_lib;

>  static void
>  init_enviroment_variables (void)
>  {

> +  hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");

Unless overridden via the 'HSA_RUNTIME_LIB' environment variable...

> +  if (hsa_runtime_lib == NULL)
> +    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";

... we now default to '[HSA_RUNTIME_LIB]/libhsa-runtime64.so' (note
'HSA_RUNTIME_LIB' prefix!)...

> +static bool
> +init_hsa_runtime_functions (void)
> +{
> +  void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);

..., which is then 'dlopen'ed here.

That means, contrary to before, the GCC configure-time
'--with-hsa-runtime' (by definition only valid for GCC configure/build as
well as build-tree testing) leaks into the installed HSA libgomp plugin.
That's a problem if your GCC build system (and build-tree testing)
requires '--with-hsa-runtime' to specify a non-standard location (not in
default search paths) but that location is not valid on your GCC
deployment system (but it has leaked into the HSA libgomp plugin),
meaning that (unless overridden via the 'HSA_RUNTIME_LIB' environment
variable) 'libhsa-runtime64.so' is now no longer found via the standard
search paths, because of the 'HSA_RUNTIME_LIB' prefix passed into
'dlopen'.

Per my understanding this cannot be intentional, so I suggest to restore
the previous behavior as per the attached "libgomp HSA/GCN plugins: don't
prepend the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'".  OK to push
such changes?  I was tempted to push "as obvious", but maybe I fail to
see the rationale behind this change?

For avoidance of doubt, this change doesn't affect (build-tree) testsuite
usage, where we have:

    libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"

    libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"

And, another data point:

    gcc/config/gcn/gcn-run.c:#define HSA_RUNTIME_LIB "libhsa-runtime64.so.1"
    [...]
    gcc/config/gcn/gcn-run.c:  void *handle = dlopen (HSA_RUNTIME_LIB, RTLD_LAZY);

Here, 'libhsa-runtime64.so.1' is 'dlopen'ed without prefix, and thus
found via the standard search paths (as expected).


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_RU.g10.patch --]
[-- Type: text/x-diff, Size: 3037 bytes --]

From 936e7ee10349a6be2bd0a6a2198f70239a8e1ec1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/plugin-gcn.c (init_environment_variables): Likewise.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* configure: Regenerate.
---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-gcn.c  |  2 +-
 libgomp/plugin/plugin-hsa.c  |  2 +-
 4 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index d8d98f182d4..9765a9068fe 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15483,16 +15483,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index fc91702a434..69a3cf4aeaf 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -310,10 +310,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4c6a4c03b6e..d919de191fc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1074,7 +1074,7 @@ init_environment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("GCN_SUPPORT_CPU_DEVICES");
 
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index abd3bc64163..41951c464ef 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -254,7 +254,7 @@ init_enviroment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2021-01-14 14:50       ` Thomas Schwinge
@ 2021-01-19 11:37         ` Martin Jambor
  2021-01-19 12:49           ` Martin Liška
  2021-03-25 13:40           ` Thomas Schwinge
  2022-04-06  9:20         ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
  2 siblings, 2 replies; 36+ messages in thread
From: Martin Jambor @ 2021-01-19 11:37 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, Andrew Stubbs, Julian Brown
  Cc: Martin Liska, Jakub Jelinek

Hi Thomas,

On Thu, Jan 14 2021, Thomas Schwinge wrote:
> Hi!
>
> I'm raising here an issue with HSA libgomp plugin code changes from a
> while ago.  While HSA is now no longer relevant for GCC master branch,
> the same code has also been copied into the GCN libgomp plugin.
>
> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
> build dependence on HSA run-time":
>
> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>> --- a/libgomp/plugin/configfrag.ac
>> +++ b/libgomp/plugin/configfrag.ac
>
>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>               tgt_name=hsa
>>               PLUGIN_HSA=$tgt
>>               PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>> -             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>> -             PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>> +             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>> +             PLUGIN_HSA_LIBS="-ldl"
>
> So this switched from directly linking against 'libhsa-runtime64.so' to a
> 'libdl'-based runtime linking variant.
>
> Previously, 'libhsa-runtime64.so' would've been found at run time via the
> standard search paths.
>
>> +if test "$HSA_RUNTIME_LIB" != ""; then
>> +  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
>> +fi
>> +
>> +AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
>> +  [Define path to HSA runtime.])
>
> That's new, to propagate '--with-hsa-runtime'/'--with-hsa-runtime-lib'
> into the HSA plugin source code.
>
>> --- a/libgomp/plugin/plugin-hsa.c
>> +++ b/libgomp/plugin/plugin-hsa.c
>
>> +static const char *hsa_runtime_lib;
>
>>  static void
>>  init_enviroment_variables (void)
>>  {
>
>> +  hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
>
> Unless overridden via the 'HSA_RUNTIME_LIB' environment variable...
>
>> +  if (hsa_runtime_lib == NULL)
>> +    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
>
> ... we now default to '[HSA_RUNTIME_LIB]/libhsa-runtime64.so' (note
> 'HSA_RUNTIME_LIB' prefix!)...
>
>> +static bool
>> +init_hsa_runtime_functions (void)
>> +{
>> +  void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
>
> ..., which is then 'dlopen'ed here.
>
> That means, contrary to before, the GCC configure-time
> '--with-hsa-runtime' (by definition only valid for GCC configure/build as
> well as build-tree testing) leaks into the installed HSA libgomp plugin.
> That's a problem if your GCC build system (and build-tree testing)
> requires '--with-hsa-runtime' to specify a non-standard location (not in
> default search paths) but that location is not valid on your GCC
> deployment system (but it has leaked into the HSA libgomp plugin),
> meaning that (unless overridden via the 'HSA_RUNTIME_LIB' environment
> variable) 'libhsa-runtime64.so' is now no longer found via the standard
> search paths, because of the 'HSA_RUNTIME_LIB' prefix passed into
> 'dlopen'.
>
> Per my understanding this cannot be intentional, so I suggest to restore
> the previous behavior as per the attached "libgomp HSA/GCN plugins:
> don't

I honestly do not remember, it is quote possible.  I'm not quite sure
what you mean by "previous behavior" (the previous behavior was static
linking, no?) though.


> prepend the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'".  OK to push
> such changes?  I was tempted to push "as obvious", but maybe I fail to
> see the rationale behind this change?
>
> For avoidance of doubt, this change doesn't affect (build-tree) testsuite
> usage, where we have:
>
>     libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>
>     libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"
>
> And, another data point:
>
>     gcc/config/gcn/gcn-run.c:#define HSA_RUNTIME_LIB "libhsa-runtime64.so.1"
>     [...]
>     gcc/config/gcn/gcn-run.c:  void *handle = dlopen (HSA_RUNTIME_LIB, RTLD_LAZY);
>
> Here, 'libhsa-runtime64.so.1' is 'dlopen'ed without prefix, and thus
> found via the standard search paths (as expected).
>

Right.  From what I can tell at the moment, which is not much, the idea
was to be able to load it even from a non-standard path and specify that
path at configure time.  If people think that is not useful and is
actually harmful, I guess it can go.

Martin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2021-01-19 11:37         ` Martin Jambor
@ 2021-01-19 12:49           ` Martin Liška
  2021-03-25 13:40           ` Thomas Schwinge
  1 sibling, 0 replies; 36+ messages in thread
From: Martin Liška @ 2021-01-19 12:49 UTC (permalink / raw)
  To: Martin Jambor, Thomas Schwinge, gcc-patches, Andrew Stubbs, Julian Brown
  Cc: Jakub Jelinek

On 1/19/21 12:37 PM, Martin Jambor wrote:
> Right.  From what I can tell at the moment, which is not much, the idea
> was to be able to load it even from a non-standard path and specify that
> path at configure time.  If people think that is not useful and is
> actually harmful, I guess it can go.

And if I remember correctly, the dlopen approach was motivated by fact
that we didn't want to have HSA runtime as a build dependency, but rather
a run-time dependency. So it was done for packaging reasons.

Martin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/4] Remove build dependence on HSA run-time
  2021-01-19 11:37         ` Martin Jambor
  2021-01-19 12:49           ` Martin Liška
@ 2021-03-25 13:40           ` Thomas Schwinge
  1 sibling, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2021-03-25 13:40 UTC (permalink / raw)
  To: Martin Jambor, gcc-patches, Andrew Stubbs, Julian Brown,
	Martin Liška
  Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 6485 bytes --]

Hi!

On 2021-01-19T12:37:56+0100, Martin Jambor <mjambor@suse.cz> wrote:
> On Thu, Jan 14 2021, Thomas Schwinge wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
>>
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>               tgt_name=hsa
>>>               PLUGIN_HSA=$tgt
>>>               PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -             PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +             PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +             PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
>>
>> Previously, 'libhsa-runtime64.so' would've been found at run time via the
>> standard search paths.
>>
>>> +if test "$HSA_RUNTIME_LIB" != ""; then
>>> +  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
>>> +fi
>>> +
>>> +AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
>>> +  [Define path to HSA runtime.])
>>
>> That's new, to propagate '--with-hsa-runtime'/'--with-hsa-runtime-lib'
>> into the HSA plugin source code.
>>
>>> --- a/libgomp/plugin/plugin-hsa.c
>>> +++ b/libgomp/plugin/plugin-hsa.c
>>
>>> +static const char *hsa_runtime_lib;
>>
>>>  static void
>>>  init_enviroment_variables (void)
>>>  {
>>
>>> +  hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
>>
>> Unless overridden via the 'HSA_RUNTIME_LIB' environment variable...
>>
>>> +  if (hsa_runtime_lib == NULL)
>>> +    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
>>
>> ... we now default to '[HSA_RUNTIME_LIB]/libhsa-runtime64.so' (note
>> 'HSA_RUNTIME_LIB' prefix!)...
>>
>>> +static bool
>>> +init_hsa_runtime_functions (void)
>>> +{
>>> +  void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
>>
>> ..., which is then 'dlopen'ed here.
>>
>> That means, contrary to before, the GCC configure-time
>> '--with-hsa-runtime' (by definition only valid for GCC configure/build as
>> well as build-tree testing) leaks into the installed HSA libgomp plugin.
>> That's a problem if your GCC build system (and build-tree testing)
>> requires '--with-hsa-runtime' to specify a non-standard location (not in
>> default search paths) but that location is not valid on your GCC
>> deployment system (but it has leaked into the HSA libgomp plugin),
>> meaning that (unless overridden via the 'HSA_RUNTIME_LIB' environment
>> variable) 'libhsa-runtime64.so' is now no longer found via the standard
>> search paths, because of the 'HSA_RUNTIME_LIB' prefix passed into
>> 'dlopen'.
>>
>> Per my understanding this cannot be intentional, so I suggest to restore
>> the previous behavior as per the attached "libgomp HSA/GCN plugins:
>> don't
>
> I honestly do not remember, it is quote possible.  I'm not quite sure
> what you mean by "previous behavior"

Sorry if that was unclear: I meant "previous behavior" as user-visible
behavior, where (not how) 'libhsa-runtime64.so' is searched/loaded.

> (the previous behavior was static
> linking, no?) though.

Before commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
build dependence on HSA run-time": '-lhsa-runtime64' (linking against
shared library, I suppose), so at run-time 'libhsa-runtime64.so' is found
via standard serach paths.

After commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
build dependence on HSA run-time":
'dlopen("[HSA_RUNTIME_LIB]/libhsa-runtime64.so")', so at run-time
'dlopen's 'libhsa-runtime64.so' in 'HSA_RUNTIME_LIB' (as configured by
'--with-hsa-runtime'/'--with-hsa-runtime-lib').

In "libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB' path to
'libhsa-runtime64.so'" I now did (as posted) "restore the previous
behavior" ;-) -- that is: 'dlopen("libhsa-runtime64.so")', so at run-time
'libhsa-runtime64.so' is again found via standard serach paths.

>> prepend the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'".  OK to push
>> such changes?  I was tempted to push "as obvious", but maybe I fail to
>> see the rationale behind this change?
>>
>> For avoidance of doubt, this change doesn't affect (build-tree) testsuite
>> usage, where we have:
>>
>>     libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>>
>>     libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"
>>
>> And, another data point:
>>
>>     gcc/config/gcn/gcn-run.c:#define HSA_RUNTIME_LIB "libhsa-runtime64.so.1"
>>     [...]
>>     gcc/config/gcn/gcn-run.c:  void *handle = dlopen (HSA_RUNTIME_LIB, RTLD_LAZY);
>>
>> Here, 'libhsa-runtime64.so.1' is 'dlopen'ed without prefix, and thus
>> found via the standard search paths (as expected).
>>
>
> Right.  From what I can tell at the moment, which is not much, the idea
> was to be able to load it even from a non-standard path and specify that
> path at configure time.  If people think that is not useful and is
> actually harmful, I guess it can go.

OK, thanks.  Pushed to master branch in commit
7c1e856bedb4ae190c420ec2d2ca5e08730cf21d, releases/gcc-10 branch in
commit e950dfef6623576e44c1c4382441f2e6fabba064, releases/gcc-9 branch in
commit 75e7d34bbf6219f3087567a60ebabb99e1e84995, releases/gcc-8 branch in
commit 9b49fc1fc97e37182b2c24886e0f7f45410f67f1, and devel/omp/gcc-10
branch in commit 312ed310cf68c6f28ecba0b439cfa7252d0d213b, see attached.

On 2021-01-19T13:49:57+0100, Martin Liška <mliska@suse.cz> wrote:
> And if I remember correctly, the dlopen approach was motivated by fact
> that we didn't want to have HSA runtime as a build dependency, but rather
> a run-time dependency. So it was done for packaging reasons.

ACK, that aspect is certainly fine.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_RUNTIM.patch --]
[-- Type: text/x-diff, Size: 2957 bytes --]

From 7c1e856bedb4ae190c420ec2d2ca5e08730cf21d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-gcn.c (init_environment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.
---
 libgomp/config.h.in          |  3 ---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-gcn.c  |  2 +-
 4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 390e548cf590..03123dc1e60c 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -130,9 +130,6 @@
 /* Define to 1 if you have the `__secure_getenv' function. */
 #undef HAVE___SECURE_GETENV
 
-/* Define path to HSA runtime. */
-#undef HSA_RUNTIME_LIB
-
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
diff --git a/libgomp/configure b/libgomp/configure
index 22123f95d874..1917d7e273b0 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15448,16 +15448,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 1ab17778f0dd..f447def3f283 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -272,10 +272,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 8e6af69988ee..8aab708b0efe 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1072,7 +1072,7 @@ init_environment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so.1";
+    hsa_runtime_lib = "libhsa-runtime64.so.1";
 
   support_cpu_devices = secure_getenv ("GCN_SUPPORT_CPU_DEVICES");
 
-- 
2.30.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_RU.g10.patch --]
[-- Type: text/x-diff, Size: 3607 bytes --]

From e950dfef6623576e44c1c4382441f2e6fabba064 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/plugin-gcn.c (init_environment_variables): Likewise.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.

(cherry picked from commit 7c1e856bedb4ae190c420ec2d2ca5e08730cf21d)
---
 libgomp/config.h.in          |  3 ---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-gcn.c  |  2 +-
 libgomp/plugin/plugin-hsa.c  |  2 +-
 5 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 2d50fcd5c1a7..faf00d979089 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -130,9 +130,6 @@
 /* Define to 1 if you have the `__secure_getenv' function. */
 #undef HAVE___SECURE_GETENV
 
-/* Define path to HSA runtime. */
-#undef HSA_RUNTIME_LIB
-
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
diff --git a/libgomp/configure b/libgomp/configure
index 73f4a309f552..5240f7e9d399 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15502,16 +15502,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 7eb137472c2d..f85786515df0 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -327,10 +327,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4c6a4c03b6e5..d919de191fce 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1074,7 +1074,7 @@ init_environment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("GCN_SUPPORT_CPU_DEVICES");
 
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index abd3bc64163b..41951c464ef8 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -254,7 +254,7 @@ init_enviroment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
-- 
2.30.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_RUN.g9.patch --]
[-- Type: text/x-diff, Size: 3020 bytes --]

From 75e7d34bbf6219f3087567a60ebabb99e1e84995 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.

(cherry picked from commit 7c1e856bedb4ae190c420ec2d2ca5e08730cf21d)
---
 libgomp/config.h.in          |  3 ---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-hsa.c  |  2 +-
 4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 73f1b12805e3..714020db03a9 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -130,9 +130,6 @@
 /* Define to 1 if you have the `__secure_getenv' function. */
 #undef HAVE___SECURE_GETENV
 
-/* Define path to HSA runtime. */
-#undef HSA_RUNTIME_LIB
-
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
diff --git a/libgomp/configure b/libgomp/configure
index de31f97c2c6c..6f19f8c58220 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15813,16 +15813,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 77e1cda1a737..1f52901d4be2 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -292,10 +292,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index e0bc87c9552f..fbb3609c8117 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -244,7 +244,7 @@ init_enviroment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
-- 
2.30.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #5: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_RUN.g8.patch --]
[-- Type: text/x-diff, Size: 3020 bytes --]

From 9b49fc1fc97e37182b2c24886e0f7f45410f67f1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.

(cherry picked from commit 7c1e856bedb4ae190c420ec2d2ca5e08730cf21d)
---
 libgomp/config.h.in          |  3 ---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-hsa.c  |  2 +-
 4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index e7bc4d973744..44572747c566 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -109,9 +109,6 @@
 /* Define to 1 if you have the `__secure_getenv' function. */
 #undef HAVE___SECURE_GETENV
 
-/* Define path to HSA runtime. */
-#undef HSA_RUNTIME_LIB
-
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
diff --git a/libgomp/configure b/libgomp/configure
index 2529a8e06037..b731d04f9194 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15562,16 +15562,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index d3470f82f8ce..c001c847ab43 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -274,10 +274,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index 7d279151e2ae..310a3cad4957 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -244,7 +244,7 @@ init_enviroment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
-- 
2.30.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #6: 0001-libgomp-HSA-GCN-plugins-don-t-prepend-the-HSA_R.og10.patch --]
[-- Type: text/x-diff, Size: 4306 bytes --]

From 312ed310cf68c6f28ecba0b439cfa7252d0d213b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 25 Jun 2020 11:59:42 +0200
Subject: [PATCH] libgomp HSA/GCN plugins: don't prepend the 'HSA_RUNTIME_LIB'
 path to 'libhsa-runtime64.so'

For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.

	libgomp/
	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
	* plugin/plugin-gcn.c (init_environment_variables): Likewise.
	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
	* config.h.in: Regenerate.
	* configure: Likewise.

(cherry picked from commit 7c1e856bedb4ae190c420ec2d2ca5e08730cf21d)
---
 libgomp/ChangeLog.omp        |  7 +++++++
 libgomp/config.h.in          |  3 ---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac |  7 -------
 libgomp/plugin/plugin-gcn.c  |  2 +-
 libgomp/plugin/plugin-hsa.c  |  2 +-
 6 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 05788d5c27a2..e8e1ce96fc3b 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,12 @@
 2021-03-25  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* plugin/plugin-hsa.c (init_enviroment_variables): Don't prepend
+	the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
+	* plugin/plugin-gcn.c (init_environment_variables): Likewise.
+	* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
+	* config.h.in: Regenerate.
+	* configure: Likewise.
+
 	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
 	OpenACC 'serial' construct diagnostic for nvptx offloading.
 
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 8de69c2513c4..8b65e5a293ea 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -130,9 +130,6 @@
 /* Define to 1 if you have the `__secure_getenv' function. */
 #undef HAVE___SECURE_GETENV
 
-/* Define path to HSA runtime. */
-#undef HSA_RUNTIME_LIB
-
 /* Define to 1 if GNU symbol versioning is used for libgomp. */
 #undef LIBGOMP_GNU_SYMBOL_VERSIONING
 
diff --git a/libgomp/configure b/libgomp/configure
index 62fc67e8dfec..8e673edf5cd5 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15509,16 +15509,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-
-cat >>confdefs.h <<_ACEOF
-#define HSA_RUNTIME_LIB "$HSA_RUNTIME_LIB"
-_ACEOF
-
-
 
 # Check for functions needed.
 for ac_func in getloadavg clock_gettime strtoull
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index fc91702a4344..69a3cf4aeaf2 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -310,10 +310,3 @@ AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
 AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
   [Define to 1 if the GCN plugin is built, 0 if not.])
-
-if test "$HSA_RUNTIME_LIB" != ""; then
-  HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
-fi
-
-AC_DEFINE_UNQUOTED([HSA_RUNTIME_LIB], ["$HSA_RUNTIME_LIB"],
-  [Define path to HSA runtime.])
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d2786c651385..5d96b33351b8 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1074,7 +1074,7 @@ init_environment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so.1";
+    hsa_runtime_lib = "libhsa-runtime64.so.1";
 
   support_cpu_devices = secure_getenv ("GCN_SUPPORT_CPU_DEVICES");
 
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index bddb690ca14f..800da9a3273c 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -254,7 +254,7 @@ init_enviroment_variables (void)
 
   hsa_runtime_lib = secure_getenv ("HSA_RUNTIME_LIB");
   if (hsa_runtime_lib == NULL)
-    hsa_runtime_lib = HSA_RUNTIME_LIB "libhsa-runtime64.so";
+    hsa_runtime_lib = "libhsa-runtime64.so";
 
   support_cpu_devices = secure_getenv ("HSA_SUPPORT_CPU_DEVICES");
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2021-01-14 14:50       ` Thomas Schwinge
  2021-01-19 11:37         ` Martin Jambor
@ 2022-04-06  9:20         ` Thomas Schwinge
  2022-04-06  9:24           ` Jakub Jelinek
  2022-04-28 13:48           ` [PING] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
  2 siblings, 2 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-04-06  9:20 UTC (permalink / raw)
  To: Andrew Stubbs, Julian Brown, gcc-patches
  Cc: Martin Jambor, Martin Liska, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]

Hi!

On 2021-01-14T15:50:23+0100, I wrote:
> I'm raising here an issue with HSA libgomp plugin code changes from a
> while ago.  While HSA is now no longer relevant for GCC master branch,
> the same code has also been copied into the GCN libgomp plugin.

Here is another small clean-up patch (to enable further clean-up):

> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
> build dependence on HSA run-time":
>
> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>> --- a/libgomp/plugin/configfrag.ac
>> +++ b/libgomp/plugin/configfrag.ac
>
>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>              tgt_name=hsa
>>              PLUGIN_HSA=$tgt
>>              PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>> -            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>> -            PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>> +            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>> +            PLUGIN_HSA_LIBS="-ldl"
>
> So this switched from directly linking against 'libhsa-runtime64.so' to a
> 'libdl'-based runtime linking variant.

(Not intending to change anything regarding that.)

> For avoidance of doubt, [an earlier] change doesn't affect (build-tree) testsuite
> usage, where we have:
>
>     libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>
>     libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"

But, as I argue in the attached "libgomp testsuite: Don't amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library", we should
actually clean this up as well.  OK to push that?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-testsuite-Don-t-amend-LD_LIBRARY_PATH-for-sy.patch --]
[-- Type: text/x-diff, Size: 2573 bytes --]

From 364d01339883f5276ef09d68a5d9a2e0010ab641 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 10:39:56 +0200
Subject: [PATCH] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.

'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:

    # For build-tree testing, also consider the library paths used for builing.
    # For installed testing, we assume all that to be provided in the sysroot.
    if { $blddir != "" } {
        [...]
        global hsa_runtime_lib
        if { $hsa_runtime_lib != "" } {
            append always_ld_library_path ":$hsa_runtime_lib"
        }
    }

However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing".  It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Don't
	'append always_ld_library_path ":$hsa_runtime_lib"'.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
---
 libgomp/testsuite/lib/libgomp.exp             | 4 ----
 libgomp/testsuite/libgomp-test-support.exp.in | 1 -
 2 files changed, 5 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 8c5ecfff0ac..0aaa58f19c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -202,10 +202,6 @@ proc libgomp_init { args } {
 	    lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
-	global hsa_runtime_lib
-	if { $hsa_runtime_lib != "" } {
-	    append always_ld_library_path ":$hsa_runtime_lib"
-	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 98fb442b537..3c88d1d5a62 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
-set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
 
 set offload_plugins "@offload_plugins@"
 set offload_targets "@offload_targets@"
-- 
2.35.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2022-04-06  9:20         ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
@ 2022-04-06  9:24           ` Jakub Jelinek
  2022-04-06  9:54             ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library Thomas Schwinge
  2022-04-28 13:48           ` [PING] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
  1 sibling, 1 reply; 36+ messages in thread
From: Jakub Jelinek @ 2022-04-06  9:24 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Andrew Stubbs, Julian Brown, gcc-patches, Martin Jambor, Martin Liska

On Wed, Apr 06, 2022 at 11:20:47AM +0200, Thomas Schwinge wrote:
> However, the libgomp GCN plugin is unconditionally built against the
> GCC-shipped 'include/hsa*.h' header files, and at run time does
> 'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
> library "used for builing".  It thus doesn't make sense to amend
> 'LD_LIBRARY_PATH' for system-provided HSA Runtime library.

But perhaps having some other hsa_runtime_lib path in LD_LIBRARY_PATH
allows that dlopen to succeed if libhsa-runtime64.so.1 isn't installed
in the standard searched directories?

	Jakub


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library
  2022-04-06  9:24           ` Jakub Jelinek
@ 2022-04-06  9:54             ` Thomas Schwinge
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-04-06  9:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andrew Stubbs, Julian Brown, gcc-patches, Martin Jambor, Martin Liska

Hi Jakub!

On 2022-04-06T11:24:17+0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Apr 06, 2022 at 11:20:47AM +0200, Thomas Schwinge wrote:
>> However, the libgomp GCN plugin is unconditionally built against the
>> GCC-shipped 'include/hsa*.h' header files, and at run time does
>> 'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
>> library "used for builing".  It thus doesn't make sense to amend
>> 'LD_LIBRARY_PATH' for system-provided HSA Runtime library.
>
> But perhaps having some other hsa_runtime_lib path in LD_LIBRARY_PATH
> allows that dlopen to succeed if libhsa-runtime64.so.1 isn't installed
> in the standard searched directories?

Yes, but that's then standard test harness set up (for example, set
'LD_LIBRARY_PATH' environment variable accordingly) for 'make check'.
In particular, that won't be different for build-tree vs. installed
testing, and shouldn't be done conditional to 'if { $blddir != "" }' in
'libgomp/testsuite/lib/libgomp.exp:libgomp_init'.


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

^ permalink raw reply	[flat|nested] 36+ messages in thread

* libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2021-01-14 14:50       ` Thomas Schwinge
  2021-01-19 11:37         ` Martin Jambor
  2022-04-06  9:20         ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
@ 2022-04-06 10:02         ` Thomas Schwinge
  2022-04-28 13:50           ` [PING] " Thomas Schwinge
                             ` (3 more replies)
  2 siblings, 4 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-04-06 10:02 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs, Julian Brown
  Cc: Martin Jambor, Martin Liska, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

Hi!

On 2021-01-14T15:50:23+0100, I wrote:
> I'm raising here an issue with HSA libgomp plugin code changes from a
> while ago.  While HSA is now no longer relevant for GCC master branch,
> the same code has also been copied into the GCN libgomp plugin.

Here is another small clean-up patch (to enable further clean-up):

> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
> build dependence on HSA run-time":
>
> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>> --- a/libgomp/plugin/configfrag.ac
>> +++ b/libgomp/plugin/configfrag.ac
>
>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>              tgt_name=hsa
>>              PLUGIN_HSA=$tgt
>>              PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>> -            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>> -            PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>> +            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>> +            PLUGIN_HSA_LIBS="-ldl"
>
> So this switched from directly linking against 'libhsa-runtime64.so' to a
> 'libdl'-based runtime linking variant.

(Not intending to change anything regarding that.)

Given the 'PLUGIN_HSA_LIBS' change cited above, OK to push the attached
"libgomp GCN plugin: Clean up unused references to system-provided HSA
Runtime library"?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-GCN-plugin-Clean-up-unused-references-to-sys.patch --]
[-- Type: text/x-diff, Size: 3141 bytes --]

From 5e28a267a34282e4d6001e5f89a3b7bd7a0f20c7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 11:31:45 +0200
Subject: [PATCH] libgomp GCN plugin: Clean up unused references to
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.

Originally changed for the libgomp HSA plugin in
commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, these are no longer built against system-provided HSA Runtime library.
Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header
files, and at run time does 'dlopen("libhsa-runtime64.so.1")'.  It thus doesn't
make sense to consider references to system-provided HSA Runtime library during
libgomp GCN plugin build.

	libgomp/
	* plugin/configfrag.ac (HSA_RUNTIME_CPPFLAGS)
	(HSA_RUNTIME_LDFLAGS): Remove.
	* configure: Regenerate.
---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac | 10 ----------
 2 files changed, 20 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index a73a6d44003..8bb67c650a6 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15244,8 +15244,6 @@ HSA_RUNTIME_INCLUDE=
 HSA_RUNTIME_LIB=
 
 
-HSA_RUNTIME_CPPFLAGS=
-HSA_RUNTIME_LDFLAGS=
 
 
 # Check whether --with-hsa-runtime was given.
@@ -15275,12 +15273,6 @@ fi
 if test "x$with_hsa_runtime_lib" != x; then
   HSA_RUNTIME_LIB=$with_hsa_runtime_lib
 fi
-if test "x$HSA_RUNTIME_INCLUDE" != x; then
-  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
-fi
-if test "x$HSA_RUNTIME_LIB" != x; then
-  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
-fi
 
 PLUGIN_GCN=0
 PLUGIN_GCN_CPPFLAGS=
@@ -15390,8 +15382,6 @@ rm -f core conftest.err conftest.$ac_objext \
 	      *)
 		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
-		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
 		PLUGIN_GCN_LIBS="-ldl"
 		PLUGIN_GCN=1
 		;;
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index da573bd8387..56461b89117 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -97,8 +97,6 @@ HSA_RUNTIME_INCLUDE=
 HSA_RUNTIME_LIB=
 AC_SUBST(HSA_RUNTIME_INCLUDE)
 AC_SUBST(HSA_RUNTIME_LIB)
-HSA_RUNTIME_CPPFLAGS=
-HSA_RUNTIME_LDFLAGS=
 
 AC_ARG_WITH(hsa-runtime,
 	[AS_HELP_STRING([--with-hsa-runtime=PATH],
@@ -121,12 +119,6 @@ fi
 if test "x$with_hsa_runtime_lib" != x; then
   HSA_RUNTIME_LIB=$with_hsa_runtime_lib
 fi
-if test "x$HSA_RUNTIME_INCLUDE" != x; then
-  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
-fi
-if test "x$HSA_RUNTIME_LIB" != x; then
-  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
-fi
 
 PLUGIN_GCN=0
 PLUGIN_GCN_CPPFLAGS=
@@ -225,8 +217,6 @@ if test x"$enable_offload_targets" != x; then
 	      *)
 		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
-		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
 		PLUGIN_GCN_LIBS="-ldl"
 		PLUGIN_GCN=1
 		;;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PING] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2022-04-06  9:20         ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
  2022-04-06  9:24           ` Jakub Jelinek
@ 2022-04-28 13:48           ` Thomas Schwinge
  2022-05-05 19:20             ` [PING^2] " Thomas Schwinge
  1 sibling, 1 reply; 36+ messages in thread
From: Thomas Schwinge @ 2022-04-28 13:48 UTC (permalink / raw)
  To: Andrew Stubbs, Julian Brown, gcc-patches
  Cc: Martin Jambor, Martin Liska, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2093 bytes --]

Hi!

Ping.

On 2022-04-06T11:20:47+0200, I wrote:
> On 2021-01-14T15:50:23+0100, I wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
>
> Here is another small clean-up patch (to enable further clean-up):
>
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>             tgt_name=hsa
>>>             PLUGIN_HSA=$tgt
>>>             PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -           PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +           PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
>
> (Not intending to change anything regarding that.)
>
>> For avoidance of doubt, [an earlier] change doesn't affect (build-tree) testsuite
>> usage, where we have:
>>
>>     libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>>
>>     libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"
>
> But, as I argue in the attached "libgomp testsuite: Don't amend
> 'LD_LIBRARY_PATH' for system-provided HSA Runtime library", we should
> actually clean this up as well.  OK to push that?
>
>
> Grüße
>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-testsuite-Don-t-amend-LD_LIBRARY_PATH-for-sy.patch --]
[-- Type: text/x-diff, Size: 2573 bytes --]

From 364d01339883f5276ef09d68a5d9a2e0010ab641 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 10:39:56 +0200
Subject: [PATCH] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.

'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:

    # For build-tree testing, also consider the library paths used for builing.
    # For installed testing, we assume all that to be provided in the sysroot.
    if { $blddir != "" } {
        [...]
        global hsa_runtime_lib
        if { $hsa_runtime_lib != "" } {
            append always_ld_library_path ":$hsa_runtime_lib"
        }
    }

However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing".  It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Don't
	'append always_ld_library_path ":$hsa_runtime_lib"'.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
---
 libgomp/testsuite/lib/libgomp.exp             | 4 ----
 libgomp/testsuite/libgomp-test-support.exp.in | 1 -
 2 files changed, 5 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 8c5ecfff0ac..0aaa58f19c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -202,10 +202,6 @@ proc libgomp_init { args } {
 	    lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
-	global hsa_runtime_lib
-	if { $hsa_runtime_lib != "" } {
-	    append always_ld_library_path ":$hsa_runtime_lib"
-	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 98fb442b537..3c88d1d5a62 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
-set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
 
 set offload_plugins "@offload_plugins@"
 set offload_targets "@offload_targets@"
-- 
2.35.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PING] libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
@ 2022-04-28 13:50           ` Thomas Schwinge
  2022-04-28 14:18           ` Andrew Stubbs
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-04-28 13:50 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs, Julian Brown
  Cc: Martin Jambor, Martin Liska, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

Hi!

Ping.

On 2022-04-06T12:02:08+0200, I wrote:
> On 2021-01-14T15:50:23+0100, I wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
>
> Here is another small clean-up patch (to enable further clean-up):
>
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>             tgt_name=hsa
>>>             PLUGIN_HSA=$tgt
>>>             PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -           PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +           PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
>
> (Not intending to change anything regarding that.)
>
> Given the 'PLUGIN_HSA_LIBS' change cited above, OK to push the attached
> "libgomp GCN plugin: Clean up unused references to system-provided HSA
> Runtime library"?


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-GCN-plugin-Clean-up-unused-references-to-sys.patch --]
[-- Type: text/x-diff, Size: 3141 bytes --]

From 5e28a267a34282e4d6001e5f89a3b7bd7a0f20c7 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 11:31:45 +0200
Subject: [PATCH] libgomp GCN plugin: Clean up unused references to
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.

Originally changed for the libgomp HSA plugin in
commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, these are no longer built against system-provided HSA Runtime library.
Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header
files, and at run time does 'dlopen("libhsa-runtime64.so.1")'.  It thus doesn't
make sense to consider references to system-provided HSA Runtime library during
libgomp GCN plugin build.

	libgomp/
	* plugin/configfrag.ac (HSA_RUNTIME_CPPFLAGS)
	(HSA_RUNTIME_LDFLAGS): Remove.
	* configure: Regenerate.
---
 libgomp/configure            | 10 ----------
 libgomp/plugin/configfrag.ac | 10 ----------
 2 files changed, 20 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index a73a6d44003..8bb67c650a6 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15244,8 +15244,6 @@ HSA_RUNTIME_INCLUDE=
 HSA_RUNTIME_LIB=
 
 
-HSA_RUNTIME_CPPFLAGS=
-HSA_RUNTIME_LDFLAGS=
 
 
 # Check whether --with-hsa-runtime was given.
@@ -15275,12 +15273,6 @@ fi
 if test "x$with_hsa_runtime_lib" != x; then
   HSA_RUNTIME_LIB=$with_hsa_runtime_lib
 fi
-if test "x$HSA_RUNTIME_INCLUDE" != x; then
-  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
-fi
-if test "x$HSA_RUNTIME_LIB" != x; then
-  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
-fi
 
 PLUGIN_GCN=0
 PLUGIN_GCN_CPPFLAGS=
@@ -15390,8 +15382,6 @@ rm -f core conftest.err conftest.$ac_objext \
 	      *)
 		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
-		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
 		PLUGIN_GCN_LIBS="-ldl"
 		PLUGIN_GCN=1
 		;;
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index da573bd8387..56461b89117 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -97,8 +97,6 @@ HSA_RUNTIME_INCLUDE=
 HSA_RUNTIME_LIB=
 AC_SUBST(HSA_RUNTIME_INCLUDE)
 AC_SUBST(HSA_RUNTIME_LIB)
-HSA_RUNTIME_CPPFLAGS=
-HSA_RUNTIME_LDFLAGS=
 
 AC_ARG_WITH(hsa-runtime,
 	[AS_HELP_STRING([--with-hsa-runtime=PATH],
@@ -121,12 +119,6 @@ fi
 if test "x$with_hsa_runtime_lib" != x; then
   HSA_RUNTIME_LIB=$with_hsa_runtime_lib
 fi
-if test "x$HSA_RUNTIME_INCLUDE" != x; then
-  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
-fi
-if test "x$HSA_RUNTIME_LIB" != x; then
-  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
-fi
 
 PLUGIN_GCN=0
 PLUGIN_GCN_CPPFLAGS=
@@ -225,8 +217,6 @@ if test x"$enable_offload_targets" != x; then
 	      *)
 		tgt_plugin=gcn
 		PLUGIN_GCN=$tgt
-		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
-		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
 		PLUGIN_GCN_LIBS="-ldl"
 		PLUGIN_GCN=1
 		;;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
  2022-04-28 13:50           ` [PING] " Thomas Schwinge
@ 2022-04-28 14:18           ` Andrew Stubbs
  2022-05-11 12:38           ` libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS' (was: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library) Thomas Schwinge
  2022-05-11 12:40           ` libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' " Thomas Schwinge
  3 siblings, 0 replies; 36+ messages in thread
From: Andrew Stubbs @ 2022-04-28 14:18 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, Julian Brown; +Cc: Jakub Jelinek

On 06/04/2022 11:02, Thomas Schwinge wrote:
> Hi!
> 
> On 2021-01-14T15:50:23+0100, I wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
> 
> Here is another small clean-up patch (to enable further clean-up):
> 
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>               tgt_name=hsa
>>>               PLUGIN_HSA=$tgt
>>>               PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -            PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +            PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
> 
> (Not intending to change anything regarding that.)
> 
> Given the 'PLUGIN_HSA_LIBS' change cited above, OK to push the attached
> "libgomp GCN plugin: Clean up unused references to system-provided HSA
> Runtime library"?

OK.

Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PING^2] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)
  2022-04-28 13:48           ` [PING] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
@ 2022-05-05 19:20             ` Thomas Schwinge
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-05-05 19:20 UTC (permalink / raw)
  To: Andrew Stubbs, Julian Brown, gcc-patches; +Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2224 bytes --]

Hi!

Ping^2.


Grüße
 Thomas


On 2022-04-28T15:48:13+0200, I wrote:
> Hi!
>
> Ping.
>
> On 2022-04-06T11:20:47+0200, I wrote:
>> On 2021-01-14T15:50:23+0100, I wrote:
>>> I'm raising here an issue with HSA libgomp plugin code changes from a
>>> while ago.  While HSA is now no longer relevant for GCC master branch,
>>> the same code has also been copied into the GCN libgomp plugin.
>>
>> Here is another small clean-up patch (to enable further clean-up):
>>
>>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>>> build dependence on HSA run-time":
>>>
>>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>>> --- a/libgomp/plugin/configfrag.ac
>>>> +++ b/libgomp/plugin/configfrag.ac
>>>
>>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>>             tgt_name=hsa
>>>>             PLUGIN_HSA=$tgt
>>>>             PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>>> -           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>>> -           PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>>> +           PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>>> +           PLUGIN_HSA_LIBS="-ldl"
>>>
>>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>>> 'libdl'-based runtime linking variant.
>>
>> (Not intending to change anything regarding that.)
>>
>>> For avoidance of doubt, [an earlier] change doesn't affect (build-tree) testsuite
>>> usage, where we have:
>>>
>>>     libgomp/testsuite/libgomp-test-support.exp.in:set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
>>>
>>>     libgomp/testsuite/lib/libgomp.exp:          append always_ld_library_path ":$hsa_runtime_lib"
>>
>> But, as I argue in the attached "libgomp testsuite: Don't amend
>> 'LD_LIBRARY_PATH' for system-provided HSA Runtime library", we should
>> actually clean this up as well.  OK to push that?
>>
>>
>> Grüße
>>  Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-testsuite-Don-t-amend-LD_LIBRARY_PATH-for-sy.patch --]
[-- Type: text/x-diff, Size: 2573 bytes --]

From 364d01339883f5276ef09d68a5d9a2e0010ab641 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 10:39:56 +0200
Subject: [PATCH] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for
 system-provided HSA Runtime library

This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.

'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:

    # For build-tree testing, also consider the library paths used for builing.
    # For installed testing, we assume all that to be provided in the sysroot.
    if { $blddir != "" } {
        [...]
        global hsa_runtime_lib
        if { $hsa_runtime_lib != "" } {
            append always_ld_library_path ":$hsa_runtime_lib"
        }
    }

However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing".  It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Don't
	'append always_ld_library_path ":$hsa_runtime_lib"'.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
---
 libgomp/testsuite/lib/libgomp.exp             | 4 ----
 libgomp/testsuite/libgomp-test-support.exp.in | 1 -
 2 files changed, 5 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 8c5ecfff0ac..0aaa58f19c5 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -202,10 +202,6 @@ proc libgomp_init { args } {
 	    lappend ALWAYS_CFLAGS "additional_flags=-L$cuda_driver_lib"
 	    append always_ld_library_path ":$cuda_driver_lib"
 	}
-	global hsa_runtime_lib
-	if { $hsa_runtime_lib != "" } {
-	    append always_ld_library_path ":$hsa_runtime_lib"
-	}
     }
 
     # We use atomic operations in the testcases to validate results.
diff --git a/libgomp/testsuite/libgomp-test-support.exp.in b/libgomp/testsuite/libgomp-test-support.exp.in
index 98fb442b537..3c88d1d5a62 100644
--- a/libgomp/testsuite/libgomp-test-support.exp.in
+++ b/libgomp/testsuite/libgomp-test-support.exp.in
@@ -1,6 +1,5 @@
 set cuda_driver_include "@CUDA_DRIVER_INCLUDE@"
 set cuda_driver_lib "@CUDA_DRIVER_LIB@"
-set hsa_runtime_lib "@HSA_RUNTIME_LIB@"
 
 set offload_plugins "@offload_plugins@"
 set offload_targets "@offload_targets@"
-- 
2.35.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS' (was: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library)
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
  2022-04-28 13:50           ` [PING] " Thomas Schwinge
  2022-04-28 14:18           ` Andrew Stubbs
@ 2022-05-11 12:38           ` Thomas Schwinge
  2022-05-11 12:40           ` libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' " Thomas Schwinge
  3 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-05-11 12:38 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs, Julian Brown; +Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

Hi!

On 2022-04-06T12:02:08+0200, I wrote:
> On 2021-01-14T15:50:23+0100, I wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
>
> Here is another small clean-up patch (to enable further clean-up):
>
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>              tgt_name=hsa
>>>              PLUGIN_HSA=$tgt
>>>              PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -            PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +            PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
>
> (Not intending to change anything regarding that.)
>
> Given the 'PLUGIN_HSA_LIBS' change cited above, OK to push the attached
> "libgomp GCN plugin: Clean up unused references to system-provided HSA
> Runtime library"?

With that done, I've then pushed to master branch
commit 91a6dcd14915181b4bce51cd44b56a3e9f9d35d8 "libgomp GCN plugin:
Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'", see
attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-GCN-plugin-Clean-up-always-empty-PLUGIN_GCN_.patch --]
[-- Type: text/x-diff, Size: 5506 bytes --]

From 91a6dcd14915181b4bce51cd44b56a3e9f9d35d8 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 12:15:28 +0200
Subject: [PATCH] libgomp GCN plugin: Clean up always-empty
 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'

After recent commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056
"libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime
library", these aren't set anymore.

	libgomp/
	* plugin/Makefrag.am (libgomp_plugin_gcn_la_CPPFLAGS): Don't
	consider 'PLUGIN_GCN_CPPFLAGS'.
	(libgomp_plugin_gcn_la_LDFLAGS): Don't consider
	'PLUGIN_GCN_LDFLAGS'.
	* plugin/configfrag.ac (PLUGIN_GCN_CPPFLAGS, PLUGIN_GCN_LDFLAGS):
	Remove.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
---
 libgomp/Makefile.in           | 10 ++++------
 libgomp/configure             | 10 ++--------
 libgomp/plugin/Makefrag.am    |  3 +--
 libgomp/plugin/configfrag.ac  |  4 ----
 libgomp/testsuite/Makefile.in |  2 --
 5 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 22cb2136a08..1c2ac5695ab 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -434,8 +434,6 @@ PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
 PLUGIN_GCN = @PLUGIN_GCN@
-PLUGIN_GCN_CPPFLAGS = @PLUGIN_GCN_CPPFLAGS@
-PLUGIN_GCN_LDFLAGS = @PLUGIN_GCN_LDFLAGS@
 PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
 PLUGIN_NVPTX = @PLUGIN_NVPTX@
 PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
@@ -574,12 +572,12 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 # AMD GCN plugin
 @PLUGIN_GCN_TRUE@libgomp_plugin_gcn_version_info = -version-info $(libtool_VERSION)
 @PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_SOURCES = plugin/plugin-gcn.c
-@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_GCN_CPPFLAGS) \
+@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_CPPFLAGS = $(AM_CPPFLAGS) \
 @PLUGIN_GCN_TRUE@	-D_GNU_SOURCE
 
-@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_LDFLAGS =  \
-@PLUGIN_GCN_TRUE@	$(libgomp_plugin_gcn_version_info) \
-@PLUGIN_GCN_TRUE@	$(lt_host_flags) $(PLUGIN_GCN_LDFLAGS)
+@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_LDFLAGS = $(libgomp_plugin_gcn_version_info) \
+@PLUGIN_GCN_TRUE@	$(lt_host_flags)
+
 @PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_LIBADD = libgomp.la $(PLUGIN_GCN_LIBS)
 @PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_LIBTOOLFLAGS = --tag=disable-static
 nodist_noinst_HEADERS = libgomp_f.h
diff --git a/libgomp/configure b/libgomp/configure
index cf1d1fbe195..e735e4c5f2a 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -674,8 +674,6 @@ offload_additional_options
 offload_targets
 offload_plugins
 PLUGIN_GCN_LIBS
-PLUGIN_GCN_LDFLAGS
-PLUGIN_GCN_CPPFLAGS
 PLUGIN_GCN
 HSA_RUNTIME_LIB
 HSA_RUNTIME_INCLUDE
@@ -11431,7 +11429,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11434 "configure"
+#line 11432 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11537,7 +11535,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11540 "configure"
+#line 11538 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15275,14 +15273,10 @@ if test "x$with_hsa_runtime_lib" != x; then
 fi
 
 PLUGIN_GCN=0
-PLUGIN_GCN_CPPFLAGS=
-PLUGIN_GCN_LDFLAGS=
 PLUGIN_GCN_LIBS=
 
 
 
-
-
 # Parse '--enable-offload-targets', figure out the corresponding libgomp
 # plugins, and configure to find the corresponding offload compilers.
 # 'offload_plugins' and 'offload_targets' will be populated in the same order.
diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 3fe50b61cfd..11929d4ff29 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -44,11 +44,10 @@ if PLUGIN_GCN
 libgomp_plugin_gcn_version_info = -version-info $(libtool_VERSION)
 toolexeclib_LTLIBRARIES += libgomp-plugin-gcn.la
 libgomp_plugin_gcn_la_SOURCES = plugin/plugin-gcn.c
-libgomp_plugin_gcn_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_GCN_CPPFLAGS) \
+libgomp_plugin_gcn_la_CPPFLAGS = $(AM_CPPFLAGS) \
 	-D_GNU_SOURCE
 libgomp_plugin_gcn_la_LDFLAGS = $(libgomp_plugin_gcn_version_info) \
 	$(lt_host_flags)
-libgomp_plugin_gcn_la_LDFLAGS += $(PLUGIN_GCN_LDFLAGS)
 libgomp_plugin_gcn_la_LIBADD = libgomp.la $(PLUGIN_GCN_LIBS)
 libgomp_plugin_gcn_la_LIBTOOLFLAGS = --tag=disable-static
 endif
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 52c3da09b43..94d357f9a26 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -121,12 +121,8 @@ if test "x$with_hsa_runtime_lib" != x; then
 fi
 
 PLUGIN_GCN=0
-PLUGIN_GCN_CPPFLAGS=
-PLUGIN_GCN_LDFLAGS=
 PLUGIN_GCN_LIBS=
 AC_SUBST(PLUGIN_GCN)
-AC_SUBST(PLUGIN_GCN_CPPFLAGS)
-AC_SUBST(PLUGIN_GCN_LDFLAGS)
 AC_SUBST(PLUGIN_GCN_LIBS)
 
 # Parse '--enable-offload-targets', figure out the corresponding libgomp
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index e48c3f2f9b0..f0c7da68601 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -212,8 +212,6 @@ PACKAGE_VERSION = @PACKAGE_VERSION@
 PATH_SEPARATOR = @PATH_SEPARATOR@
 PERL = @PERL@
 PLUGIN_GCN = @PLUGIN_GCN@
-PLUGIN_GCN_CPPFLAGS = @PLUGIN_GCN_CPPFLAGS@
-PLUGIN_GCN_LDFLAGS = @PLUGIN_GCN_LDFLAGS@
 PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
 PLUGIN_NVPTX = @PLUGIN_NVPTX@
 PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
-- 
2.35.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' (was: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library)
  2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
                             ` (2 preceding siblings ...)
  2022-05-11 12:38           ` libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS' (was: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library) Thomas Schwinge
@ 2022-05-11 12:40           ` Thomas Schwinge
  3 siblings, 0 replies; 36+ messages in thread
From: Thomas Schwinge @ 2022-05-11 12:40 UTC (permalink / raw)
  To: gcc-patches, Andrew Stubbs, Julian Brown; +Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]

Hi!

On 2022-04-06T12:02:08+0200, Thomas Schwinge <thomas@codesourcery.com> wrote:
> On 2021-01-14T15:50:23+0100, I wrote:
>> I'm raising here an issue with HSA libgomp plugin code changes from a
>> while ago.  While HSA is now no longer relevant for GCC master branch,
>> the same code has also been copied into the GCN libgomp plugin.
>
> Here is another small clean-up patch (to enable further clean-up):
>
>> This is commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove
>> build dependence on HSA run-time":
>>
>> On 2016-11-22T14:27:44+0100, Martin Jambor <mjambor@suse.cz> wrote:
>>> --- a/libgomp/plugin/configfrag.ac
>>> +++ b/libgomp/plugin/configfrag.ac
>>
>>> @@ -195,8 +183,8 @@ if test x"$enable_offload_targets" != x; then
>>>              tgt_name=hsa
>>>              PLUGIN_HSA=$tgt
>>>              PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
>>> -            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS $HSA_KMT_LDFLAGS"
>>> -            PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
>>> +            PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
>>> +            PLUGIN_HSA_LIBS="-ldl"
>>
>> So this switched from directly linking against 'libhsa-runtime64.so' to a
>> 'libdl'-based runtime linking variant.
>
> (Not intending to change anything regarding that.)
>
> Given the 'PLUGIN_HSA_LIBS' change cited above, OK to push the attached
> "libgomp GCN plugin: Clean up unused references to system-provided HSA
> Runtime library"?

With that done, I've then pushed to master branch
commit 876ac21b7e796f9efb859dfb46ae2a4126b0b782
"libgomp: Remove unused '--with-hsa-runtime',
'--with-hsa-runtime-include', '--with-hsa-runtime-lib'",
see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp-Remove-unused-with-hsa-runtime-with-hsa-runt.patch --]
[-- Type: text/x-diff, Size: 7382 bytes --]

From 876ac21b7e796f9efb859dfb46ae2a4126b0b782 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 6 Apr 2022 12:26:13 +0200
Subject: [PATCH] libgomp: Remove unused '--with-hsa-runtime',
 '--with-hsa-runtime-include', '--with-hsa-runtime-lib'

With recent commit 2e309a4eff80e55b53d32d26926a2a94eabfea21 "libgomp testsuite:
Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library",
and commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056 "libgomp GCN plugin:
 Clean up unused references to system-provided HSA Runtime library", the last
uses of '--with-hsa-runtime' etc. are gone.

	gcc/
	* doc/install.texi: Don't document '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib'.
	libgomp/
	* plugin/configfrag.ac: Remove '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib' processing.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
---
 gcc/doc/install.texi          | 12 --------
 libgomp/Makefile.in           |  2 --
 libgomp/configure             | 55 ++---------------------------------
 libgomp/plugin/configfrag.ac  | 29 ------------------
 libgomp/testsuite/Makefile.in |  2 --
 5 files changed, 2 insertions(+), 98 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 76392302653..042241e9fad 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2329,18 +2329,6 @@ those are in separate optional packages and where the presence or absence
 of those optional packages should determine the actual supported offloading
 target set rather than the GCC configure-time selection.
 
-@item --with-hsa-runtime=@var{pathname}
-@itemx --with-hsa-runtime-include=@var{pathname}
-@itemx --with-hsa-runtime-lib=@var{pathname}
-
-If you configure GCC with offloading which uses an HSA run-time such as
-AMDGCN but do not have the HSA run-time library installed in a standard
-location then you can explicitly specify the directory where they are
-installed.  The @option{--with-hsa-runtime=@/@var{hsainstalldir}} option
-is a shorthand for
-@option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
-@option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
-
 @item --enable-cet
 @itemx --disable-cet
 Enable building target run-time libraries with control-flow
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 1c2ac5695ab..f2712aa5133 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -385,8 +385,6 @@ FC = @FC@
 FCFLAGS = @FCFLAGS@
 FGREP = @FGREP@
 GREP = @GREP@
-HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
-HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
diff --git a/libgomp/configure b/libgomp/configure
index e735e4c5f2a..3de8eb2641f 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -675,8 +675,6 @@ offload_targets
 offload_plugins
 PLUGIN_GCN_LIBS
 PLUGIN_GCN
-HSA_RUNTIME_LIB
-HSA_RUNTIME_INCLUDE
 PLUGIN_NVPTX_LIBS
 PLUGIN_NVPTX_LDFLAGS
 PLUGIN_NVPTX_CPPFLAGS
@@ -834,9 +832,6 @@ enable_maintainer_mode
 with_cuda_driver
 with_cuda_driver_include
 with_cuda_driver_lib
-with_hsa_runtime
-with_hsa_runtime_include
-with_hsa_runtime_lib
 enable_linux_futex
 enable_tls
 enable_symvers
@@ -1519,16 +1514,6 @@ Optional Packages:
   --with-cuda-driver-lib=PATH
                           specify directory for the installed CUDA driver
                           library
-  --with-hsa-runtime=PATH specify prefix directory for installed HSA run-time
-                          package. Equivalent to
-                          --with-hsa-runtime-include=PATH/include plus
-                          --with-hsa-runtime-lib=PATH/lib
-  --with-hsa-runtime-include=PATH
-                          specify directory for installed HSA run-time include
-                          files
-  --with-hsa-runtime-lib=PATH
-                          specify directory for the installed HSA run-time
-                          library
   --with-gcc-major-version-only
                           use only GCC major number in filesystem paths
 
@@ -11429,7 +11414,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11432 "configure"
+#line 11417 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11535,7 +11520,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11538 "configure"
+#line 11523 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15236,42 +15221,6 @@ PLUGIN_NVPTX_DYNAMIC=0
 
 
 
-# Look for HSA run-time, its includes and libraries
-
-HSA_RUNTIME_INCLUDE=
-HSA_RUNTIME_LIB=
-
-
-
-
-# Check whether --with-hsa-runtime was given.
-if test "${with_hsa_runtime+set}" = set; then :
-  withval=$with_hsa_runtime;
-fi
-
-
-# Check whether --with-hsa-runtime-include was given.
-if test "${with_hsa_runtime_include+set}" = set; then :
-  withval=$with_hsa_runtime_include;
-fi
-
-
-# Check whether --with-hsa-runtime-lib was given.
-if test "${with_hsa_runtime_lib+set}" = set; then :
-  withval=$with_hsa_runtime_lib;
-fi
-
-if test "x$with_hsa_runtime" != x; then
-  HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
-  HSA_RUNTIME_LIB=$with_hsa_runtime/lib
-fi
-if test "x$with_hsa_runtime_include" != x; then
-  HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
-fi
-if test "x$with_hsa_runtime_lib" != x; then
-  HSA_RUNTIME_LIB=$with_hsa_runtime_lib
-fi
-
 PLUGIN_GCN=0
 PLUGIN_GCN_LIBS=
 
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 94d357f9a26..9eeac4562e4 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -91,35 +91,6 @@ AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
 AC_SUBST(PLUGIN_NVPTX_LIBS)
 
-# Look for HSA run-time, its includes and libraries
-
-HSA_RUNTIME_INCLUDE=
-HSA_RUNTIME_LIB=
-AC_SUBST(HSA_RUNTIME_INCLUDE)
-AC_SUBST(HSA_RUNTIME_LIB)
-
-AC_ARG_WITH(hsa-runtime,
-	[AS_HELP_STRING([--with-hsa-runtime=PATH],
-		[specify prefix directory for installed HSA run-time package.
-		 Equivalent to --with-hsa-runtime-include=PATH/include
-		 plus --with-hsa-runtime-lib=PATH/lib])])
-AC_ARG_WITH(hsa-runtime-include,
-	[AS_HELP_STRING([--with-hsa-runtime-include=PATH],
-		[specify directory for installed HSA run-time include files])])
-AC_ARG_WITH(hsa-runtime-lib,
-	[AS_HELP_STRING([--with-hsa-runtime-lib=PATH],
-		[specify directory for the installed HSA run-time library])])
-if test "x$with_hsa_runtime" != x; then
-  HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
-  HSA_RUNTIME_LIB=$with_hsa_runtime/lib
-fi
-if test "x$with_hsa_runtime_include" != x; then
-  HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
-fi
-if test "x$with_hsa_runtime_lib" != x; then
-  HSA_RUNTIME_LIB=$with_hsa_runtime_lib
-fi
-
 PLUGIN_GCN=0
 PLUGIN_GCN_LIBS=
 AC_SUBST(PLUGIN_GCN)
diff --git a/libgomp/testsuite/Makefile.in b/libgomp/testsuite/Makefile.in
index f0c7da68601..32be337b8fc 100644
--- a/libgomp/testsuite/Makefile.in
+++ b/libgomp/testsuite/Makefile.in
@@ -163,8 +163,6 @@ FC = @FC@
 FCFLAGS = @FCFLAGS@
 FGREP = @FGREP@
 GREP = @GREP@
-HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
-HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
 INSTALL = @INSTALL@
 INSTALL_DATA = @INSTALL_DATA@
 INSTALL_PROGRAM = @INSTALL_PROGRAM@
-- 
2.35.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-05-11 12:41 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-13 23:20 [PATCH 0/4] Merge from HSA branch to trunk Martin Jambor
2016-11-13 23:20 ` [PATCH 4/4] Back-end and IPA bits of hsa branch merge Martin Jambor
     [not found]   ` <yxfpftb48jra.fsf@hertz.schwinge.homeip.net>
2020-06-17 21:57     ` [HSA] Avoid ICE when "HSA does not implement indirect calls" (was: [PATCH 4/4] Back-end and IPA bits of hsa branch merge) Thomas Schwinge
2016-11-13 23:20 ` [PATCH 1/4] Remove build dependence on HSA run-time Martin Jambor
2016-11-18 10:23   ` Jakub Jelinek
2016-11-22 13:27     ` Martin Jambor
2016-11-22 14:13       ` Jakub Jelinek
2021-01-14 14:50       ` Thomas Schwinge
2021-01-19 11:37         ` Martin Jambor
2021-01-19 12:49           ` Martin Liška
2021-03-25 13:40           ` Thomas Schwinge
2022-04-06  9:20         ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
2022-04-06  9:24           ` Jakub Jelinek
2022-04-06  9:54             ` libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library Thomas Schwinge
2022-04-28 13:48           ` [PING] libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time) Thomas Schwinge
2022-05-05 19:20             ` [PING^2] " Thomas Schwinge
2022-04-06 10:02         ` libgomp GCN plugin: Clean up unused references to " Thomas Schwinge
2022-04-28 13:50           ` [PING] " Thomas Schwinge
2022-04-28 14:18           ` Andrew Stubbs
2022-05-11 12:38           ` libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS' (was: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library) Thomas Schwinge
2022-05-11 12:40           ` libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' " Thomas Schwinge
2016-11-13 23:20 ` [PATCH 2/4] HSA specific built-ins Martin Jambor
2016-11-18 10:27   ` Jakub Jelinek
2016-11-22 13:30     ` Martin Jambor
2016-11-13 23:20 ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
2016-11-18 10:39   ` Jakub Jelinek
2016-03-16 14:13     ` [omp] Create openmp -fopt-info optimization group Martin Jambor
2017-02-21  8:09       ` [gomp4] add -finform-parallelism Cesar Philippidis
2017-02-22  8:28         ` Thomas Schwinge
2017-02-23 15:06           ` Cesar Philippidis
2016-11-22 13:43     ` [PATCH 3/4] OpenMP lowering changes from the hsa branch Martin Jambor
2017-02-22  7:58       ` Rename the "openmp" group of optimizations to "omp" (was: [PATCH 3/4] OpenMP lowering changes from the hsa branch) Thomas Schwinge
2017-02-22  8:17         ` Miscellaneous optimization group fixes (was: Rename the "openmp" group of optimizations to "omp") Thomas Schwinge
2017-02-22  9:53           ` Martin Jambor
2017-02-28  8:52             ` Rename the "openmp" group of optimizations to "omp" (was: Miscellaneous optimization group fixes) Thomas Schwinge
2017-02-28  9:04             ` Miscellaneous optimization group fixes Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).